tensorflow / lucid

A collection of infrastructure and tools for research in neural network interpretability.

Rethinking Abstractions: T, endpoints, "deferred tensors", ... #25

Open colah opened 6 years ago

colah commented 6 years ago

In this issue, we'll discuss some weaknesses of Lucid's present abstractions. We'll also present a couple of ideas for possible alternatives, but we don't presently have a strong view on the right path forward.

(A lot of these thoughts were developed in conversation with @ludwigschubert.)


Introduction

A lot of the weird stuff about Lucid comes from us having different needs than most TF users. A normal TF workflow looks something like "define a graph, then train it for a while". Our needs are often very different: create one graph for 30 seconds, then throw it away. Create another similar graph, then throw it away too. Moreover, these graphs often have a composable structure, where we want to be able to talk about parts of the graph independent of a particular instantiation and reuse them over and over again.

At a very high level, Lucid's answer is to use closures. For example, when I declare an objective:

obj = objectives.channel("mixed4a", 37)

I'm approximately creating the closure:

obj = lambda T: tf.reduce_sum(T("mixed4a")[..., 37])

Here, T is an accessor, kind of like $ in jQuery: it allows you to conveniently access lots of things you might need without writing a lot of code.

(This isn't quite true: the closure is actually further wrapped in a convenience Objective object, which allows objectives to be added and multiplied without explicitly escaping the closure. More on this in a minute.)
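For intuition, here is a minimal sketch of how such a wrapper can support + and * without ever escaping the closure; the details are illustrative and differ from Lucid's actual Objective class:

class Objective(object):
  """Wraps a closure of the form lambda T: scalar_tensor."""

  def __init__(self, objective_func):
    self.objective_func = objective_func

  def __call__(self, T):
    return self.objective_func(T)

  def __add__(self, other):
    # Add two objectives while staying inside the closure.
    return Objective(lambda T: self(T) + other(T))

  def __mul__(self, factor):
    # Scale an objective by a constant.
    return Objective(lambda T: factor * self(T))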

At a very high level, I believe that closures like this are the right abstraction for us. However, I also have a few concerns:


The role of T

Generally, our closures take an argument T, a special function which allows them to access things like neural network layers:

obj = lambda T:   ...   T("mixed4a")  ...

The fact that T is getting passed in as an argument might make you believe it has a lot of special state. That's true to some extent -- it's kind of a grab bag of things -- but in its main usage it is equivalent to something like:

def T(name):
  # get_tensor_by_name needs an output slot, e.g. "import/mixed4a:0"
  return tf.get_default_graph().get_tensor_by_name("import/" + name + ":0")

Thank goodness we don't have to type that all the time -- it's quite a mouthful! But it could just as easily be global.

A little bit more state comes from us wanting T to have special names for some nodes, to make them more user-friendly. For the most part, this comes from the imported model. From that perspective, it might be more intuitive to do something like model["mixed4a"] instead of T("mixed4a").
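As a rough sketch of that direction (the class and its details are hypothetical, not an existing Lucid API), model indexing could simply look the layer up in the imported graph:

import tensorflow as tf

class Model(object):
  """Hypothetical wrapper that indexes layers of an imported graph by friendly name."""

  def __init__(self, import_scope="import"):
    self.import_scope = import_scope

  def __getitem__(self, name):
    # Look the layer up in the current default graph by its imported name.
    return tf.get_default_graph().get_tensor_by_name(
        "%s/%s:0" % (self.import_scope, name))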

There are some things, like the pre-transformation input or the global step, that we probably want to get from somewhere else. That said, we could probably do something like render.global_step() if we wanted to get rid of T.

So, what are the pros and cons of the present T closure-argument setup?

Pros: It seems to me that the biggest one is actually preventing users from shooting themselves in the foot by trying to access graphs that haven't been imported or constructed.

Cons: Centralized and annoying to extend; passing around unnecessary variables; alternative setups might make error messages / debugging better.


Closures / Wrapped closures

Before we talk about APIs for dealing with our closures, I'd like to clarify their role a bit:

(Observation: These deferring closures form a monad. Most of the interesting API options are ways of reifying Functor/Monad operations in Python.)
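To make that observation concrete, here is a small illustrative sketch (the helper names are assumptions, not Lucid functions): closures of the form lambda T: ... are essentially the Reader monad over the accessor T, and the Functor/Monad operations can be written directly:

import tensorflow as tf

def fmap(f, deferred):
  # Functor map: apply f to the eventual value, staying inside the closure.
  return lambda T: f(deferred(T))

def bind(deferred, k):
  # Monadic bind: k builds a new deferred computation from the value;
  # both share the same accessor T (this is just the Reader monad).
  return lambda T: k(deferred(T))(T)

# For example, the channel objective from above, built compositionally:
layer = lambda T: T("mixed4a")
channel_37 = fmap(lambda acts: acts[..., 37], layer)
obj = fmap(tf.reduce_sum, channel_37)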

Overview of Approaches

Broadly, there are three ways of handling the closures we create:

  1. Our API creates actual tensor objects, and we rely on our users to wrap API calls in a closure. We presently do this for parameterizations, such as:

param_f = lambda: param.image(...)

This is the most transparent option, but can be a bit tedious.

  2. Our API returns closures. This makes some simple use cases convenient, but can be a bit annoying. For example, when we did objectives this way, we needed to write things like:

obj = lambda: channel(..., 3)() + channel(..., 4)()

  3. Our API returns closures wrapped in a special type of object, like Objective right now. This allows the above to just be:

obj = channel(..., 3) + channel(..., 4)

In its most general sense, this would suggest a kind of DeferredTensor object. As we'll discuss shortly, this might have a number of interesting benefits.

My sense is that we should do either 1 or 3, and that 2 is a kind of unhappy intermediate version. Ideally, all of Lucid would be consistent in this choice.

The "DeferredTensor" approach (option 3)

There are a number of interesting benefits that arise from option 3 (DeferredTensor):


Resulting API Possibilities

If we went with the "closures are the user's responsibility" route (option 1), but got rid of T:

# Get layers by indexing model instead of accessing T:
obj = lambda: L2(model["mixed4a", ..., 37])

If we went the DeferredTensor route (and also got rid of T):

# Deferred tensor convenience functions
obj = model["mixed4a", ..., 37].L2

# No model arg to render_vis -- it can be inferred
render_vis(obj)

# Evaluate layers in a standalone way:
model["mixed4a"].isolated_eval(...)

colah commented 6 years ago

CC @ludwigschubert @znah -- just some thoughts, and not time-sensitive, but I'd love any feedback.

ludwigschubert commented 6 years ago

Sorry for delaying this discussion -- before forming an opinion on this proposal, I want to experiment more with the "open session / open optimization loop" style approach that @znah has been using a lot in his notebooks.