CC @ludwigschubert @znah -- Just some thoughts and not time sensitive, but I'd love any thoughts.
Sorry for delaying this discussion—I want to experiment more with an "open session/open optimization loop" style approach which @znah has been using in his notebooks a lot before forming an opinion on this proposal.
In this issue, we'll discuss some weaknesses of Lucid's present abstractions. We also present a couple of ideas for possible alternatives, but don't presently have a strong view on the right path forward.
(A lot of these thoughts were developed in conversation with @ludwigschubert.)
## Introduction
A lot of the weird stuff about Lucid comes from us having different needs than most TF users. A normal TF workflow looks something like "define a graph, then train it for a while". Our needs are often very different: create one graph for 30 seconds, then throw it away; create another similar graph, then throw it away too. Moreover, these graphs often have a composable structure, where we want to be able to talk about parts of the graph independent of a particular instantiation and reuse them over and over again.
At a very high level, Lucid's answer is to use closures. For example, when I declare an objective:
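For instance, with the channel objective (the layer name and channel index are just illustrative):

```python
from lucid.optvis import objectives

obj = objectives.channel("mixed4a", 42)
```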
I'm approximately creating the closure:
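In sketch form:

```python
import tensorflow as tf

def obj(T):
    # Mean activation of channel 42 of layer "mixed4a".
    return tf.reduce_mean(T("mixed4a")[..., 42])
```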
Where `T` is an accessor, kind of like `$` in jQuery: it allows you to conveniently access lots of things you might need, without writing a lot of code.

(This isn't quite true: the closure is actually further wrapped in a convenience `Objective` object, which allows objectives to be added and multiplied without explicitly escaping the closure. More on this in a minute.)

At a very high level, I believe that closures like this are the right abstraction for us. However, I also have a few concerns:
- `T` is a bit confused and may make things unnecessarily centralized.
- We may want an explicit `DeferredTensor` class for these kinds of closures.

## The role of `T`
Generally, our closures take an argument `T`, a special function which allows them to access things like neural network layers:
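For example (an illustrative closure; assume `import tensorflow as tf` as above):

```python
def obj(T):
    # T gives the closure access to named layers in whatever
    # graph the closure is eventually instantiated in.
    return tf.reduce_mean(T("mixed4b")[..., 7])
```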
The fact that `T` is getting passed in as an argument might make you believe it has a lot of special state. That's true to some extent -- it's kind of a grab bag of things -- but in its main usage it's equivalent to the graph lookup sketched below. Thank goodness we don't have to type that all the time, it's quite a mouthful! But it could just as easily be global.
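A sketch of that main usage, assuming a TF1-style graph and the default `"import"` scope name:

```python
def T(layer):
    # Look the layer up by name in the current default graph,
    # under the scope the model was imported into.
    return tf.get_default_graph().get_tensor_by_name("import/%s:0" % layer)
```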
A little bit more state comes from us wanting `T` to have special names for some nodes, to make them more user friendly. For the most part, this comes from the imported model. From that perspective, it might be more intuitive to do something like `model["mixed4a"]` instead of `T("mixed4a")`.

There are some things, like the pre-transformation input or the global step, that we probably want to get from somewhere else. That said, we could probably do something like `render.global_step()` if we wanted to get rid of `T`.

So, what are the pros/cons of the present `T` closure-arg setup?

Pros: It seems to me that the biggest one is actually preventing users from shooting themselves in the foot by trying to access graphs that haven't been imported or constructed.

Cons: Centralized and annoying to extend; passing around unnecessary variables; alternative setups might make error messages / debugging better.
## Closures / Wrapped closures
Before we talk about APIs for dealing with our closures, I'd like to clarify their role a bit: their job is to defer graph construction, so that the same description can be instantiated over and over in fresh graphs. That deferral is the essential part, and it is independent of the `T` issue.

(Observation: these deferring closures form a monad. Most of the interesting API options are ways of reifying Functor/Monad operations in Python.)
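To make the observation concrete, here is what those operations look like for our closures (hypothetical helpers, not Lucid API):

```python
def unit(x):
    # Inject a constant as a deferred value.
    return lambda T: x

def fmap(f, deferred):
    # Functor map: transform the eventual tensor.
    return lambda T: f(deferred(T))

def bind(deferred, f):
    # Monadic bind: f maps a tensor to another deferred
    # computation; the result stays deferred.
    return lambda T: f(deferred(T))(T)
```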
## Overview of Approaches
Broadly, there are three ways of handling the closures we create:

1. Make closures the user's responsibility: users write and compose them by hand (see the sketch after this list). This is the most transparent option, but can be a bit tedious.
2. Wrap closures in special-case convenience objects, like `Objective` right now. This allows us to have the above composition just be `obj1 + obj2` (also sketched below).
3. Wrap closures generically. In its most general sense, this would suggest a kind of `DeferredTensor` object. As we'll discuss shortly, this might have a number of interesting benefits.

My sense is that we should either do 1 or 3, and that 2 is a kind of unhappy intermediate version. Ideally, it would be nice for all of Lucid to be consistent in our choices here.
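To make options 1 and 2 concrete, a sketch (with `Objective` standing in, in simplified form, for Lucid's existing wrapper):

```python
import tensorflow as tf

# Option 1: raw closures; users compose them by hand.
obj1 = lambda T: tf.reduce_mean(T("mixed4a")[..., 42])
obj2 = lambda T: tf.reduce_mean(T("mixed4b")[..., 7])
combined = lambda T: obj1(T) + obj2(T)

# Option 2: a wrapper overloads operators, so composition
# happens without explicitly escaping the closure.
class Objective(object):
    def __init__(self, f):
        self.f = f
    def __call__(self, T):
        return self.f(T)
    def __add__(self, other):
        return Objective(lambda T: self(T) + other(T))

combined = Objective(obj1) + Objective(obj2)
```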
The "DeferredTensor" approach (option 3)
There are a number of interesting benefits that arise from option 3 (`DeferredTensor`):

- A lot of error checking could be done at creation of the `DeferredTensor` object. For example, if we moved from `T` to `model`, `model[layer]` could check if the appropriate layer exists in the model before generating a `DeferredTensor` object (see the sketch after this list).
- `DeferredTensor` could carry additional metadata.
- `DeferredTensor` could provide operator overloading / automatic coercion / etc. to make the closures more convenient to manipulate. This carries the risk of becoming less transparent / confusing / abstraction creep.
- We could even make `DeferredTensor`s coercible to TF `Tensor`s if we wanted by registering with TensorFlow. But I worry this would open the door to lots of really annoying subtle bugs, where someone accidentally coerces to a TF `Tensor` in the wrong context and then people get surprised with graph-mixing errors later on.
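A minimal sketch of what this could look like (hypothetical; the `Model` class and metadata layout are illustrative):

```python
class DeferredTensor(object):
    """A tensor-to-be: a closure over T, plus metadata and overloading."""

    def __init__(self, f, meta=None):
        self.f = f
        self.meta = meta or {}

    def __call__(self, T):
        # Instantiate the deferred computation in T's graph.
        return self.f(T)

    def __add__(self, other):
        other_f = other if callable(other) else (lambda T: other)
        return DeferredTensor(lambda T: self.f(T) + other_f(T))
    __radd__ = __add__

    def __mul__(self, other):
        other_f = other if callable(other) else (lambda T: other)
        return DeferredTensor(lambda T: self.f(T) * other_f(T))
    __rmul__ = __mul__


class Model(object):
    def __init__(self, layer_names):
        self.layer_names = set(layer_names)

    def __getitem__(self, name):
        # Error checking happens here, at DeferredTensor creation
        # time, rather than at graph instantiation time.
        if name not in self.layer_names:
            raise ValueError("No layer named %r in model" % name)
        return DeferredTensor(lambda T: T(name), meta={"layer": name})
```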
## Resulting API Possibilities
If we went with the closures-are-user-responsibility route, but got rid of `T`:
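One possibility (purely a sketch; `model[...]` is assumed to do the graph lookup that `T` does today, and deferral is written by hand):

```python
# Users write raw closures; layer access goes through the model.
def obj():
    return tf.reduce_mean(model["mixed4a"][..., 42])

def combined():
    # Composition means calling the closures explicitly.
    return obj() + 0.1 * tf.reduce_mean(model["mixed4b"][..., 7])
```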
If we went the `DeferredTensor` route (and also got rid of `T`):
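Again a sketch, reusing the hypothetical `DeferredTensor` from above, with `reduce_mean` as an assumed deferred wrapper around `tf.reduce_mean`:

```python
def reduce_mean(dt):
    # Lift tf.reduce_mean to operate on DeferredTensors.
    return DeferredTensor(lambda T: tf.reduce_mean(dt(T)))

# Composition never leaves the deferred world; the result can be
# instantiated over and over in fresh graphs.
obj = reduce_mean(model["mixed4a"]) + 0.1 * reduce_mean(model["mixed4b"])
```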