tensorflow / lucid

A collection of infrastructure and tools for research in neural network interpretability.

Rethinking Abstractions: T, endpoints, "deferred tensors", ... #25

Open colah opened 6 years ago

colah commented 6 years ago

In this issue, we'll discuss some weaknesses of Lucid's present abstractions. We'll also present a couple of ideas for possible alternatives, but we don't presently have a strong view on the right path forward.

(A lot of these thoughts were developed in conversation with @ludwigschubert.)


Introduction

A lot of the weird stuff about Lucid comes from us having different needs than most TF users. A normal TF workflow looks something like "define a graph, then train it for a while". Our needs are often very different: create one graph for 30 seconds, then throw it away. Create another similar graph, then throw it away too. Moreover, these graphs often have a composable structure, where we want to be able to talk about parts of the graph independent of a particular instantiation and reuse them over and over again.

At a very high level, Lucid's answer is to use closures. For example, when I declare an objective:

obj = objectives.channel("mixed4a", 37)

I'm approximately creating the closure:

obj = lambda T: tf.reduce_sum(T("mixed4a")[..., 37])

Here, T is an accessor, kind of like $ in jQuery: it allows you to conveniently access lots of things you might need without writing a lot of code.

(This isn't quite true: the closure is actually further wrapped in a convenience Objective object, which allows objectives to be added and multiplied without explicitly escaping the closure. More on this in a minute.)
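For intuition, here is a minimal sketch of how such a wrapper can support + and * without ever escaping the closure; the details are illustrative and differ from Lucid's actual Objective class:

class Objective(object):
  """Wraps a closure of the form lambda T: scalar_tensor."""

  def __init__(self, objective_func):
    self.objective_func = objective_func

  def __call__(self, T):
    return self.objective_func(T)

  def __add__(self, other):
    # Add two objectives while staying inside the closure.
    return Objective(lambda T: self(T) + other(T))

  def __mul__(self, factor):
    # Scale an objective by a constant.
    return Objective(lambda T: factor * self(T))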

At a very high level, I believe that closures like this are the right abstraction for us. However, I also have a few concerns:


The role of T

Generally, our closures take an argument T, a special function which allows them to access things like neural network layers:

obj = lambda T:   ...   T("mixed4a")  ...

The fact that T is getting passed in as an argument might make you believe it has a lot of special state. That's true to some extent -- it's kind of a grab bag of things -- but in its main usage it is equivalent to something like:

def T(name):
  # get_tensor_by_name needs an output slot, e.g. "import/mixed4a:0"
  return tf.get_default_graph().get_tensor_by_name("import/" + name + ":0")

Thank goodness we don't have to type that all the time -- it's quite a mouthful! But it could just as easily be global.

A little bit more state comes from us wanting T to have special names for some nodes, to make them more user-friendly. For the most part, this comes from the imported model. From that perspective, it might be more intuitive to do something like model["mixed4a"] instead of T("mixed4a").
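As a rough sketch of that direction (the class and its details are hypothetical, not an existing Lucid API), model indexing could simply look the layer up in the imported graph:

import tensorflow as tf

class Model(object):
  """Hypothetical wrapper that indexes layers of an imported graph by friendly name."""

  def __init__(self, import_scope="import"):
    self.import_scope = import_scope

  def __getitem__(self, name):
    # Look the layer up in the current default graph by its imported name.
    return tf.get_default_graph().get_tensor_by_name(
        "%s/%s:0" % (self.import_scope, name))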

There are some things, like the pre-transformation input or the global step, that we probably want to get from somewhere else. That said, we could probably do something like render.global_step() if we wanted to get rid of T.

So, what are the pros and cons of the present T closure-argument setup?

Pros: It seems to me that the biggest one is actually preventing users from shooting themselves in the foot by trying to access graphs that haven't been imported or constructed.

Cons: Centralized and annoying to extend; passing around unnecessary variables; alternative setups might make error messages / debugging better.


Closures / Wrapped closures

Before we talk about APIs for dealing with our closures, I'd like to clarify their role a bit:

(Observation: These deferring closures form a monad. Most of the interesting API options are ways of reifying Functor/Monad operations in Python.)
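To make that observation concrete, here is a small illustrative sketch (the helper names are assumptions, not Lucid functions): closures of the form lambda T: ... are essentially the Reader monad over the accessor T, and the Functor/Monad operations can be written directly:

import tensorflow as tf

def fmap(f, deferred):
  # Functor map: apply f to the eventual value, staying inside the closure.
  return lambda T: f(deferred(T))

def bind(deferred, k):
  # Monadic bind: k builds a new deferred computation from the value;
  # both share the same accessor T (this is just the Reader monad).
  return lambda T: k(deferred(T))(T)

# For example, the channel objective from above, built compositionally:
layer = lambda T: T("mixed4a")
channel_37 = fmap(lambda acts: acts[..., 37], layer)
obj = fmap(tf.reduce_sum, channel_37)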

Overview of Approaches

Broadly, there are three ways of handling the closures we create:

  1. Our API creates actual tensor objects, and we rely on our users to wrap API calls in a closure. We presently do this for parameterizations, such as:

param_f = lambda: param.image(...)

This is the most transparent option, but can be a bit tedious.

  2. Our API returns closures. This makes some simple use cases convenient, but can be a bit annoying. For example, when we did objectives this way, we needed to write things like:

obj = lambda: channel(..., 3)() + channel(..., 4)()

  3. Our API returns closures wrapped in a special type of object, like Objective right now. This allows the above to just be:

obj = channel(..., 3) + channel(..., 4)

In its most general sense, this would suggest a kind of DeferredTensor object. As we'll discuss shortly, this might have a number of interesting benefits.

My sense is that we should do either 1 or 3, and that 2 is a kind of unhappy intermediate version. Ideally, all of Lucid would be consistent in this choice.

The "DeferredTensor" approach (option 3)

There are a number of interesting benefits that arise from option 3 (DeferredTensor):


Resulting API Possibilities

If we went with the "closures are the user's responsibility" route (option 1), but got rid of T:

# Get layers by indexing model instead of accessing T:
obj = lambda: L2(model["mixed4a", ..., 37])

If we went the DeferredTensor route (and also got rid of T):

# Deferred tensor convenience functions
obj = model["mixed4a", ..., 37].L2

# No model arg to render_vis -- it can be inferred
render_vis(obj)

# Evaluate layers in a standalone way:
model["mixed4a"].isolated_eval(...)

colah commented 6 years ago

CC @ludwigschubert @znah -- just some thoughts, and not time-sensitive, but I'd love any feedback.

ludwigschubert commented 6 years ago

Sorry for delaying this discussion -- before forming an opinion on this proposal, I want to experiment more with the "open session / open optimization loop" style approach that @znah has been using a lot in his notebooks.