bartvm opened 9 years ago
That's really easy to do with a model, but you want it to be as optional as possible.
Mm, the thing is that although I see the usefulness of `Model`, for things like this you shouldn't need to pass a `Model` instance, because all of this works perfectly fine by just walking the computation graph (using e.g. `ComputationGraph`) without making assumptions (like only having a single cost).
(I'm also not a fan of how `Model` is really just a `ComputationGraph` with a few assumptions built in/some extra features, while the uses of the two are entirely different, but I haven't thought of a better alternative.)
But how do you want to get this computation graph? `algorithm.cost`?
Well, that's why I said I didn't have a better solution :) because I don't want to rely on `algorithm` having a particular interface either. But my thinking is generally more supportive of not passing a `Model` class, but somehow passing an annotated Theano graph instead, e.g. something like `cg=ComputationGraph(cost)` instead of `model=Model(cost)`, and moving `get_parameters` from `Model` to something like `ComputationGraph.get_parameters(brick_naming=True)`.
One of the reasons is that I've had several people be really confused about the `Model` class, asking me whether they needed to subclass it, etc. When I tell them it really just provides a way to rename parameters and to tell extensions what the cost is, and that you could just as well pass `None` in many cases, they're even more confused. All in all, I think the `Model` abstraction is basically just `ComputationGraph` + the assumption that there are bricks and a single objective, so I wonder if it's not possible to just use `ComputationGraph`, an `OBJECTIVE` variable role, and a `get_parameters(brick_naming=True)` method. If that is the case, it's one very confusing abstraction less, and simplification is always a good thing. Then again, as I said, I haven't thought this through too much (assignment due in a few hours!).
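As a toy illustration of the `OBJECTIVE`-role idea, role-based lookup could work roughly as below. These classes are simplified stand-ins for Blocks' roles machinery, not the real `VariableFilter` API:

```python
# Simplified stand-ins, not the real Blocks API: each variable carries a
# set of role tags, and a filter selects variables by role.
OBJECTIVE = "OBJECTIVE"
PARAMETER = "PARAMETER"

class Variable:
    def __init__(self, name, roles=()):
        self.name = name
        self.roles = set(roles)

class VariableFilter:
    """Select variables that carry at least one of the requested roles."""
    def __init__(self, roles):
        self.roles = set(roles)

    def __call__(self, variables):
        return [v for v in variables if v.roles & self.roles]

graph_variables = [
    Variable("W", roles=[PARAMETER]),
    Variable("b", roles=[PARAMETER]),
    Variable("cost", roles=[OBJECTIVE]),
]
# An extension looking for "the cost" just filters on the role,
# without any Model object telling it what the cost is.
objectives = VariableFilter(roles=[OBJECTIVE])(graph_variables)
```

With this scheme, the single-cost assumption lives in the extension (`objective, = objectives`) rather than in a central abstraction.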
Okay, it is not nice that our understandings of what `Model` is differ. Let me try to explain once more why I came to the thought that we need such a thing.
When I say "model" (lower-cased), I mean everything that the user created in order to solve his task: a complicated structure made from annotated Theano variables, application calls, applications, bricks. I reserve a title-cased "Model" for the concrete component I propose to add.
Extensions need access to the model (note, lower-cased). The sanity check we are discussing here is, in fact, an extension, and this extension needs at least a list of parameters. A better implementation will also need access to bricks, because the report that a parameter is not initialized is much more helpful when a path from a top brick to this parameter is also provided.
Another example: consider an extension that prints interactively in a browser window the tree of your bricks and parameters and uses a color scheme to show average magnitudes of the parameters. It needs access to bricks and parameters as well.
One can even imagine an extension whose `on_error` callback identifies the gradient with respect to which of the annotated variables became NaN first. Here you need access to the computation graph with application calls and the bricks.
To sum up: Blocks, with its rich introspection techniques and modularity, can visualize deep learning and make it transparent. For this, extensions need convenient access to everything the user defined, and the question is who should provide it.
to be continued...
I guess both of us, and our new users as well, thought: "why not replace the model argument with the `cg` argument and give extensions access to the computation graph?" That is, `MainLoop(cg=ComputationGraph(cost), algorithm, log, ...)`. I have several reasons why I do not like it.
1. "Do one thing and do it well". Currently `ComputationGraph` knows nothing about bricks, and that makes it much simpler. It manages a computation graph with some abstract annotations. I would like to keep it this way. This is a widespread design: add functionality in a few steps. Consider the way Ubuntu manages packages: first comes `dpkg` and then comes `apt`. For the first, dependencies are just annotations; it can only install a downloaded package. The second takes care of calling `dpkg` the right way, handles dependencies automatically and works with a package repository. Similarly, the computation graph only stores annotations; something on top of it should actually interpret them.
2. No chance to interfere. `ComputationGraph` supports a `get_bricks` method; it must always return all the bricks it extracted by graph traversal. It would be very weird to give the `ComputationGraph` an optional argument `bricks` that would override this logic. The same thing holds for application calls and parameters, although for the latter we have roles to deal with that.
One could have all this functionality distributed across a set of functions like `get_bricks`, `get_application_calls`, etc. Apart from the fact that I do not find it a beautiful solution, issue (2) from the list above is more than applicable.
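For concreteness, the rejected "set of functions" design might have looked roughly like this. The types and helper names below are hypothetical toys for illustration, not actual Blocks code:

```python
# Toy graph with mixed annotations; get_bricks / get_application_calls are
# hypothetical free functions, sketching the design being argued against.
class Brick:
    def __init__(self, name):
        self.name = name

class ApplicationCall:
    def __init__(self, name):
        self.name = name

class ToyGraph:
    def __init__(self, annotations):
        # Annotations collected during a (pretend) graph traversal.
        self.annotations = annotations

def get_bricks(graph):
    """Free function: pick out the brick annotations."""
    return [a for a in graph.annotations if isinstance(a, Brick)]

def get_application_calls(graph):
    """Free function: pick out the application-call annotations."""
    return [a for a in graph.annotations if isinstance(a, ApplicationCall)]

g = ToyGraph([Brick("linear"), ApplicationCall("linear.apply"), Brick("softmax")])
```

The objection in the text is that the results of such free functions cannot be overridden by the user, since they always re-derive everything from the traversal.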
to be continued ...
For the reasons outlined above I think we need another entity which will answer all the questions the extensions might have about the model. I find it natural to call it `Model`.
In addition, a user who wants to play with a saved model typically has similar needs to the extensions. So the second purpose of the `Model` is to be unpickled and explored.
In the 99%-frequent case, `Model` is "computation graph + selector + set of keyword arguments to override whatever the first two return". Perhaps I was wrong when I tried to factor out a generic `AbstractModel` interface: it is too early for that, and it is not clear if Blocks will ever be used to host a non-Theano algorithm. But apart from that, I think `Model` is a really nice thing to have.
So I would propose a few things:
1. Make it clear that `Model` is very different from the one from PyLearn2, and that with 99.99% probability one does not have to inherit from it.
2. Make `Model` mandatory, to make things simple and to make nice sanity-check extensions like the one described in this issue possible. Then people will not have to think about it too much, but simply write `model=Model(cost)` every time. This also enables the initialization combo `for brick in model.bricks: brick.initialize()` for all the users.

When I say "model" (lower-cased), I mean everything that the user created in order to solve his task: a complicated structure made from annotated Theano variables, application calls, applications, bricks.
Nearly the exact same description applies to the computation graph: a collection of Theano variables, application calls, applications and bricks (all of which are annotations read by the computation graph). The only difference is that the computation graph doesn't distinguish between the kinds of annotations, while `Model` does.
"Do one thing and do it well"
The original quote is: "Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new features." This is exactly what the `Model` class isn't doing; it's adding new features to the `ComputationGraph` (which it inherits from), but instead of specializing (i.e. subclassing) it mixes it in with a bunch of new features.
ComputationGraph knows nothing about bricks, and it makes it much simpler
Agree. However, this doesn't stop us from introducing an extended, brick-aware `ComputationGraph` subclass instead of a new abstraction.
No chance to interfere [...]
I don't understand this paragraph. I don't understand why `ComputationGraph` would need a `bricks` argument, nor why `Model` would. Aren't both `Model` and `ComputationGraph` constructed by walking a graph defined by Theano output variables?
Why not Set of Functions
Completely agree.
In the 99% frequent-case Model is "computation graph + selector + set of keyword arguments to override whatever the first two return". [...] it is not clear if Blocks will ever be used to host a non-Theano algorithm.
Blocks is a Theano toolkit. Trying to abstract Theano away, pretending that you could wrap anything else, is beyond the scope of the framework IMHO and is actually one of the things that made Pylearn2 so much more complex.
If we accept that the model is an annotated Theano graph (which is how you described it in the first place), it follows that introducing the `Model` abstraction is awkward. Most of the logic is already in `ComputationGraph`. As I said, we are really just looking for a bricks-aware computation graph, which is different from introducing a new abstraction altogether.
making Model mandatory to make things simple and to make nice sanity check extensions
I have no real opposition to making a `Model`-like object compulsory. But, more than ever, that means that it should introduce no restrictions, e.g. anything that works for a `ComputationGraph` must work for a `Model`--or `BrickComputationGraph`, which is how I would then like to think of it.
Let's ignore the term `Model` for now, which I find confusing because it suggests a level of abstraction that I would like to avoid, and I think it clutters the discussion. What I think we should introduce is a `BrickComputationGraph` subclass which implements an awareness of `Brick`, `Application` and `ApplicationCall` (which is effectively what `Model` has done, but while pretending not to be a `ComputationGraph`).
In practice, it changes very little. It has almost exactly the same functionality, but without introducing a new abstraction, and is more in the spirit of the "do one thing well" philosophy (instead of inheriting from both `ComputationGraph` and `AbstractModel`, mixing two feature sets). If a user doesn't use extensions that require `Brick`-awareness, they can just use a `ComputationGraph`. This means that `ComputationGraph` and `BrickComputationGraph` should be interchangeable in many cases--which, in fact, `Model` and `ComputationGraph` are right now as well; both return you parameters, both could return you the objective variables, etc. The incentive for using `BrickComputationGraph` is simply that some extensions might need the brick-awareness to do their job, e.g. like `Dump` does right now to get parameters named using the brick hierarchy.
I don't think the resulting workflow would be a bad one. You've been suggesting things like `[brick.initialize() for brick in model.get_top_bricks()]`, which I think are reasonable ideas. However, I find it awkward if this kind of logic is mixed in with `ComputationGraph` logic. You'd be dealing with two completely different abstractions one line after the other, while the distinction often seems arbitrary to users. Do I use `model.get_parameters()` or `cg.params`? Do I use `VariableFilter(roles=[OBJECTIVE])(cg.shared_variables)` (probably needs a shortcut like `cg.objectives`) or do I use `model.get_objectives()`? Making one a specialization of the other removes this arbitrariness.
No chance to interfere [...]
I don't understand this paragraph.
What I mean is that I want to be able to override the default results of bricks and parameter extraction. A real example: suppose I am doing caption generation experiments with a pretrained convnet as a feature extractor. At training time I use the convnet features simply as input, and for this reason there is no reference to the convolutional bricks in the computation graph. But at test time I will need them to work with new images. It would be sad to find at this point, that these bricks were not saved. Also these bricks would not be accessible to extensions during training.
So I would like to do something like this:
model = Model(cost)
model.bricks += convolutional_bricks
and have all my extensions use these new bricks I pushed into the model.
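The override being argued for here amounts to nothing more than a mutable attribute seeded from graph traversal. A minimal sketch with toy names (not the real implementation):

```python
class ToyModel:
    """Brick list defaults to what graph traversal found, but stays mutable."""
    def __init__(self, traversed_bricks):
        # Start from what walking the computation graph discovered ...
        self.bricks = list(traversed_bricks)

# Traversal only found the caption-generation bricks ...
model = ToyModel(["mlp", "lstm"])
# ... so push in the convolutional bricks the training graph never references,
# making them visible to extensions and to serialization.
model.bricks += ["convnet"]
```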
I also tried to argue that such a feature would look awkward in a hypothetical `BrickComputationGraph`. In addition, this name is too long and puts all the emphasis on the computation graph, whereas for me the forest of bricks is an equally important component.
I understand your motivation to keep things simple and to keep the number of abstractions small. So here goes my proposal: let's get rid of the `AbstractModel` but keep the `Model`, rebranding it like this:
class Model(object):
"""Combines a computation graph and optionally a brick hierarchy.
bla bla bla
"""
pass
If you have a name in mind that would be less dangerous than model and less verbose than `BricksComputationGraph`, I am open to suggestions.
I'm not a fan of the name `BrickComputationGraph` either; I just coined it to highlight how I think `Model` should be interpreted: an optional specialization of the computation graph that extends functionality through an awareness of bricks. I am fine with the name `Model`, but defined as a full child class of `ComputationGraph` as I proposed, whose added features are optional.
The example you give seems to just be the ability to provide extensions with information beyond what is normally provided by the main loop. However, I don't see why this would benefit from a `Model` abstraction that allows us to override attributes to the point where all semantics are lost. The `bricks` attribute in your example now represents the bricks of the model, plus a random collection of bricks that only make sense to extensions that know specifically how to deal with them. The rest of the extensions will just be utterly confused, because they could (quite rightly) assume that the bricks given to them by the "model" are part of the model being optimized.
Your convnet is pre-trained, so most likely it's already saved elsewhere. If not, why not save it separately? Or save it as part of an extension? Or just load it manually?
y = tensor.lvector('image_classes')
image_input = tensor.tensor4('images')
# Construct model
convnet = ...
y_hat = convnet.apply(image_input)
cost = CategoricalCrossEntropy(y, y_hat)
# Train convnet model here, can checkpoint/dump here
cg = ComputationGraph(cost)
features, = VariableFilter(roles=[OUTPUT], bricks=[convnet.layers[4]])(cg.variables)
# Create features for caption generation here and save them to disk
# Save model for easy access if you need to, includes bricks through annotations
pickle.dump([image_input, features], open('convnet.pkl', 'wb'))
Then, training your caption generation:
feature_input = tensor.matrix('features')
cost = my_model.apply(feature_input)
# If you want an extension to have access to the pre-trained convnet directly
# why not just load it and pass it directly?
pretrained_convnet = pickle.load(open('convnet.pkl', 'rb'))
# Or if you want to load it from the main loop instead
pretrained_convnet = pickle.load(open('convnet_checkpoint.pkl', 'rb')).model  # (.brick_computation_graph)
extensions = [CustomExtension(pretrained_convnet)]
# Now CustomExtension can do whatever it wants with the pretrained convnet
# Finetuning
image_input, features = pickle.load(open('convnet.pkl', 'rb'))
# Alternatively, use the checkpointed "model"
# (which is a computation graph with extra options such as top_bricks)
pretrained_convnet = pickle.load(open('convnet_checkpoint.pkl', 'rb')).model
image_input, = pretrained_convnet.inputs
features = VariableFilter(roles=[OUTPUT], bricks=[pretrained_convnet.top_bricks[0].layers[4]])(pretrained_convnet.variables)
# Load caption generation model and define new final cost to finetune
To summarize, here is my current understanding of the things that you are proposing a model abstraction would solve. My position: `Model` should be an extended `ComputationGraph` that adds this functionality, and the `Model` abstraction becomes convoluted if it entails "the model as defined by the annotated computation graph plus random stuff that only one or two extensions know what to do with".

The example you give seems to just be the ability to provide extensions with information beyond what is normally provided by the main loop.
The main loop provides no information to the extensions at all. The question of this discussion is how to give extensions such access. And my argument is that I want full control of this access. I do not want to hardcode the assumption that extensions should work only with the parameters and bricks extracted by Theano graph traversal. This comes at a very low cost, but makes me calm about the future.
Like I said, I think that is a valid use case, but there are millions of things an extension could need access to. The question then is, should there be a central object that all of these extensions use? Or should the extensions that need some particular value just take this object as an argument?
My two objections to your proposal of doing it through a shared object are (a) the impossibility to provide a robust interface to an object that can provide anything to any extension; and (b) the fact that this functionality shouldn't be mashed in with an object that extends the features of the computation graph by interpreting the brick hierarchy.
I'm not convinced that in all cases where extensions need a particular piece of information, they can't simply be provided with this piece of information through their arguments. This is what we do all the time after all: the algorithm receives the costs and parameters as arguments, the monitoring extensions get passed explicitly what variables and data stream they should use, and if you have an extension that uses an entire convnet, more than ever should it be explicitly passed as an argument instead of being made part of some global all-knowing object that needs to know how to cater to the needs of every extension out there.
I understand your concern that we cannot build a single interface that would cater to all the extensions one can imagine. When an extension needs something peculiar, the recommended way to provide this something to it is via its own argument. Or a person can have his own Blocks fork where all extensions need this peculiar entity, and a `PeculiarModel`, a subclass of `Model`, provides this entity to all the extensions.
But neither do I think that basic things such as the bricks or the application calls or the cost variable, if present, should have to be given explicitly every time. There should be reasonable defaults.
So my answer to (a) is: let's select a number of key requests an extension might make and support only those. For instance: the cost (optionally, only if there is a single minimized Theano cost), parameters, bricks. We can move the current naming logic from `Model` to `blocks/dump.py`, as there are no other consumers so far.
My answer to (b): technically you could have both a `BricksComputationGraph`, which interprets annotations of a `ComputationGraph` as a brick hierarchy, and a `Model`, which would do almost nothing, only serving as a place for substitutions, but I guess that is overkill. All that we have in Blocks is the annotated Theano graph + bricks, which I think justifies the name `Model` for the component that encapsulates both these things.
Regarding (a), I completely agree that there should be some default things that can be requested, but I am arguing that a `BrickComputationGraph`-type object could provide all of these. It could use bricks to provide unique names for parameters, and returning the cost is just a matter of returning variables tagged with `OBJECTIVE`. Both of these are well within the mandate of `BrickComputationGraph`. Extensions that expect there to be one cost would just do `objective, = brick_computation_graph.objectives` and would break down in the scenario where there are multiple costs, without hardcoding the requirement for one cost in the `BrickComputationGraph`.
For (b), it is still unclear to me what a `Model` should provide that a `BrickComputationGraph` should not in your distinction. I don't think there is a need for the single-cost assumption (as I said above), and I am wondering what else remains that a `BrickComputationGraph` couldn't provide. (I am unclear about what you mean by "place for substitutions".)
(a) I agree that it could provide these things. The only thing I miss is the ability to tell it to ignore its findings and use the values I provide instead of them. This is exactly what I call the place for substitutions, which also answers your (b).
Another example of why such substitutions might be necessary: there is no single "true" way of selecting parameters. `cg.parameters` and `cg.shared_variables` seem to cover all cases, but which one should we use as the default? And there is more than one extension that would like to know what the parameters are: `Dump` and `DefaultMonitoring` at least. I think they should request them from the model. That's why we need a keyword `parameters` argument for the model that tells it what the actual parameters are. Such an argument is slightly out of the scope of `BrickComputationGraph` semantics.
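A minimal sketch of such a `parameters` keyword override. The classes below are toys (the real `Model` wraps Theano graphs), but they show the "graph default unless the user overrides" behaviour being asked for:

```python
class ToyComputationGraph:
    """Stands in for ComputationGraph; parameters come from graph traversal."""
    def __init__(self, parameters):
        self.parameters = parameters

class ToyModel:
    """Answers get_parameters() from the graph unless the user overrides it."""
    def __init__(self, cg, parameters=None):
        self.cg = cg
        self._parameters = parameters  # keyword override, None means "use graph"

    def get_parameters(self):
        if self._parameters is not None:
            return self._parameters
        return self.cg.parameters

cg = ToyComputationGraph(parameters=["W", "b", "gamma"])
default = ToyModel(cg).get_parameters()                    # all traversed parameters
frozen = ToyModel(cg, parameters=["W"]).get_parameters()   # user's choice wins
```

Extensions like `Dump` or the monitoring would then always ask the model, and the user decides once, at construction time, what "the parameters" means.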
Okay, that's a use case I understand. It still seems like overkill, though, to start an entirely new class that lies outside of the `ComputationGraph` hierarchy, especially because I would like `ComputationGraph` and `Model` to be interchangeable in many cases, and I don't want to duplicate features. To me, it seems that what we really want is a mutable `BrickComputationGraph`, i.e. one where the attributes can't just be read, but also set. That is, a hierarchy which would go like

ComputationGraph > BrickComputationGraph > MutableBrickComputationGraph (= Model)

So there is `ComputationGraph.get_parameters()`, there is e.g. `ComputationGraph.get_parameters(unique_names=False)`, and there could be `Model.get_parameters(unique_names)` which also comes with `Model.set_parameters()`.
As an extension that doesn't care, I will just call `model.get_parameters()`, and even if the user passed a `ComputationGraph` instance, everything works just fine. It's only when extensions start using the advanced arguments and methods provided by the subclasses that an error would occur, e.g. `Dump` would crash when passed a `ComputationGraph`, but would be happy with a `BrickComputationGraph`. If the user wants to override the parameters, objectives, etc., he/she would need to use `Model`.
Extensions could easily check whether the model given to them is a `BrickComputationGraph` (and not a `ComputationGraph`), and raise an error saying something like "This extension relies on the advanced features provided by brick annotations. It cannot be used on raw Theano graphs." Meanwhile, most extensions will happily work on a `ComputationGraph`, as long as they can call `get_parameters`.
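Such a check could look as follows (toy classes standing in for the real hierarchy):

```python
class ToyComputationGraph:
    """Raw graph: knows its parameters, nothing about bricks."""
    def get_parameters(self):
        return ["W", "b"]

class ToyBrickComputationGraph(ToyComputationGraph):
    """Brick-aware subclass: additionally knows the brick hierarchy."""
    def get_bricks(self):
        return ["linear", "softmax"]

def run_brick_aware_extension(model):
    """An extension that refuses raw Theano graphs."""
    if not isinstance(model, ToyBrickComputationGraph):
        raise TypeError("This extension relies on the advanced features "
                        "provided by brick annotations. It cannot be used "
                        "on raw Theano graphs.")
    return model.get_bricks()
```

An extension that only needs `get_parameters` would skip the `isinstance` check entirely and work on either class.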
The major advantage in my eyes is that we (a) share a maximum amount of logic between `Model` and `ComputationGraph`, because the former is really just a mutable, slightly cleverer version of the latter; and (b) get rid of another abstraction.
It still seems like overkill though to start an entirely new class that lies outside of the ComputationGraph hierarchy
That makes a certain sense, and `Model` is already a subclass of `ComputationGraph`. But below I have an argument against it.
there is e.g. ComputationGraph.get_parameters(unique_names=False)
I believe you meant `BrickComputationGraph`?
As an extension that doesn't care, I will just call model.get_parameters()
Do you want to add the `get_parameters` method to `ComputationGraph`? That would conflict with the current `parameters` property, let alone the fact that its interface is almost property-only.
(b) we get rid of another abstraction.
I do not see where we get rid of an abstraction, because `MutableComputationGraph` aka `Model` is already rather abstract.
Your plan is quite good, but there are a few points which I would like to criticize:
1. You wish so desperately to pass a pure `ComputationGraph` to the main loop that you are ready to push a higher-level method `get_parameters` down the hierarchy. And then you will have to explain to the user "in these cases use the CG, in this case the `Model`". My counter-proposal is to fix the `Model` interface without glancing back at the lower-level bricks and without making the lower-level bricks support `Model`'s higher-order methods. And to make `Model` mandatory, which will free us from writing lengthy explanations on when to use what. Anyway, `Model` works just fine when there are no bricks in the computation graph at all.
2. If we force `Model` to be a descendant of `CG`, we might face questions like "should I use `model.parameters` or `model.get_parameters()`", and those are also very bad questions to answer.
In fact, the is-a relationship between `Model` and `CG` does not fully hold as soon as `set_parameters` enters the game. I would rather go for the following scheme: a `BrickComputationGraph` and a lightweight `Model` which aggregates a `BrickComputationGraph` object:
class Model(object):
    """Interface for main loop extensions to get cost, parameters, bricks.

    Attributes
    ----------
    cg : BrickComputationGraph

    """
    def __init__(self, outputs):
        self.cg = BrickComputationGraph(outputs)

    def get_parameters(self):
        if not hasattr(self, '_params'):
            return self.cg.parameters
        return self._params

    def set_parameters(self, params):
        self._params = params

    ...
The extensions would be expected to ask `model` when possible, and `model.cg` for advanced things that `Model` does not cater to.
Do you want to add the get_parameters method to the ComputationGraph? That would conflict with the current parameters property, let alone the fact that its interface is almost property only.
Not add; it would simply replace the `parameters` property. It's only 4 extra characters.
I do not see where we get rid of an abstraction, because MutableComputationGraph aka Model is already rather abstract.
We get rid of an abstraction in the sense that `Model` will be, both in code and in principle, a mutable `ComputationGraph` that is brick-aware. This means that in 90% of the cases `Model` and `ComputationGraph` are completely interchangeable. This is different from having an entirely separate `Model` class which is used differently (e.g. `model.cg.parameters` vs. `model.get_parameters()`).
if we force Model to be a descendant of CG, we might face questions "should I use model.parameters or model.get_parameters()".
As mentioned, I would just replace `model.parameters` with `get_parameters`. I prefer having `Model` as a full subclass of `ComputationGraph` instead of creating a new class; at the cost of turning a property into a method, this is worthwhile for me.
you wish so desperately to pass a pure ComputationGraph to the main loop that you are ready to push a higher-level method get_parameters down the hierarchy.
`get_parameters` isn't much higher-level than `parameters` for a `ComputationGraph` object; the only difference is the `@property` decorator and 4 extra characters!
In fact the is relationship between Model and CG does not fully hold as soon set_parameters enters the game.
I don't fully understand this. `Model` is simply a subclass of `ComputationGraph` that implements a `set_parameters` method. They still share an interface, and hence are interchangeable in 90% of the cases, which is the important part. This means that a large part of the framework still works without needing a `Model` class and hence without needing `BrickComputationGraph`. If you want to make `Model` compulsory, this is a massive plus. It makes the code a lot more portable simply by not indirectly forcing people to have their computation graph interpreted as if it had bricks, even when it doesn't.
So, my proposal would be more along the lines of:
class ComputationGraph(object):
    def get_parameters(self):
        return VariableFilter(roles=[PARAMETER])(self.shared_variables)

class BrickComputationGraph(ComputationGraph):
    def get_parameters(self, as_dict=False):
        parameters = super(BrickComputationGraph, self).get_parameters()
        if as_dict:
            return OrderedDict(...)
        return parameters

class Model(BrickComputationGraph):
    def get_parameters(self, **kwargs):
        if hasattr(self, '_parameters'):
            return self._parameters
        return super(Model, self).get_parameters(**kwargs)

    def set_parameters(self, parameters):
        self._parameters = parameters
This means that, if we want to, we can remove the need for users to interact with `ComputationGraph` entirely. They could just use `Model` in all those situations; no need to juggle `ComputationGraph` and `Model` throughout. There is also only one interface. No need to wonder when to use `model.cg.parameters` and when to use `model.get_parameters()`.
From a code perspective, a user that wants to use Blocks' main loop but doesn't want to use bricks can simply use `ComputationGraph` and will basically have no imports or any links to `blocks.bricks` to deal with. I know that your proposed `Model` would work on CGs without bricks, but code-wise the dependency is definitely there!
Okay, @bartvm, I understand what you propose, and I more or less like it, but for one main point: I do not agree that the relationship between `Model` and `BrickComputationGraph` can be described by the word "is", that is, that the former can be a descendant of the latter. You propose to achieve it by forcing the `ComputationGraph` to support the `Model` interface (whose existence you do not acknowledge, but which is `get_objective`, `get_parameters`, `get_bricks`), forcing it to have a method `get_parameters` which will look awkward when the rest are properties.
For a user that wants to use Blocks but doesn't want to use bricks, the `Model` can just as well have a `ComputationGraph` as its attribute. The `get_bricks` method can look as follows:
class Model(object):
def get_bricks(self):
if hasattr(self, '_bricks'):
return self._bricks
if isinstance(self.cg, BrickComputationGraph):
return self.cg.top_bricks
return [] # or raise ValueError
There is something more to say here: the `Model` we discuss now is primitive, because there is very little in common between all the various models that Blocks may host. However, when I am doing a concrete experiment, I would like to have more functionality in my `Model` subclass. For instance, now I am working on speech recognition, and I know what my top bricks are and how to use them. It is convenient for me to have a class `PhonemeRecognizer` that also knows what the top variables are and can create computation graphs with them. This way this logic is not scattered across the 10 scripts that I use for model analysis.
class PhonemeRecognizer(object):
def __init__(self, top_bricks):
# create all input variables
def build_cost_graph(self):
...
def build_generate_graph(self):
...
I would like to inherit this class from `Model`, which makes sense I think. This way all extensions can use the advanced methods my extended model provides; for instance, they can ask it to sample phoneme transcriptions for a few given utterances from the validation set. But if we go your way, I can't do that, because I would need to specify the cost at creation time, which is not always possible:
recognizer = PhonemeRecognizer(top_bricks)
basic_cost = recognizer.build_cost_graph()
regularized_cost = apply_dropout(basic_cost, ...)
recognizer.set_objective(regularized_cost) # the point at which computation graph is known
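The deferred-objective pattern being asked for can be sketched like this. The classes are toys, and `set_objective` is the hypothetical hook under discussion, not an existing Blocks method:

```python
class ToyModel:
    """Model whose objective can be attached after construction."""
    def __init__(self):
        self._objective = None

    def set_objective(self, objective):
        self._objective = objective

    def get_objective(self):
        if self._objective is None:
            raise ValueError("objective has not been set yet")
        return self._objective

class ToyRecognizer(ToyModel):
    """Knows its top bricks and how to build graphs from them."""
    def build_cost_graph(self):
        return "basic_cost"

recognizer = ToyRecognizer()
cost = recognizer.build_cost_graph()
# Regularization happens after construction, so the final graph is only
# known now -- hence set_objective instead of passing cost to __init__.
recognizer.set_objective(cost + "+dropout")
```

The point is that a `Model` requiring the cost in its constructor cannot serve as a base class here, since the regularized graph does not exist yet when the recognizer is created.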
I am not saying that things like this should be a part of Blocks, a core library. But when you make `Model` be a `ComputationGraph` (how it is now), inheriting from this class becomes problematic.
Just an update: I found it quite convenient to put the functionality that I wanted to put in the `Model` descendants in the top brick. This can have input variables, compile beam search, and do whatever one wants. This means that I care much less now if you @bartvm don't want to have a new `Model` hierarchy. My extensions will simply do `top_brick, = self.main_loop.get_top_bricks()`.
Following the discussion at #337, it would be a good idea to add another check to see whether weights were initialized (besides #244). Namely, in the main loop we can simply walk the graph, retrieve the bricks, and check whether all of them were initialized. If that's not the case, we can raise a warning (just as is currently done when e.g. parameters don't match between the model and the algorithm).