Closed fchollet closed 7 years ago
@fchollet How would you like to deal with the add_input method? All unused names in the namespace are considered inputs?
@elanmart that would be way too implicit. No, it's actually really simple: after we introduce an Input layer (which is necessary for automatic tensor shape inference), you don't need to mark certain layers as inputs via something like add_input.
Ability to re-initialize layer weights without re-compiling a model.
Don't forget this also means re-initializing the optimizer state (for Adam, RMSprop and other optimizers that keep moving averages).
Abstracting all references to Theano functionality to a Keras backend.
How would you train models with different backends though? You pick and choose one at a time right?
I would also suggest a better way of dealing with desired output shapes. Right now, Keras implicitly assumes the desired output is either a matrix (samples x dim) or a tensor3 (samples x time x dim). I had an image as the desired output and it took me days (partially my bad though) to realize what was wrong: my (samples x rows x cols) target was being reshaped to (samples*rows x cols) without warning. This is really dangerous since the method just compiled and ran like nothing was wrong.
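To make the failure mode concrete, here is a small pure-Python illustration. Nested lists stand in for tensors, and the helper names are made up for this sketch; they are not Keras functions. Collapsing the samples axis into the rows axis yields one loss per image row, while flattening per sample keeps one loss per sample:

```python
def collapse_leading(batch):
    """(samples, rows, cols) -> (samples * rows, cols): the silent reshape."""
    return [row for sample in batch for row in sample]


def flatten_per_sample(batch):
    """(samples, rows, cols) -> (samples, rows * cols): keeps the samples axis."""
    return [[v for row in sample for v in row] for sample in batch]


def mse_last_axis(y_true, y_pred):
    """Mean squared error over the last axis of two 2-D nested lists."""
    return [
        sum((t - p) ** 2 for t, p in zip(t_row, p_row)) / len(t_row)
        for t_row, p_row in zip(y_true, y_pred)
    ]


# Two 2x2 "images" as targets, all-zero predictions.
y_true = [[[1.0, 1.0], [3.0, 3.0]],
          [[0.0, 0.0], [2.0, 2.0]]]
y_pred = [[[0.0, 0.0], [0.0, 0.0]],
          [[0.0, 0.0], [0.0, 0.0]]]

collapsed = mse_last_axis(collapse_leading(y_true), collapse_leading(y_pred))
per_sample = mse_last_axis(flatten_per_sample(y_true), flatten_per_sample(y_pred))

print(len(collapsed), collapsed)    # one value per image row
print(len(per_sample), per_sample)  # one value per sample
```

Nothing fails in the collapsed case, which is exactly why the problem is hard to notice: the loss vector simply no longer lines up with the samples.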
@EderSantana right, I also think a simplified and more intuitive way of applying loss functions, masks and weights would be welcome.
Abstracting all references to Theano functionality to a Keras backend. This will allow us to implement non-Theano backends in the near future
This would be one of the most important and valuable features in the future.
Also, I think we should be able to implement any neural network easily using only the Keras backend API, without complicated Theano code (for instance, implementing GRU with the Keras API only, like #620).
Yes, the recurrent Container would be great to have. It could really speed up experimenting with different RNN architectures.
First, I'll apologize for not completing the Caffe PR yet. I'll try to do it over the weekend.
Maybe some of the things done in the Caffe PR could be used for shape inference and for inferring input and output nodes?
The very first (and bad) implementation of non-sequential models that I wrote did something like that: it fell back to the most recently added layer if the previous node was not explicitly mentioned.
Automatic tensor shape inference
I have started doing this in my fork because the current way of using convolutional layers is too difficult. I can do the same with all layers and make a PR.
@matsuyamax in your solution, can users still define the shapes themselves and stay compatible with the current API?
@EderSantana I don't think compatibility can be achieved if we make input dimensions optional, because they are not keyword arguments. The constructor of convolutional layers looks like this:
    def __init__(self, input_dim, nb_filter, filter_length, init='uniform', activation='linear', ...)
Currently we use it like this:
    layer = Convolution1D(input_dim, nb_filter, filter_length, init='uniform', activation='linear', ...)
We want to make input_dim optional, to use it like this:
    layer = Convolution1D(nb_filter, filter_length, init='uniform', activation='linear', ...)
That's not possible while keeping compatibility.
@matsuyamax that wouldn't be the smoothest solution, I think. Sometimes we just need a quick layer and want to skip the lazy initialization that inferred shapes would impose. What if input_dim is optional and the inferred value has the last word? For example, let's assume the Dense layer:
    class Dense(Layer):
        def __init__(self, output_dim, input_dim=None, *args, **kwargs):
            self.input_dim = input_dim
            self.output_dim = output_dim
            # self. ...
            ...
            self.initialize()
        def initialize(self):
            if self.input_dim is None:
                raise ValueError("Using lazy initialization, either define `input_dim` or set a `previous_layer`")
            else:
                self.W = self.init((self.input_dim, self.output_dim))
                self.b = shared_zeros((self.output_dim))
        def set_previous(self, layer):
            self.previous = layer
            if self.input_dim is None:
                self.input_dim = layer.output_dim
            elif self.input_dim != layer.output_dim:
                warn("Overwriting `input_dim` of layer {}. The user defined parameter differs from what we inferred".format(self.name))
This way, I would not be forced to define two layers (input and output) in the cases where I need only one of them. What do you think?
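A runnable variant of this idea might look like the following. It is a minimal sketch under stated assumptions, not the Keras implementation: the Layer base class is a stand-in, weights are plain nested lists filled with zeros instead of a real initializer, and initialization is simply deferred until the input dimension is known.

```python
import warnings


class Layer:
    """Hypothetical minimal base class (not the Keras one)."""

    def __init__(self):
        self.previous = None


class Dense(Layer):
    def __init__(self, output_dim, input_dim=None):
        super().__init__()
        self.output_dim = output_dim
        self.input_dim = input_dim
        self.W = None
        self.b = None
        # Eager path: build the weights right away when the shape is known.
        if input_dim is not None:
            self.initialize()

    def initialize(self):
        if self.input_dim is None:
            raise ValueError("Define `input_dim` or call `set_previous` first.")
        # Zeros are a placeholder for a real weight initializer.
        self.W = [[0.0] * self.output_dim for _ in range(self.input_dim)]
        self.b = [0.0] * self.output_dim

    def set_previous(self, layer):
        self.previous = layer
        if self.input_dim is not None and self.input_dim != layer.output_dim:
            warnings.warn("Overwriting `input_dim`: the user-defined value "
                          "differs from the inferred one.")
        # The inferred value has the last word.
        self.input_dim = layer.output_dim
        self.initialize()


first = Dense(4, input_dim=3)   # shape given explicitly
second = Dense(2)               # shape left to be inferred
second.set_previous(first)
print(second.input_dim, len(second.W), len(second.W[0]))
```

With this split, a standalone layer with an explicit input_dim is usable immediately, and a layer without one stays uninitialized until it is connected.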
I agree, but your solution breaks compatibility too. You see what I mean?
We could do your solution, but then we would have to implement input shape management in every layer's __init__. Maybe it would be simpler to do this:
    layer = Dense(output_dim)
    layer.set_input_shape(input_shape)
or
    layer = Dense(output_dim)
    layer.input_shape = input_shape
Then we would have a single method common to all layers.
The major problem with shape inference is that the layer objects are created first and then added to the model. So, at initialization, the object has no idea whether it will receive an input dimension or whether it should raise an error.
The way I did this was basically to split the current initialization method into two parts and call the second part (the one where parameters are initialized) after the layer has been added to the model.
Backwards compatibility is possible, the code just gets ugly. input_dim will be the first argument, with a default. We would then have to swap the arguments cleverly. If input_dim is None, the shape has to be inferred. If it's not None, it has been provided, and from the user's point of view it is the first argument. So you'd essentially have to rotate the first few arguments and input_dim to the right.
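The argument rotation described above could look roughly like this. It is only a sketch: resolve_conv1d_args is a hypothetical helper, not part of Keras, and it assumes the remaining options (init, activation, ...) are always passed as keywords.

```python
def resolve_conv1d_args(input_dim=None, nb_filter=None, filter_length=None):
    """Accept both the legacy call (input_dim, nb_filter, filter_length)
    and the shortened call (nb_filter, filter_length)."""
    if filter_length is None:
        # Only two values were given, so they are really
        # (nb_filter, filter_length): shift everything one slot to the right.
        input_dim, nb_filter, filter_length = None, input_dim, nb_filter
    if nb_filter is None or filter_length is None:
        raise TypeError("nb_filter and filter_length are required")
    return input_dim, nb_filter, filter_length


legacy = resolve_conv1d_args(32, 64, 3)   # old-style call
shortened = resolve_conv1d_args(64, 3)    # input_dim left to inference
print(legacy, shortened)
```

The ugliness mentioned above shows up immediately: the meaning of the first positional argument depends on how many arguments follow it.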
I also think a simplified and more intuitive way of applying loss functions, masks and weights would be welcome.
Loss functions could be reduced to layers.
TL;DR Maybe we should expose less OOP and make things even easier to use.
Loss functions could be reduced to layers.
The more layers we introduce, the longer it will take to write a model and start training it. One of the things I like the most about Keras is that you can actually memorize all the steps to write a model, compile and run the experiment. If we introduce too much boilerplate, like a layer for input, a layer for the cost, lazy only initialization, etc. we are going to be just another Blocks and we will always have to go back to copy paste from the documentation to do anything. I'm not saying Blocks is bad, I used to contribute to it as well. I'm just saying that we don't need another Blocks.
What do you think? I honestly believe that this simple API is a huge PLUS about Keras. That is why it is more popular than Blocks or Lasagne which were started almost at the same time. I'm not saying we should just appeal to the masses, I'm saying that simple is the ultimate sophistication and I believe Keras did it and we shouldn't go back. Maybe the solution is in the direction of less OOP not even more OOP. I also think that progress means making things even easier to use, like an optional lazy initialization or easier to develop RNNs.
For the case of losses, the reshaping and masking should go inside the objective. This would give the user more control when necessary, without making it too hidden. For example, the mse cost would just be
    def mse(y_true, y_pred):
        return T.sqr(y_true.flatten(ndim=2) - y_pred.flatten(ndim=2)).mean(axis=-1)
In other words, the reshaping goes inside the objective, and we never collapse data dimensions into the first (samples) dimension. For masking we could have an extra optional parameter
    def mse(y_true, y_pred, mask=None):
        if mask is None:
            return T.sqr(y_true.flatten(ndim=2) - y_pred.flatten(ndim=2)).mean(axis=-1)
        else:
            # flatten the mask the same way so the shapes line up
            mask = mask.flatten(ndim=2)
            cost = T.sqr(y_true.flatten(ndim=2) - y_pred.flatten(ndim=2)) * mask
            return cost.sum(axis=-1) / mask.sum(axis=-1)
where mask would have the same shape as the desired output. Sorry for the long comment.
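For readers without Theano at hand, the masked objective can be checked with a pure-Python stand-in. This is a sketch under assumptions: inputs are already flattened to 2-D nested lists (samples x features), and returning 0.0 for a fully masked sample is a choice made here, not something specified in the comment above.

```python
def mse(y_true, y_pred, mask=None):
    """Per-sample mean squared error; `mask` has the same shape as `y_true`."""
    losses = []
    for i, (t_row, p_row) in enumerate(zip(y_true, y_pred)):
        squared = [(t - p) ** 2 for t, p in zip(t_row, p_row)]
        if mask is None:
            losses.append(sum(squared) / len(squared))
        else:
            m_row = mask[i]
            kept = sum(m_row)
            masked_sum = sum(s * m for s, m in zip(squared, m_row))
            # Assumption: a fully masked sample contributes a loss of 0.0.
            losses.append(masked_sum / kept if kept else 0.0)
    return losses


y_true = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
y_pred = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
mask = [[1, 1, 0], [1, 0, 0]]

unmasked = mse(y_true, y_pred)
masked = mse(y_true, y_pred, mask=mask)
print(unmasked, masked)
```

Note that the masked version averages only over the unmasked entries of each sample, so masked positions do not dilute the loss.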
I think loss layers are a great idea that would add flexibility and make it easier to implement things like hierarchical softmax or NCE cost. I believe they would also simplify masking.
One of the things I like the most about Keras is that you can actually memorize all the steps to write a model, compile and run the experiment.
I agree. Being able to build networks by heart after using Keras a few times is proof that it does a good job at reducing cognitive load, and we want to keep that. Deep learning should be like playing with Duplo blocks and coloring with crayons.
We can back off from the idea of having Input layers, I agree they seemed quite a bit boilerplatey. But then what would we use?
I guess we could have an input argument input_shape, and it could be handled by the super __init__ of the base layer class so we don't have to reimplement logic in every layer, like @matsuyamax points out. I would be fine with that. Other ideas?
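One way the input_shape idea could be wired up is sketched below. The classes are simplified stand-ins rather than the actual Keras classes, and rejecting unknown keyword arguments is an assumption of this sketch.

```python
class Layer:
    """Stand-in base class: consumes `input_shape` once, for every subclass."""

    def __init__(self, **kwargs):
        shape = kwargs.pop("input_shape", None)
        self.input_shape = tuple(shape) if shape is not None else None
        if kwargs:
            # Assumption: unknown keyword arguments are treated as errors.
            raise TypeError("Unexpected keyword arguments: %s" % sorted(kwargs))


class Dense(Layer):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)   # shape handling lives in the base class
        self.output_dim = output_dim


first = Dense(64, input_shape=(784,))   # first layer declares its input shape
hidden = Dense(10)                      # later layers can leave it out
print(first.input_shape, hidden.input_shape)
```

Because the keyword is consumed in the base class, no individual layer needs its own shape-handling code, and positional arguments keep their current order.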
I think loss layers are a great idea that would add flexibility and make it easier to implement things like hierarchical softmax or NCE cost. I believe they would also simplify masking.
I don't think it's necessary to have object losses, because losses don't have a state, unlike optimizers and layers. Everything you could do with a loss layer, you should be able to do with a loss function.
Also, if it's a loss layer then it's not a layer. Layers only access the previous layer's input.
For example, the mse cost would just be
Sure, but what of masking and weighting?
Maybe the solution is in the direction of less OOP not even more OOP.
OOP is a big part of what made Keras successful and it is the most appropriate paradigm for deep learning, where modules are stateful (e.g. layers).
@fchollet
But then what would we use?
We just have to enforce an input_dim in the first layer. Then everything else follows smoothly, even when the other input_dim values are being inferred. See my discussion with @matsuyamax above.
Sure, but what of masking and weighting?
I showed an example of how to do masking above. This does not need to be repeated on every layer necessarily. We could write a generic decorator that does the masking. If I didn't get my syntax wrong, it should be something like:
    from functools import wraps

    def objective_decorator(func):
        @wraps(func)
        def wrapper(y_pred, y_true, mask, sample_weight):
            y_pred = y_pred.flatten(ndim=2)
            y_true = y_true.flatten(ndim=2)
            cost = func(y_pred, y_true)
            cost = cost[sample_weight]
            if mask is None:
                return cost.mean(axis=-1)
            else:
                mask = mask[sample_weight]
                return cost.sum(axis=-1) / (mask.sum(axis=-1) + 1e-7)
        return wrapper
With this decorator, we wouldn't even have to rewrite a lot in the objectives we already have. Usage would just be
    @objective_decorator
    def mse(y_pred, y_true):
        return T.sqr(y_pred - y_true)
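The decorator approach can be tried out without Theano using the pure-Python sketch below. It departs from the snippet above in a few labeled ways: the wrapped function is applied element by element to 2-D nested lists, the cost is multiplied by the mask before summing, and sample weights scale each sample's loss rather than index into the cost. All of these are assumptions made for illustration.

```python
from functools import wraps


def objective_decorator(func):
    """Turn an element-wise cost into a masked, weighted per-sample loss."""
    @wraps(func)
    def wrapper(y_pred, y_true, mask=None, sample_weight=None):
        losses = []
        for i, (p_row, t_row) in enumerate(zip(y_pred, y_true)):
            cost = [func(p, t) for p, t in zip(p_row, t_row)]
            if mask is None:
                loss = sum(cost) / len(cost)
            else:
                m_row = mask[i]
                # Small epsilon, as in the snippet above, avoids dividing by zero.
                loss = sum(c * m for c, m in zip(cost, m_row)) / (sum(m_row) + 1e-7)
            if sample_weight is not None:
                # Assumption: weights multiply the per-sample loss.
                loss *= sample_weight[i]
            losses.append(loss)
        return losses
    return wrapper


@objective_decorator
def mse(y_pred, y_true):
    return (y_pred - y_true) ** 2


plain = mse([[0.0, 0.0]], [[1.0, 3.0]])
masked = mse([[0.0, 0.0]], [[1.0, 3.0]], mask=[[1, 0]])
weighted = mse([[0.0, 0.0]], [[1.0, 3.0]], sample_weight=[2.0])
print(plain, masked, weighted)
```

The objective itself stays a one-liner, and the masking and weighting logic lives in exactly one place.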
OOP is a big part of what made Keras successful and it is the most appropriate paradigm for deep learning, where modules are stateful (e.g. layers).
I agree, we just don't have to force the user to always define classes to be able to use Keras. Sequential and Graph allow us to do that. We can create new cost functions and new inputs without having to define a whole new class for them.
Decorators are not super intuitive tho... Thoughts?
Adding output shape inference to all layers has given me the occasion to dive deeper into the repo. Here are some thoughts about the code in general. I think the code has potential, but several issues need addressing.
- [...] set_name method, not just Dense.
- core.RepeatVector should use broadcasting, currently it is copying input data and that is inefficient.
- [...] convolutional.UpSample2D.
- Reshape should take a tuple as argument.
- layers.core is too long. Should be broken down in a few files.
- [...] keras.layers.
- Why is there a ZeroPadding2D layer but not ZeroPadding1D?
- Small inconsistency in MaxPooling1D/2D: poolsize vs. pool_length. Should probably be pool_size.
- Convolution1D and Convolution2D should use a stride argument instead of subsample (which is not standard).
- [...] Convolution1D and Convolution2D: Convolution2D uses a hack for cuDNN but not Convolution1D.
We could write a generic decorator that does the masking. If I didn't get my syntax wrong, it should be something like
It would be a rather elegant solution. I'm all for it.
Small inconsistency in MaxPooling1D/2D: poolsize vs. pool_length. Should probably be pool_size.
I would be fine with such a change.
Reshape should take a tuple as argument.
Again, this is fine.
Why is there a ZeroPadding2D layer but not ZeroPadding1D
Feel free to add it.
A few comments:
Why is there a ZeroPadding2D layer but not ZeroPadding1D
Feel free to add it.
What about adding support for Convolution3D and ZeroPadding3D as well? I think I have this working in my fork.
In the same fork, I have added support to freeze layers. This now works for Graph as well and the trainable state can be serialized. Maybe this is something to include in the main Keras?
I would also suggest a better way of dealing with desired output shapes. Right now, Keras implicitly assumes the desired output is either a matrix (samples x dim) or a tensor3 (samples x time x dim). I had an image as the desired output and it took me days (partially my bad though) to realize what was wrong: my (samples x rows x cols) target was being reshaped to (samples*rows x cols) without warning. This is really dangerous since the method just compiled and ran like nothing was wrong.
I lost quite some time on that one too. Reshaping should definitely be inside the objective function.
Convolution1D and Convolution2D should use a stride argument instead of subsample (which is not standard).
Another nice addition would be an upsample mode (or "deconvolution"). This is particularly useful for image segmentation (see here and here).
Another nice addition would be an upsample mode (or "deconvolution").
We have UpSampling1D and 2D layers, I think this is what you are looking for.
In the same fork, I have added support to freeze layers. This now works for Graph as well and the trainable state can be serialized. Maybe this is something to include in the main Keras?
Possibly. You can submit a PR and I will review it.
What about adding support for Convolution3D and ZeroPadding3D as well? I think I have this working in my fork.
Certainly. We already have a PR with Convolution3D undergoing review, you might want to review it to make sure you're on the same page.
Ability to re-initialize layer weights without re-compiling a model.
I would love to see that. I would like to use Keras for "online" predictions while training is done "offline". Right now I use the save / load model procedure as described in the FAQ, but it takes rather long (approx. 1 min) to load everything before a prediction on new data can be made. Maybe there is already a better way to do this?
@holderm : you can already do so. Let's say you have a compiled model in production and you would like to update it (to sync it with an identical model that you've just finished training somewhere else). You can just dump the weights of your newly trained model to HDF5, and load them in your production model without recompiling, just via save_weights and load_weights. This takes a few hundred milliseconds for a large model.
Yes, that's true but assume training and prediction take place in different sessions. First one needs to compile the (previously trained) model and then predictions can be made. Once the model is compiled in the new session, updating the weights (load_weights) works fine, but it would be nice if there were a way to skip the compiling (similar to the discussion here: https://groups.google.com/forum/#!searchin/keras-users/load/keras-users/lKjA7qF3ctY/Mzc16Le4CwAJ )
Another nice addition would be an upsample mode (or "deconvolution").
We have UpSampling1D and 2D layers, I think this is what you are looking for.
I was more thinking of a straight "reverse" of the convolution in a single layer like the one in Caffe or MatConvNet. This is more efficient than upsampling + convolution.
Yes, that's true but assume training and prediction take place in different sessions.
As long as you have a compiled model with the same architecture living in each session, you will be fine. But if you don't, the model re-initialization feature we will add is not going to help you anyway.
Here is what I think we should go for the Graph API:
- [...] inputs and input arguments, instead, a single inputs argument that could be a string, a dictionary (multi-io layers), or a list of strings.
- [...] .add(name, layer, inputs, **kwargs) method. When no inputs argument is provided, the layer is assumed to be an input layer. If is_output is set to True, an output node is created.
- [...] merge_mode keyword argument way of controlling input merging when the input is a list of layer names.
Also untrainable outputs would be nice. Such as returning the output of intermediate layers in the predict() function but without having to actually define a 0 loss function for them and slow down the training procedure.
Here is what I think we should go for the Graph API:
I am not so sure anymore about this. This will require more design work. We'll postpone it for now.
If I recall, there were some issues about polymorphism, right? With having input being a variety of data types.
I think having input support a variety of input modes (name, list of names, dict {output_name: input_name}) would be a much better UX than having separate keyword args for all these.
The thing I am less sure of is the removal of add_input and add_output.
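A polymorphic inputs argument like the one discussed here is usually normalized once at the API boundary. The sketch below shows one possible normalization; normalize_inputs and its return format are hypothetical and not part of the Graph API, and treating None as "this is an input layer" follows the proposal quoted earlier in the thread.

```python
def normalize_inputs(inputs=None):
    """Map a name, a list of names, or a dict {output_name: input_name}
    to a list of (output_name, input_name) pairs."""
    if inputs is None:
        return []                                  # no inputs: an input layer
    if isinstance(inputs, str):
        return [(None, inputs)]                    # single upstream layer
    if isinstance(inputs, dict):
        return sorted(inputs.items())              # multi-io wiring
    if isinstance(inputs, (list, tuple)):
        if not all(isinstance(name, str) for name in inputs):
            raise TypeError("a list of inputs must contain only layer names")
        return [(None, name) for name in inputs]   # several layers to merge
    raise TypeError("inputs must be None, a str, a list of str, or a dict")


single = normalize_inputs("conv1")
merged = normalize_inputs(["conv1", "conv2"])
multi_io = normalize_inputs({"out_a": "dense1"})
print(single, merged, multi_io)
```

After this step the rest of the container only ever sees one representation, which keeps the polymorphism confined to a single function.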
@fchollet any new thoughts on the Graph API? Maybe the input argument shouldn't have a default value, and an input would be created only if None is passed?
Also, are there any plans to implement a base class for multi-io layers?
I assume that if they were to be implemented in Keras, they would have an outputs property (being a list of Layers) just as Graph does?
Here are a few things I think would be valuable to add to Keras in the near future.
These are mere suggestions. You are welcome to discuss them, contest them, add your own ideas... I would like the development of Keras to be increasingly driven by the community. I don't want to be a bottleneck of development.
I won't be writing code myself, but I will happily give feedback and advice to any contributor who wants to tackle some of these features. I will also keep reviewing and merging PRs.
Here's a list of features. Their focus is on simplicity and user experience.
- Input layers as part of the above, to define expected input data dimensions.
- [...] add_input and add_output methods, which are cumbersome. A single add method should suffice.