Exciting! Do you need any help with the porting?
New, bleeding-edge functionality should preferably go to Keras contrib.
It would be nice to have rough criteria, and perhaps a few examples, of what should go to contrib and what should go to Keras proper.
Will this PR freeze affect docstring improvements?
Also, with the release of Keras 2, would it be a good idea to greatly reduce the number of tickets and implement a system/process that prevents or redirects general debugging questions to Gitter, the Slack channel, or StackOverflow? From what I've seen, most of the issues on this repo are implementation clarifications, general deep learning questions, and debugging help.
As for the Keras spec, when it is released will there be a list of TODOs the community can contribute to? I'm very excited!
Any chance that masking can get first-class support in the Keras 2.0 spec? Building complex NLP models with Keras is difficult and bug-prone, because masking is not supported very well. We've had to write a lot of custom layers and override default Keras layers in order to get them to handle masking correctly.
@matt-gardner That would be amazing. It's such a pain to build e.g. hierarchical models with masking right now.
@matt-gardner Did you make any pull requests to fix layers you've had issues with (and subsequently fixed)?
Yes, we've submitted some:
https://github.com/fchollet/keras/pull/3218 https://github.com/fchollet/keras/pull/4253 https://github.com/fchollet/keras/pull/4258
But getting no response after trying to submit improvements is pretty demoralizing and discourages future PRs, so we started just overriding Keras layers in our own code (e.g., here, a really trivial fix to Highway that makes it work with masking, which wasn't included because masking is an afterthought in the current Keras API).
Good news!
If there will be two GitHub repositories, how would you sync pull requests between tf.keras and this repository? Will there be someone applying changes in one repository to the other?
If there will be two GitHub repositories, how would you sync pull requests between tf.keras and this repository? Will there be someone applying changes in one repository to the other?
The codebases will be different, so there will be no need to replicate pull requests. For API changes, you would send a PR to the API spec itself, and changes to the API spec would be replicated across all codebases.
@matt-gardner it looks like your issue is simply that some Keras layers do not support masking yet. Is that right?
@fchollet it's just a plea to please take masking very seriously when thinking about the Keras 2.0 spec. It's crucial for complex NLP, and some pretty basic building blocks of NLP models in Keras don't support masking correctly (e.g., the Embedding layer and the TimeDistributed layer, as pointed out in the PRs I've already linked to). Additionally, almost none of the backend operations deal with masks. This is fine in some cases, but if you want to compute a softmax with a mask, for instance, you have to write your own code. This makes doing attention over padded word sequences hard, and probably most implementations of attention in Keras are wrong because of this - if you apply the mask after a softmax, as done in this re-implementation of a popular paper, it's wrong, because your distribution wasn't normalized correctly, and it's not obvious that it's wrong from looking at the code.
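(A quick numeric illustration of the normalization problem, with numpy standing in for the backend; the numbers are made up:)

```python
import numpy as np

logits = np.array([2.0, 1.0, -3.0])  # last position is padding
mask = np.array([1.0, 1.0, 0.0])

# Wrong: softmax first, then mask -- the surviving weights sum to ~0.995, not 1.0.
p = np.exp(logits) / np.exp(logits).sum()
print((p * mask).sum())

# Right: mask inside the normalization, giving a proper distribution
# over the real tokens.
e = np.exp(logits) * mask
print(e / e.sum())
```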
There's also very little documentation about masking. It's in the background and easy to forget about. But you can't forget about it when doing NLP, or you're doing it wrong. It really needs to be treated as a fundamental component of any static computation graph applied to NLP tasks. The difficulty here is why people choose DyNet over Keras for NLP. There's a whole lot to like about Keras - it'd be nice if it were also really good for NLP.
https://github.com/farizrahman4u/keras-contrib is the Keras Contrib repository described above
Will you be creating a Keras organization on GitHub?
Make graph visualization in TensorBoard great again, please! This is a feature request I honestly don't know how to solve myself; I find that Keras makes the graph tab in TensorBoard hard to read.
Exciting! This is really big news. I hope I can make contributions to Keras 2! BTW, here are some possible adjustments for Keras 2.
Looking forward to the age of Keras 2!
I have personally felt that Keras leans more towards image stuff rather than NLP. I can't pinpoint why exactly I "feel" so; limited support for masking is definitely one of the factors.
@matt-gardner - I have most of the layers implemented with masking on a personal branch of keras and some custom layers like a LambdaMask layer and a couple variants of the TimeDistributed (which I call "Distribute" because I felt it fit the narrative better).
but I never had the time to write the proper tests to get things back into the main branch. And because previous projects were already done in Theano, I'm just now diving into TensorFlow (thanks, Fold), so I couldn't contribute anything towards that backend.
@fchollet some things I've been wanting to push in:
Also, it would be nice if there was a way that you could attach meta information about masks into the graph to have it check for consistency. For example, I know I want certain shaped masks coming out and into certain layers. Right now, I have to go to a jupyter notebook, pull out all the shape tensors through keras' graph, and then evaluate on some data against expected sizes.
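For what it's worth, a minimal sketch of the kind of consistency check I mean, written against the Keras backend API (check_shape is a hypothetical helper; checking masks would work the same way once you've pulled the mask tensors out of the graph):

```python
from keras import backend as K

def check_shape(tensor, expected):
    # Hypothetical helper: compare a tensor's static shape against an
    # expected tuple, treating None as a wildcard on either side.
    actual = K.int_shape(tensor)
    if len(actual) != len(expected):
        raise ValueError('rank mismatch: %s vs %s' % (actual, expected))
    for a, e in zip(actual, expected):
        if a is not None and e is not None and a != e:
            raise ValueError('shape mismatch: %s vs %s' % (actual, expected))

# e.g. check_shape(layer.output, (None, num_words, encoding_dim))
```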
If you are looking into rewriting or fleshing out NLP data manipulation tools (vocabulary management classes, making word embedding matrices from a dataset and a word2vec/GloVe tensor, a data server mostly tuned to Keras's fit_generator function), I have a suite of things I'm starting to push out of projects and into more full-blown sharing.
Also, thanks for all of your (and everyone else's) awesome work =). I am really excited about this next iteration.
I would also like to draw attention to the fact that building custom RNN architectures is next to impossible in Keras without nasty hacks. Please have a look at this discussion for details. Because of this, people are forced to create repositories like RecurrentShop. It would be nice to have some official attention on making life easier for RNN researchers.
In addition to the comments of @ParthaEth, the same is true for reinforcement learning problems (one example is keras-rl), loading images via TensorFlow tensors (#5356), and semantic segmentation; these all seem to be second-class.
I don't expect Keras to handle every possible design and problem, but I think it is important to point out areas of weakness before LTS API scoping decisions are settled so the appropriate choices can be made explicitly.
Thanks for the feedback; a lot of the points raised are in fact planned for Keras 2. Some are not, like support for timestep-wise control of RNNs. That can be added later.
With regard to masking, do you have a specific, point-by-point list of feature requests and use cases?
Regarding validation_split and sample shuffling: the separation of training data and validation data should be conducted after the dataset gets shuffled.
No; in many cases this would lead to user errors, e.g. any case where the data is generated by a sequential process. I've seen many cases where this feature of Keras saved a user from validating on past data when they should have been using only future data.
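(If a user does want a shuffled split, the opt-in is a one-liner before calling fit; a sketch assuming numpy arrays x, y and a compiled model:)

```python
import numpy as np

# Shuffle explicitly, then let validation_split take the last 20%.
idx = np.random.permutation(len(x))
model.fit(x[idx], y[idx], validation_split=0.2)
```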
Passing callbacks should allow you to pass a custom progress bar. The training step currently adds one silently. It would be best if you could construct the callbacks object outside of the training function.
Nothing prevents you from setting verbose to 0 and passing your own progbar as a callback.
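For instance, a minimal sketch of that pattern (MyProgbar is a hypothetical callback; model, x_train, y_train are assumed to exist):

```python
from keras.callbacks import Callback

class MyProgbar(Callback):
    # Hypothetical stand-in for the built-in progress bar.
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print('epoch %d: loss=%.4f' % (epoch + 1, logs.get('loss', float('nan'))))

model.fit(x_train, y_train, verbose=0, callbacks=[MyProgbar()])
```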
Thanks for the responses @fchollet! Super appreciative of the work you're putting in!
Nothing prevents you from setting verbose to 0 and passing your own progbar as a callback.
That's true. Isn't there a trade-off in this design decision, though? I'm not aware of any documentation stating that setting verbose to 0 has that singular effect (and if it does, then the name verbose doesn't seem precise enough, because it only does one thing), and tracing through the code as a sanity check doesn't sound all that fun. It would seem you could add flexibility without loss of ease of use by allowing a CallbackList to be passed into the fit functions and adding a None check here.
(As a side note and not a complete suggestion, it's easy to imagine Metrics and Objectives in this way, too. Allow for an object to be passed in if desired, with default behavior if not. Messy logic for the respective bits could be folded in and hidden a bit more behind an easier-to-read API.)
Thanks again for your work!
I might be missing something, but in all of the docs for functions that accept verbose, the following description is present:
verbose: 0 for no logging to stdout, 1 for progress bar logging, 2 for one log line per epoch.
Also, both fit and fit_generator already take a parameter called callbacks:
callbacks: list of keras.callbacks.Callback instances. List of callbacks to apply during training. See callbacks.
verbose: 0 for no logging to stdout, 1 for progress bar logging, 2 for one log line per epoch.
Whoops. Missed that part.
Also, both fit and fit_generator already take a parameter called callbacks:
It adds 2 callbacks without your control. Check my link.
That's true. What do you propose fit/fit generator return, then, if there's no guarantee of a History object?
Will Keras2 support PyTorch as backend, in the future?
Will Keras2 support PyTorch as backend, in the future?
No, there are no plans to support PyTorch. There is nothing to be gained in supporting every novelty framework that crops up every quarter. Our goal is to make deep learning accessible and useful to as many people as possible, and that goal is completely opposite to building up deep learning hipster cred.
re: @patyork
That's true. What do you propose fit/fit generator return, then, if there's no guarantee of a History object?
If it were to go this way, I'd imagine something like:
```python
def fit_whichever_version(foo, bar, baz, callbacks=None, ...):
    # stuff
    if callbacks is None:
        # standard history + whichever verbose flag
        callbacks = cbks.CallbackManager.default(verbose=verbose)
    elif not isinstance(callbacks, cbks.CallbackManager):
        # handle list of strings, list of Callback instances, etc.
        callbacks = cbks.CallbackManager.from_kwargs(callbacks, verbose=verbose)
    else:
        assert isinstance(callbacks, cbks.CallbackManager)  # verbose is redundant in this case
    # more stuff
    return callbacks
```
So, if history were in callbacks, it'd be returned. Otherwise, the thing returned is exactly what the user expects: the thing they passed in, with observations of training info.
But in the end, it doesn't matter all that much. Anyone who cares (I like it) will just hack it in there, and people who don't will never notice.
I don't understand the impetus for this at all, and I don't like the fact that it instantly changes the behaviour for anyone using custom callbacks (e.g. model.fit(....., verbose=1, callbacks=[ModelCheckpointer()]) now returns nothing, and has no progress bar). If the CPU cycles it takes for the BaseLogger and curating the History object are too much overhead for you, you probably shouldn't be using Keras with all of its other overhead anyway.
I don't like the fact that it instantly changes the behaviour for anyone using custom callbacks (e.g. model.fit(....., verbose=1, callbacks=[ModelCheckpointer()])
That was never really implied. There's nothing against having the CallbackManager, by default, enact a history and logger. In fact, I explicitly passed in the verbose in my example.
If the CPU cycles it takes for the BaseLogger and curating the History object is too much overhead for you, you probably shouldn't be using Keras with all of its other overhead anyways.
No need to get snide. It has nothing to do with CPU cycles.
Nothing snide about it - just confused as to the impetus of removing the BaseLogger and History callbacks as defaults/always available. The only thing I could think of was a possible performance hit.
Sorry --- it came off a bit harsh. Probably a tone of voice thing.
I've written some progress bars that use tqdm, for example. I've never overridden the History callback. My example was a possible solution to "find a way to let the user be able to completely specify their callback env, but also make it transparently the same for anyone who doesn't care".
Yeah, fair enough - several people have written better progress bars, esp for use with Jupyter notebooks.
My point here is that there isn't really a backwards-compatible way to do the above - if we get rid of the 2 default callbacks (3 if you consider the verbose flag), anyone using a custom callbacks list will have their code change behavior (because callbacks will not be None, but it will also not include BaseLogger or a History callback). If we don't get rid of the two defaults, well, we've really made no change.
Kind of a moot point. We'll see where it stands once v2 is at a code-complete state, and we start thinking about backwards compatibility.
My point here is that there isn't really a backwards compatible way to do the above - if we get rid of the 2 default callbacks (3 if you consider the verbose flag), anyone using a custom callbacks list will have their code change behavior (because callbacks will not be None, but it will also not include BaseLogger or a History callback). If we don't get rid of the two defaults, well, we've really made no change.
(bolded for my emphasis.) I've said this already, but that was never really implied. There's nothing against having the CallbackManager, by default, enact a history and logger. The only time anything will break is if people were expecting history to be returned.
Kind of a moot point.
Agreed; that's why I backed off 3 posts ago.
@fchollet here is a list of the masking requests I can think of right now. I might add more later:
1. The Embedding layer should work for higher-order inputs. Imagine a sentence represented as characters, for instance, where you want to embed each of the characters and then run a character-level encoder over each of the words. Your input would be (batch_size, num_words, num_characters_per_word). Embedding doesn't currently work correctly with this input. There are lots of similar situations where you have higher-order word or character input, and none of them work correctly without modifying Embedding.
2. TimeDistributed needs to pass the mask through to the layer that it wraps. Additionally, there should be several subclasses available for handling compute_mask in different ways. For example, imagine the sentence representation from above. If I want to TimeDistribute a CNN encoder, applying it to each of the words, the mask I want to compute is basically K.any on the mask for each timestep, so that the output mask tells me which whole words were masked. If I then want to take those word representations and pass them through a Highway layer, I need to TimeDistribute the Highway layer over the number of words, because the tensor is (batch_size, num_words, encoding_dim). In this case, I want TimeDistributed to just pass through the mask. In still other cases, I might want to pass the computation of compute_mask to the wrapped layer, and join them afterwards. It's possible that you could capture all three of these use cases with just the last one, but it would probably take some complex logic to do so, in addition to modifying the behavior of compute_mask in wrapped layers (e.g., LSTM doesn't currently return a mask at all in the return_sequences=False case, and it would need to return either a 0 or a 1 for this to work).
3. An equivalent to K.softmax that takes a mask as input is needed (see the sketch after this list). Any time you want to compute a softmax over something that's padded, you need this. The most obvious use case is attention over word sequences, but there are others, too. You could solve this by adding another backend function, or just by adding a Softmax layer that handles masking (which will in the end also need another backend function, or just its own code that uses backend functions).
4. The Lambda layer should support masking, as @braingineer said above.
5. Backend functions need to handle masks, e.g. K.batch_dot. If you want to implement bidirectional attention flow, you need to compute a similarity matrix that then gets passed through a couple of different softmaxes. As I already said above, the softmax needs to treat a mask correctly, so the operation that you did to compute the similarity matrix needs to propagate a correct mask (or you have to create one huge function, which prohibits re-using the similarity matrix in several downstream layers). So, we need a K.batch_dot that propagates a mask. Similar to what I said for K.softmax, you could either do this with another backend function, or just add a BatchedDot layer that handles the mask correctly. In general, it seems useful to have layers associated with most backend functions that do the correct masking computation (this may not be necessary for all of them, especially if the Lambda layer supports masking and passes through the mask by default).
6. Some high-level documentation about masking would be really nice: how masks get passed through a Model, and in what situations you might want to use a mask.
7. All layers should document their masking behavior (expected input shape and output shape, etc.), just like they document their input/output behavior.
8. It'd be nice if you could consistently call K.int_shape() on masks. This is not the case in the theano backend.

We have solutions for a lot of these problems in our codebase that we can contribute, though it's all based on Keras 1.*, and I'm not sure how much will change in Keras 2. Either way, I'm happy to help contribute to fixing these issues. I would really like to see Keras succeed in being great for NLP.
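Here's a minimal sketch of the mask-aware softmax requested in the third item above, written against the Keras 1 backend API (masked_softmax is my name for it, not an existing Keras function; mask is 1 for real positions and 0 for padding):

```python
from keras import backend as K

def masked_softmax(x, mask):
    # Hypothetical helper: apply the mask *inside* the normalization,
    # so the result is a proper distribution over unmasked positions.
    mask = K.cast(mask, K.floatx())
    x = x - K.max(x, axis=-1, keepdims=True)  # numerical stability
    e = K.exp(x) * mask                       # zero out padded positions
    return e / (K.sum(e, axis=-1, keepdims=True) + K.epsilon())
```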
I am mainly looking forward to fit_distributed(), which could automatically use multiple GPUs, as promised by @fchollet months ago : )
An equivalent to K.softmax that takes a mask as input is needed
Feel free to add a Softmax layer to core. Also add a warning when people are using an Activation layer set to softmax with a mask being passed.
The Embedding layer should work for higher-order inputs.
That's planned.
TimeDistributed needs to pass the mask through to the layer that it wraps.
That's planned.
The Lambda layer should support masking
That's planned.
It'd be nice if you could consistently call K.int_shape() on masks. This is not the case in the theano backend.
This will never be the case with Theano. Having full offline shape inference capability in Theano would be a pretty big amount of work, and out of scope for Keras (since it should actually be a Theano feature).
Backend functions need to handle masks.
Masking is Keras-level, not backend-level. That won't change.
Some high-level documentation about masking would be really nice
Sure. Maybe a FAQ entry. If you have any initial proposal, feel free to submit a PR.
All layers should document their masking behavior (expected input shape and output shape, etc.), just like they document their input/output behavior.
Easy to add. After the documentation is updated for Keras 2, feel free to submit a PR.
fit_distributed()
Won't do. Distributed training will be handled via tf.Experiment, which you will be able to instantiate from a Keras model.
@fchollet Thank you for your feedback and comments. Since lots of people in Keras are discussing multi-GPU utilization, could you please say a few more words about tf.Experiment?
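(For readers wondering what that might look like in practice, here is a rough sketch, assuming the tf.contrib.learn.Experiment API of the time together with the later tf.keras.estimator.model_to_estimator converter; this is not an official recipe:)

```python
import tensorflow as tf

# `model` is a compiled Keras model; train_input_fn and eval_input_fn
# are standard TF input functions (all assumed to exist).
estimator = tf.keras.estimator.model_to_estimator(keras_model=model)

experiment = tf.contrib.learn.Experiment(
    estimator=estimator,
    train_input_fn=train_input_fn,
    eval_input_fn=eval_input_fn,
)
experiment.train_and_evaluate()
```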
My only refactor for Keras 2 would be around the dim_ordering stuff. Some ideas:

- dim_ordering can affect speed greatly - this is especially true for TF users who've read the above link and think that setting their ordering to 'th' will boost their speed on NVIDIA GPUs with cuDNN.

Edited to add bullet 3 and add some more thoughts. Apologies if it's rambling - hopefully you get the gist.
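(For reference, the Keras 1 switch under discussion; it can also be set via the image_dim_ordering key in ~/.keras/keras.json:)

```python
from keras import backend as K

K.set_image_dim_ordering('tf')    # (batch, height, width, channels)
# K.set_image_dim_ordering('th')  # (batch, channels, height, width)
```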
Hi @fchollet, I've just written a prototype for Keras using a Deeplearning4j backend. After completing this experiment, I've learned a lot about the design of Keras and pluggability of the framework.
Since a rewrite is already on the table, I am wondering if there are plans to make the backend more modular? In other words, do you have plans for a backend to handle more of the actual execution and give more granular control?
For example, Deeplearning4j runs in the JVM and bridges with the ND4J binary. In some cases, it is more advantageous and performant for DL4J to directly handle most of what happens in a fit() or evaluate() operation. This is partly to avoid creating dual references in Python and the JVM (using py4j to bridge the two environments).
The idea is that Keras is a more advanced "string argument" generator that creates a standard for model config and instructing the backend on what to execute. The DL4J experiment has already done this at a core level, and I believe there are some performance gains to be made.
FYI if you want to check out the experiment: https://github.com/crockpotveggies/dl4j-examples/tree/keras-examples/dl4j-keras-examples
Thanks to @pkoperek for the ideas to organize and hijack the Keras methods to simply bridge it to the JVM.
Hi there. Couple of questions:
@fchollet is there any guide describing the API changes, namely "what's changed", "what will be deprecated", "what will stay unchanged", and so on? If not, is there any plan to create one? I would be happy to help/contribute to the documentation about this - IMHO it would really help the transition.
Is the keras-contrib repo mentioned and referenced in this conversation (by @farizrahman4u) the official one? Is there any plan to integrate it as a Keras branch/module once Keras 2.0 is released? I'm asking this also because I probably spotted a couple of cases in which the Keras 1.x API has been used...
Cheers
Is the keras-contrib repo mentioned and referenced in this conversation (by @farizrahman4u) the official one?
Yes, it will be moved to Keras organization in the future. If any of the code breaks when Keras 2 is launched, it will be fixed by the maintainers. Else, each of the source files will be converted to the latest API passively.
@farizrahman4u Is Keras 2 ready now?
I have another general plea. If Keras 2 becomes part of TF, can we please have a replication of the TF layers as Keras 2 ones?
For instance, it was months before any attention was given to #4457 (Deconvolution3D/Conv3DTranspose), despite it being part of the layers supported by TF for a while (and used by anyone doing 3D networks). Or somehow feature-replicate any of the layers that exist in 1D and 2D form (which, looking at the documentation, is effectively the only such layer lacking).
Concretely, if I want to use the TensorFlow backend, is it better to import keras ("Keras is no longer a library, but rather a spec"), import tensorflow.contrib.keras (marked as deprecated, per https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/keras, but still in the docs at https://www.tensorflow.org/api_docs/python/tf/contrib/keras), or import tensorflow.keras (not documented)? Confused 😕
Unless you have a specific reason to use tf.keras (e.g. a workflow that is mostly TF and where you do not wish to involve 3rd-party libraries), then use Keras. Else, use tf.keras (available as of TF 1.4).
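In code, the two options look like this:

```python
# Option 1: multi-backend Keras (TensorFlow, Theano, or CNTK backend)
import keras

# Option 2: tf.keras, bundled with TensorFlow as of 1.4
from tensorflow import keras
```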
Hi all,
Some news.
PR freeze
We are preparing the release of Keras 2, as well as the integration of the Keras API directly into the TensorFlow repository. Accordingly, we are declaring a PR freeze on Keras, to be lifted after the release of Keras 2. This means that no further PRs to Keras 1 will be merged (or even reviewed). However, PRs to the Keras 2 branch (when it becomes available) are welcome.
Keras 2
We plan on making available a Keras 2 branch in the next few days, with a final release in the next few weeks.
Keras 2 will consist of some refactoring, a lot of API changes, and few functionality changes. There are many places in which the Keras 1 API was not optimal, differed from industry standards such as those set by TensorFlow or Numpy, or could otherwise be improved. We are bundling the API changes in a single release, so that users will only have to update their code once and for all.
TF integration
The Keras 2 API will become part of the TensorFlow repository, to serve as a high-level API for TensorFlow. Concretely:
- The Keras API will be available in tf.contrib first, later in tf.keras.
- This will make tf.keras deeply compatible with other TF functionality. You will be able to mix and match core TF and tf.keras functionality seamlessly (in effect, tf.keras is just a TF API, not a separate library). Likewise, you should be able to use Keras models with e.g. TF Experiments, allowing you to easily train a Keras model in a distributed setting or on CloudML, or do distributed hyperparameter search. By using tf.keras, you will benefit from the full power of TensorFlow.
- Nothing changes for fchollet/keras. It continues to be the "home" of Keras, and Theano support will continue indefinitely. We are not replacing what is already there; rather, we are simply adopting the Keras spec as a built-in high-level API for TF.