keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Spring 2017 roadmap: Keras 2, PR freeze, TF integration #5299

Closed. fchollet closed this issue 6 years ago.

fchollet commented 7 years ago

Hi all,

Some news.

PR freeze

We are preparing the release of Keras 2, as well as the integration of the Keras API directly into the TensorFlow repository. Accordingly, we are declaring a PR freeze on Keras, to be lifted after the release of Keras 2. This means that no further PRs to Keras 1 will be merged (or even reviewed). However, PRs to the Keras 2 branch (when it becomes available) are welcome.

Keras 2

We plan on making available a Keras 2 branch in the next few days, with a final release in the next few weeks.

Keras 2 will consist of some refactoring, a lot of API changes, and a few functionality changes. There are many places where the Keras 1 API was not optimal, differed from industry standards such as those set by TensorFlow or Numpy, or could otherwise be improved. We are bundling the API changes into a single release so that users only have to update their code once and for all.

TF integration

The Keras 2 API will become part of the TensorFlow repository, to serve as a high-level API for TensorFlow. Concretely:

Dapid commented 7 years ago

Exciting! Do you need any help with the porting?

New, bleeding-edge functionality should preferably go to Keras contrib.

It would be nice to have a rough criterion, and perhaps a few examples, of what should go to contrib and what should go to Keras proper.

RamaneekGill commented 7 years ago

Will this PR freeze affect docstring improvements?

Also, with the release of Keras 2, would it be a good idea to greatly reduce the number of tickets and implement a system/process that prevents or redirects general debugging questions to Gitter, the Slack channel, or Stack Overflow? From what I've seen, most of the issues on this repo are implementation clarifications, general deep learning questions, and requests for debugging help.

As for the Keras spec, when it is released, will there be a list of TODOs the community can contribute to? I'm very excited!

matt-gardner commented 7 years ago

Any chance that masking can get first-class support in the Keras 2.0 spec? Building complex NLP models with Keras is difficult and bug-prone, because masking is not supported very well. We've had to write a lot of custom layers and override default Keras layers in order to get them to handle masking correctly.

bfelbo commented 7 years ago

@matt-gardner That would be amazing. It's such a pain to build e.g. hierarchical models with masking right now.

patyork commented 7 years ago

@matt-gardner Did you make any pull requests to fix layers you've had issues with (and subsequently fixed)?

matt-gardner commented 7 years ago

Yes, we've submitted some:

https://github.com/fchollet/keras/pull/3218 https://github.com/fchollet/keras/pull/4253 https://github.com/fchollet/keras/pull/4258

But getting no response after submitting improvements is pretty demoralizing and discourages future PRs, so we started just overriding Keras layers in our own code (e.g., here, a really trivial fix to Highway that makes it work with masking, which wasn't merged because masking is an afterthought in the current Keras API).
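The override pattern we ended up with looks roughly like this (a sketch against the Keras 1 API; the subclass name is made up, and the real fix differs in detail):

from keras.layers import Highway

class MaskingHighway(Highway):
    def __init__(self, **kwargs):
        super(MaskingHighway, self).__init__(**kwargs)
        self.supports_masking = True

    def compute_mask(self, x, mask=None):
        # Highway acts elementwise over the last dimension, so the
        # incoming timestep mask can simply pass through unchanged
        return mask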

minkooseo commented 7 years ago

Good news!

If there are going to be two GitHub repositories, how will you sync pull requests between tf.keras and this repository? Will there be someone applying changes in one repository to the other?

fchollet commented 7 years ago

If there are going to be two GitHub repositories, how will you sync pull requests between tf.keras and this repository? Will there be someone applying changes in one repository to the other?

The codebases will be different, so there will be no need to replicate pull requests. For API changes, you would send a PR to the API spec itself, and changes to the API spec would be replicated across all codebases.

fchollet commented 7 years ago

@matt-gardner it looks like your issue is simply that some Keras layers do not support masking yet. Is that right?

matt-gardner commented 7 years ago

@fchollet it's just a plea to please take masking very seriously when thinking about the Keras 2.0 spec. It's crucial for complex NLP, and some pretty basic building blocks of NLP models in Keras don't support masking correctly (e.g., the Embedding layer and the TimeDistributed layer, as pointed out in the PRs I've already linked to). Additionally, almost none of the backend operations deal with masks. This is fine in some cases, but if you want to compute a softmax with a mask, for instance, you have to write your own code. This makes attention over padded word sequences hard to implement, and probably most implementations of attention in Keras are wrong because of this: if you apply the mask after the softmax, as done in this re-implementation of a popular paper, the distribution is not normalized correctly, and the bug isn't obvious from looking at the code.
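To make the failure mode concrete, here's a small numpy sketch (the numbers are illustrative):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.5])  # attention scores; the last position is padding
mask = np.array([1.0, 1.0, 0.0])

# Wrong: normalize first, zero out afterwards. The surviving weights
# sum to roughly 0.86, not 1.
wrong = softmax(scores) * mask

# Right: exclude the padded position *before* normalizing, e.g. by
# pushing its score to a very large negative value. This sums to 1.
right = softmax(scores + (1.0 - mask) * -1e9)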

There's also very little documentation about masking. It's in the background and easy to forget about. But you can't forget about it when doing NLP, or you're doing it wrong. It really needs to be treated as a fundamental component of any static computation graph applied to NLP tasks. This difficulty is why people choose DyNet over Keras for NLP. There's a whole lot to like about Keras; it'd be nice if it were also really good for NLP.

ahundt commented 7 years ago

https://github.com/farizrahman4u/keras-contrib is the Keras Contrib repository described above

yassersouri commented 7 years ago

Will you be creating a Keras organization on GitHub?

EderSantana commented 7 years ago

Make graph visualization in TensorBoard great again, please! This is a feature request I honestly don't know how to solve myself. I find that Keras makes the graph tab in TensorBoard hard to read.

MoyanZitto commented 7 years ago

Exciting! This is really big news. I hope I can make contributions to Keras 2! BTW, here are some possible adjustments for Keras 2.

Looking forward to the age of Keras 2!

farizrahman4u commented 7 years ago

I have personally felt that Keras leans more towards image stuff rather than NLP. I can't pinpoint exactly why I "feel" so; limited support for masking is definitely one of the factors.

braingineer commented 7 years ago

@matt-gardner - I have most of the layers implemented with masking on a personal branch of Keras, plus some custom layers like a LambdaMask layer and a couple of variants of TimeDistributed (which I call "Distribute" because I felt it fit the narrative better).

But I never had the time to write the proper tests to get things back into the main branch. And because my previous projects were already done in Theano, I'm only now diving into TensorFlow (thanks, Fold), so I couldn't contribute anything towards that backend.

@fchollet some things I've been wanting to push in:

Also, it would be nice if there were a way to attach meta-information about masks to the graph and have it checked for consistency. For example, I know I want certain-shaped masks coming out of and into certain layers. Right now, I have to go to a Jupyter notebook, pull all the shape tensors out of Keras's graph, and then evaluate them on some data against the expected sizes.
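The notebook drill looks roughly like this (a sketch with a toy Keras 1 model):

import numpy as np
from keras import backend as K
from keras.layers import Input, Embedding, LSTM
from keras.models import Model

words = Input(shape=(10,), dtype='int32')
embedded = Embedding(input_dim=1000, output_dim=32, mask_zero=True)(words)
encoded = LSTM(16)(embedded)
model = Model(input=words, output=encoded)

# Pull an intermediate tensor out of the graph and evaluate it on data,
# then check the shape by hand; there is no declarative way to assert it.
probe = K.function([words], [embedded])
emb_out, = probe([np.random.randint(1, 1000, size=(4, 10))])
assert emb_out.shape == (4, 10, 32)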

If you are looking into rewriting or fleshing out NLP data-manipulation tools (vocabulary management classes, building word embedding matrices from a dataset and a word2vec/GloVe tensor, a data server tuned to Keras's fit_generator function), I have a suite of things I'm starting to push out of projects and into more full-blown sharing.

Also, thanks for all of your (and everyone else's) awesome work =). I am really excited about this next iteration.

ParthaEth commented 7 years ago

I would also like to draw attention to the fact that building custom RNN architectures is next to impossible in Keras without nasty hacks. Please have a look at this discussion for details. For this reason, people are forced to create repositories like RecurrentShop. It would be nice to have some official attention on making life easier for RNN researchers.

ahundt commented 7 years ago

In addition to the comments of @ParthaEth, the same is true for reinforcement learning problems and for loading images via TensorFlow tensors (#5356), and semantic segmentation seems to be second-class as well. One example is keras-rl.

I don't expect Keras to handle every possible design and problem, but I think it is important to point out areas of weakness before LTS API scoping decisions are settled so the appropriate choices can be made explicitly.

fchollet commented 7 years ago

Thanks for the feedback; a lot of the points raised are in fact planned for Keras 2. Some are not, like support for timestep-wise control of RNNs. That can be added later.

With regard to masking, do you have a specific, point-by-point list of feature requests and use cases?

validation_split and sample shuffling: the separation of training data and validation data should be conducted after the dataset gets shuffled.

No; in many cases this would lead to user errors, e.g. any case where the data is generated by a sequential process. I've seen many cases where this feature of Keras saved a user from validating on past data when they should have been using only future data.

Passing callbacks should allow you to pass a custom progress bar. The training step currently adds one silently. It would be best if you could construct the callbacks object outside of the training function.

Nothing prevents you from setting verbose to 0 and passing your own progbar as a callback.
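For example, something like this minimal sketch (the callback name is illustrative, and model, X, y are assumed to be defined elsewhere):

import sys
from keras.callbacks import Callback

class DotProgbar(Callback):
    # a bare-bones replacement progress bar
    def on_batch_end(self, batch, logs=None):
        sys.stdout.write('.')
        sys.stdout.flush()

    def on_epoch_end(self, epoch, logs=None):
        sys.stdout.write(' epoch %d, loss %s\n' % (epoch, (logs or {}).get('loss')))

model.fit(X, y, nb_epoch=5, verbose=0, callbacks=[DotProgbar()])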

braingineer commented 7 years ago

Thanks for the responses @fchollet! Super appreciative of the work you're putting in!

Nothing prevents you from setting verbose to 0 and passing your own progbar as a callback.

That's true. Isn't there a trade-off in this design decision, though? I'm not aware of any documentation stating that setting verbose to 0 has that singular effect (and if it does, the name verbose doesn't seem precise enough, since it controls only one thing), and tracing through the code as a sanity check doesn't sound all that fun. It seems you could add flexibility without losing ease of use by allowing a CallbackList to be passed into the fit functions and adding a None check here.

(As a side note, and not a complete suggestion: it's easy to imagine Metrics and Objectives this way, too. Allow an object to be passed in if desired, with default behavior if not. Messy logic for the respective bits could be folded in and hidden a bit more behind an easier-to-read API.)

Thanks again for your work!

patyork commented 7 years ago

I might be missing something, but in all of the docs for functions that accept verbose, the following description is present:

verbose: 0 for no logging to stdout, 1 for progress bar logging, 2 for one log line per epoch.

Also, both fit and fit_generator already take a parameter called callbacks:

callbacks: list of keras.callbacks.Callback instances. List of callbacks to apply during training. See callbacks.

braingineer commented 7 years ago

verbose: 0 for no logging to stdout, 1 for progress bar logging, 2 for one log line per epoch.

Whoops. Missed that part.

Also, both fit and fit_generator already take a parameter called callbacks:

It adds 2 callbacks without your control. Check my link.

patyork commented 7 years ago

That's true. What do you propose fit/fit generator return, then, if there's no guarantee of a History object?


laubonghaudoi commented 7 years ago

Will Keras 2 support PyTorch as a backend in the future?

fchollet commented 7 years ago

Will Keras 2 support PyTorch as a backend in the future?

No, there are no plans to support PyTorch. There is nothing to be gained in supporting every novelty framework that crops up every quarter. Our goal is to make deep learning accessible and useful to as many people as possible, and that goal is completely opposite to building up deep learning hipster cred.

braingineer commented 7 years ago

re: @patyork

That's true. What do you propose fit/fit generator return, then, if there's no guarantee of a History object?

If it were to go this way, I'd imagine something like:

def fit_whichever_version(foo, bar, baz, callbacks=None, verbose=1, **kwargs):
    # stuff

    if callbacks is None:
        # standard history + whichever verbose flag
        callbacks = cbks.CallbackManager.default(verbose=verbose)
    elif not isinstance(callbacks, cbks.CallbackManager):
        # handle a list of strings, a list of Callback instances, etc.
        callbacks = cbks.CallbackManager.from_kwargs(callbacks, verbose=verbose)
    else:
        # a fully specified manager was passed in; verbose is redundant here
        assert isinstance(callbacks, cbks.CallbackManager)

    # more stuff

    return callbacks

So, if a History were among the callbacks, it'd be returned. Otherwise, the thing returned is exactly what the user expects: the object they passed in, with observations of training info.

But in the end, it doesn't matter all that much. Anyone who cares (I like it) will just hack it in there, and people who don't will never notice.

patyork commented 7 years ago

I don't understand the impetus for this at all, and I don't like the fact that it instantly changes the behaviour for anyone using custom callbacks (e.g. model.fit(....., verbose=1, callbacks=[ModelCheckpoint()]) now returns nothing, and has no progress bar). If the CPU cycles it takes for the BaseLogger and curating the History object are too much overhead for you, you probably shouldn't be using Keras, with all of its other overhead, anyways.

braingineer commented 7 years ago

I don't like the fact that it instantly changes the behaviour for anyone using custom callbacks (e.g. model.fit(....., verbose=1, callbacks=[ModelCheckpoint()])

That was never really implied. There's nothing against having the CallbackManager, by default, enact a history and logger. In fact, I explicitly passed in the verbose in my example.

If the CPU cycles it takes for the BaseLogger and curating the History object is too much overhead for you, you probably shouldn't be using Keras with all of its other overhead anyways.

No need to get snide. It has nothing to do with CPU cycles.

patyork commented 7 years ago

Nothing snide about it - just confused as to the impetus of removing the BaseLogger and History callbacks as defaults/always available. The only thing I could think of was a possible performance hit.

braingineer commented 7 years ago

Sorry, it came off a bit harsh. Probably a tone-of-voice thing.

I've written some progress bars that use tqdm, for example. I've never overridden the History callback. My example was a possible solution to "find a way to let the user be able to completely specify their callback env, but also make it transparently the same for anyone who doesn't care".

patyork commented 7 years ago

Yeah, fair enough. Several people have written better progress bars, especially for use with Jupyter notebooks.

My point here is that there isn't really a backwards compatible way to do the above - if we get rid of the 2 default callbacks (3 if you consider the verbose flag), anyone using a custom callbacks list will have their code change behavior (because callbacks will not be None, but it will also not include BaseLogger or a History callback). If we don't get rid of the two defaults, well, we've really made no change.

Kind of a moot point. We'll see where it stands once v2 is at a code-complete state, and we start thinking about backwards compatibility.

braingineer commented 7 years ago

My point here is that there isn't really a backwards compatible way to do the above - if we get rid of the 2 default callbacks (3 if you consider the verbose flag), anyone using a custom callbacks list will have their code change behavior (because callbacks will not be None, but it will also not include BaseLogger or a History callback). If we don't get rid of the two defaults, well, we've really made no change.

(Emphasis mine.) I've said this already, but that was never really implied. There's nothing against having the CallbackManager, by default, enact a history and logger. The only time anything will break is if people were expecting a History to be returned.

Kind of a moot point.

Agreed, which is why I backed off three posts ago.

matt-gardner commented 7 years ago

@fchollet here is a list of the masking requests I can think of right now. I might add more later:

We have solutions for a lot of these problems in our codebase that we can contribute, though it's all based on Keras 1.*, and I'm not sure how much will change in Keras 2. Either way, I'm happy to help contribute to fixing these issues. I would really like to see Keras succeed in being great for NLP.

pengpaiSH commented 7 years ago

I am mainly looking forward to the fit_distributed() that @fchollet promised months ago, which could automatically use multiple GPUs :)

fchollet commented 7 years ago

An equivalent to K.softmax that takes a mask as input is needed

Feel free to add a Softmax layer to core. Also add a warning when people are using an Activation layer set to softmax with a mask being passed.
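Something along these lines, perhaps (a sketch against the Keras 1-era layer API, not a finished implementation; the class name is illustrative):

from keras import backend as K
from keras.engine.topology import Layer

class MaskedSoftmax(Layer):
    def __init__(self, **kwargs):
        super(MaskedSoftmax, self).__init__(**kwargs)
        self.supports_masking = True

    def call(self, x, mask=None):
        if mask is not None:
            # push masked positions to a large negative value *before*
            # normalizing, so the remaining weights sum to one
            x = x + (1.0 - K.cast(mask, K.floatx())) * -1e9
        return K.softmax(x)

    def compute_mask(self, x, mask=None):
        # the output is a proper distribution; no mask needs to propagate
        return None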

The Embedding layer should work for higher-order inputs.

That's planned.

TimeDistributed needs to pass the mask through to the layer that it wraps.

That's planned.

The Lambda layer should support masking

That's planned.

It'd be nice if you could consistently call K.int_shape() on masks. This is not the case in the theano backend.

This will never be the case with Theano. Full offline shape-inference capability for Theano would be a pretty big amount of work, and is out of scope for Keras (it should really be a Theano feature).

Backend functions need to handle masks.

Masking is Keras-level, not backend-level. That won't change.

Some high-level documentation about masking would be really nice

Sure. Maybe a FAQ entry. If you have any initial proposal, feel free to submit a PR.

All layers should document their masking behavior (expected input shape and output shape, etc.), just like they document their input/output behavior.

Easy to add. After the documentation is updated for Keras 2, feel free to submit a PR.

fit_distributed()

Won't do. Distributed training will be handled via tf.Experiment, which you will be able to instantiate from a Keras model.

pengpaiSH commented 7 years ago

@fchollet Thank you for your feedback and comments. Since lots of people in the Keras community are discussing multi-GPU utilization, could you please say a few more words about tf.Experiment?

patyork commented 7 years ago

My only refactor for Keras 2 would be around the dim_ordering stuff. Some ideas:

Edited to add bullet 3 and add some more thoughts. Apologies if it's rambling - hopefully you get the gist.

crockpotveggies commented 7 years ago

Hi @fchollet, I've just written a prototype for Keras using a Deeplearning4j backend. After completing this experiment, I've learned a lot about the design of Keras and pluggability of the framework.

Since a rewrite is already on the table, I am wondering: are there plans to make the backend more modular? In other words, do you have plans to let a backend handle more of the actual execution and give it more granular control?

For example, Deeplearning4j runs in the JVM and bridges with the ND4J binary. In some cases, it is more advantageous and performant for DL4J to directly handle most of what happens for a fit() or evaluate() operation. This is partly to avoid creating dual references in Python and the JVM (using py4j to bridge the two environments).

The idea is that Keras becomes a more advanced "string argument" generator that defines a standard for model configuration and instructs the backend on what to execute. The DL4J experiment has already done this at a core level, and I believe there are some performance gains to be made.
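Keras already exposes most of what such a JVM-side executor would need; a rough sketch of the handoff:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(10, input_dim=20, activation='relu')])

json_config = model.to_json()   # architecture as a backend-agnostic JSON string
weights = model.get_weights()   # list of numpy arrays, portable to e.g. ND4J
# A bridge like py4j could ship json_config and weights to the JVM and let
# DL4J run fit()/evaluate() natively on its side.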

crockpotveggies commented 7 years ago

FYI if you want to check out the experiment: https://github.com/crockpotveggies/dl4j-examples/tree/keras-examples/dl4j-keras-examples

Thanks to @pkoperek for the ideas to organize and hijack the Keras methods to simply bridge it to the JVM.

leriomaggio commented 7 years ago

Hi there. Couple of questions:

  1. @fchollet is there any guide describing the API changes, namely "what's changed", "what will be deprecated", "what will stay unchanged", and so on? If not, is there any plan to write one? I would be happy to help/contribute to the documentation about this; imho it would really help the transition.

  2. Is the keras-contrib repo mentioned and referenced in this conversation (by @farizrahman4u) the official one? Is there any plan to integrate it as a Keras branch/module once Keras 2.0 is released? I'm asking also because I think I spotted a couple of cases in which the Keras 1.x API has been used...

Cheers

farizrahman4u commented 7 years ago

Is the keras-contrib repo mentioned and referenced in this conversation (by @farizrahman4u) the official one?

Yes, and it will be moved to the Keras organization in the future. If any of the code breaks when Keras 2 is launched, it will be fixed by the maintainers. Otherwise, each of the source files will be converted to the latest API passively.

pengpaiSH commented 7 years ago

@farizrahman4u Is Keras 2 ready now?

singlas commented 7 years ago

https://blog.keras.io/introducing-keras-2.html

23pointsNorth commented 7 years ago

I have another general plea. If Keras 2 is going to become part of TF, can we please have the TF layers replicated as Keras 2 ones?

For instance, it took months before any attention was given to #4457 (Deconvolution3D/Conv3DTranspose), despite it having been part of the layers supported by TF for a while (and used by anyone doing 3D networks). Or at least feature-replicate the layers that exist in 1D and 2D form (judging by the documentation, this is effectively the only such layer missing).

olalonde commented 7 years ago

Concretely, if I want to use the Tensorflow backend, is it better to import keras ("Keras is no longer a library, but rather a spec"), import tensorflow.contrib.keras (marked as deprecated but still in the docs) or import tensorflow.keras (not documented)? Confused 😕

fchollet commented 7 years ago

Unless you have a specific reason to use tf.keras (e.g. a workflow that is mostly TF, where you do not wish to involve third-party libraries), use Keras. Otherwise, use tf.keras (available as of TF 1.4).
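Concretely, the two import paths (pick one; tensorflow.contrib.keras is the older, deprecated path):

# the standalone, multi-backend library
import keras

# or the TensorFlow-bundled implementation (TF 1.4+)
from tensorflow import keras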
