
Development goals: January-February 2016 #1399

Closed · fchollet closed this 7 years ago

fchollet commented 8 years ago

Last month, we delivered on our key development goals for the period. Keras has made great strides in code quality and documentation.

Here's an update with our new goals. On one hand, we will continue improving the codebase and feature set of Keras. On the other hand, we will start focusing more on providing the community with a wealth of real applications, rather than just library features. As deep learning engineering becomes increasingly commoditized (most notably by Keras), Keras needs to move up the ladder of abstraction and start providing value at the application level in order to stay relevant for the next 5 years.

These applications will roughly fall into two categories:

As a closing note, I am noticing that the October-December period, rich in ML conferences, has seen the release of over 15 research papers using Keras for their experiments (plus an unknown number of papers that used Keras without citing it; a majority of papers never cite the open source frameworks they use). This is a positive sign : )

jfsantos commented 8 years ago

I have some code for music generation and a dataset that I can share as an example (currently not working due to the problems with masking, but this is already on the to-do list). It's similar to char-rnn but predicts musical "tokens" instead of characters.
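
For context, the model itself is nothing exotic; a minimal sketch of the char-rnn-style setup (Keras 0.x-era API; `maxlen` and `vocab_size` here are illustrative placeholders, not my actual settings):

```python
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.recurrent import LSTM

maxlen, vocab_size = 40, 64  # illustrative: context window and token vocabulary

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, vocab_size)))  # reads one-hot token windows
model.add(Dense(vocab_size))
model.add(Activation('softmax'))  # distribution over the next musical token
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
```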

fchollet commented 8 years ago

@jfsantos sounds great. It would be neat to turn this into a reusable app (e.g. provide a folder with enough MIDI files or audio files in a certain style, and start generating MIDI tracks or audio files in that style). What is the "token" space you were using?

jfsantos commented 8 years ago

The token space I used consists of ABC notation symbols. They are mostly used to represent music for a single instrument (mostly monophonic, though there's a notation for chords). I don't know if there are many datasets in this format, but there's the one I used (which contains ~25k tunes).

The code could probably be converted to use MIDI or another format instead of ABC. For other formats, we would need a parser. I considered using the parsers from music21 but that would add an external dependency to the example.

fchollet commented 8 years ago

MIDI would certainly be a better format to allow a wide range of people to play around with it. It's a good starting point. I think the killer app would involve learning from audio files and generating audio files, with some "clean" data representation in between (possibly derived from ABC). Previous attempts have been doing it completely wrong, but we could do it right.

ozancaglayan commented 8 years ago

Regarding masking, I'm trying to implement a feed-forward network using Graph like the following:

Embedding -> Flatten -> Dense -> ...

I'm padding my short sequences with 0 in both inputs and outputs. If I set mask_zero=True for the embedding layer, the Flatten and Dense layers break, as they are not supposed to be used with masks. Changing keras/layers/core.py so that they derive from MaskedLayer instead of Layer makes the system at least train, but I'm not sure the inner parts are playing nicely with the masks. I assume it wouldn't be so simple to fix this way :)
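
For concreteness, a minimal sketch of the network (roughly the Keras 0.x-era Graph API; the sizes are illustrative):

```python
from keras.models import Graph
from keras.layers.core import Dense, Flatten
from keras.layers.embeddings import Embedding

maxlen, vocab_size = 20, 1000  # illustrative sizes

graph = Graph()
graph.add_input(name='input', input_shape=(maxlen,), dtype='int')
graph.add_node(Embedding(vocab_size, 64, input_length=maxlen, mask_zero=True),
               name='emb', input='input')
# This is where it breaks with mask_zero=True: Flatten is not mask-aware.
graph.add_node(Flatten(), name='flat', input='emb')
graph.add_node(Dense(1, activation='sigmoid'), name='dense', input='flat')
graph.add_output(name='output', input='dense')
graph.compile(optimizer='rmsprop', loss={'output': 'binary_crossentropy'})
```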

Sandy4321 commented 8 years ago

Could you recommend some papers/videos/books/code example links to study more about this, please?


farizrahman4u commented 8 years ago

We need a K.tensordot which mimics Theano's batched_tensordot but also works on TensorFlow. Memory networks are impossible without dot merge.
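
To be clear about the semantics, this is what a batched tensor dot computes, as a plain NumPy sketch (illustrative shapes, not a proposed Keras API):

```python
import numpy as np

def batch_dot(x, y):
    # One contraction per batch sample:
    # x: (batch, n, m), y: (batch, m, k) -> (batch, n, k)
    return np.einsum('bnm,bmk->bnk', x, y)

x = np.random.rand(32, 4, 5)
y = np.random.rand(32, 5, 3)
print(batch_dot(x, y).shape)  # (32, 4, 3)
```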

fchollet commented 8 years ago

That's true, but I think we can wait for TensorFlow to implement tensor contraction. Rolling our own implementation would be inefficient.


fchollet commented 8 years ago

> Could you recommend some papers/videos/books/code example links to study more about this, please?

Study what, Keras? Here's a pretty good video intro: https://www.youtube.com/watch?v=Tp3SaRbql4k

farizrahman4u commented 8 years ago

Adding new apps is definitely a great step. What I would recommend is to start by making the current examples interactive. For example, after training babi_memnn, the user should be able to input a story and a question (as natural language text, not word indices) and ask the model questions about it. Instead of each example being a single Python file, each should be a folder with subfolders train_data and test_data, and separate scripts train.py and test.py. This gives the user full control, at the cost of using save_weights and load_weights (train.py saves and test.py loads an HDF5 file), as sketched below. Also, there should be explicit examples for visualization.
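
A hypothetical sketch of the split (the model, data, and file names here are placeholders, not an existing example):

```python
# train.py -- builds the model, trains, and saves weights.
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense

def build_model():
    # Shared by train.py and test.py so the architectures match.
    model = Sequential()
    model.add(Dense(1, input_dim=8, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='sgd')
    return model

model = build_model()
model.fit(np.random.rand(100, 8), np.random.randint(2, size=(100, 1)), nb_epoch=2)
model.save_weights('weights.h5', overwrite=True)

# test.py -- rebuilds the same architecture and loads the trained weights.
model = build_model()
model.load_weights('weights.h5')
print(model.predict(np.random.rand(5, 8)))
```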

meanmee commented 8 years ago

I am really happy to hear about these things. For researchers, models with state-of-the-art performance are what's needed. And if you want to make the examples interactive, I suggest giving users a GUI version. In my opinion, the people who use Keras are doing research or business rather than just having fun. As for me, I'm a deep learning beginner of about one year (almost the same age as Keras), and it is time for me to publish papers; I think many people are in the same situation. I hope Keras will add some baseline models proposed in research papers, and I will contribute as much as I can.

antoniosehk commented 8 years ago

I agree. I think people use Keras mostly for serious work (research/business) rather than for fun. I would rather see Keras support more state-of-the-art models than make the examples interactive.

Sandy4321 commented 8 years ago

Cool, just great, but too short. Could you share more links like this, please?


farizrahman4u commented 8 years ago

@Sandy4321 That video covers pretty much all the basics. Also check out the documentation and examples. If you need help with a specific problem, consider opening a new issue.

fchollet commented 8 years ago

Update on our progress so far:

fchollet commented 8 years ago

What blogging platform would you guys suggest for the Keras blog? Requirements:

Maybe we'll end up falling back to GitHub for content management + S3 for hosting + a custom static site generator. It wouldn't be the first time for me.

Also, what hosting platform would you guys suggest for the (500+ MB) weight files of a Keras model zoo? Hosting them on my personal S3 account (as I do for the Keras datasets) would be prohibitively expensive.

lukedeo commented 8 years ago

I mean, how many weight files are we expecting? A quick check on the AWS calculator shows that 10 GB will run ~64 cents/mo.

fchollet commented 8 years ago

@lukedeo hosting would be inexpensive; it's the downloads that are the problem. Keras has around 30k active users, so we could realistically expect several TB of downloads every month, which could cost hundreds of dollars a month.
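
Back-of-envelope, assuming S3 egress at roughly $0.09/GB: even 5 TB of downloads a month would be about $450, and 30k users each pulling a single 500 MB model would be ~15 TB, i.e. over $1,300/month.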

lukedeo commented 8 years ago

Yikes, I didn't realize Keras was at 30k! I remember reading that Rackspace doesn't charge based on bandwidth... might be an option.

jfsantos commented 8 years ago

@fchollet I'm going to test my music generation models this week. It's still based on a textual representation of music, but it's a start.

Regarding blogging platforms, I recommend Pelican, a static site generator written in Python and aimed at blogs. There are plenty of templates to choose from, and it's fairly easy to write your own. It also has a plugin interface for adding page generators (e.g. I have one that generates a list of publications from a BibTeX file). We could host it on GitHub Pages (that's what I do for my website). Here's one of my blog posts using LaTeX rendering and code snippets.

wb14123 commented 8 years ago

What about just using GitHub Pages for the blog? It can be written in Markdown and version-controlled with git. Jekyll could be the tool to generate it.

wb14123 commented 8 years ago

About the QA system, I'd like to implement it with the seq2seq model from this paper. But it seems difficult to implement in Keras, since it's not easy to copy the encoder RNN's hidden state into the decoder. Maybe I can try training the model in examples/addition_rnn.py on some movie subtitles and see the results.
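
For reference, this is roughly the construction in examples/addition_rnn.py: instead of copying hidden state, the encoder's final output is repeated and fed to the decoder at every timestep (a sketch in the Keras 0.x-era API; the sizes are illustrative):

```python
from keras.models import Sequential
from keras.layers.core import Activation, RepeatVector, TimeDistributedDense
from keras.layers.recurrent import LSTM

input_maxlen, output_maxlen, vocab_size = 10, 10, 50  # illustrative sizes

model = Sequential()
model.add(LSTM(128, input_shape=(input_maxlen, vocab_size)))  # encoder summary vector
model.add(RepeatVector(output_maxlen))       # feed the summary at every decoder step
model.add(LSTM(128, return_sequences=True))  # decoder
model.add(TimeDistributedDense(vocab_size))  # per-timestep token scores
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```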

farizrahman4u commented 8 years ago

@wb14123 http://www.github.com/farizrahman4u/seq2seq

wb14123 commented 8 years ago

@farizrahman4u Thanks. I had found this project before; it is awesome. But it has some custom layers, and I don't know if it is a good idea to use it as an example. I think it's better for an example to just stack some existing layers. Maybe merging your layers into upstream Keras would be a good idea?

farizrahman4u commented 8 years ago

@wb14123 As you said, custom layers. They are kind of hackish and do not work with TensorFlow, so I don't think they meet the Keras standards, hence the separate repo.

fchollet commented 8 years ago

@jfsantos thanks for the suggestions. Pelican + GitHub Pages sounds good; we'll probably do that.

datnamer commented 8 years ago

Suggestions to increase appeal to industry: integration with Blaze for learning across many backends (databases, out-of-core dataframes, etc.)

Time series prediction

Anmol6 commented 8 years ago

Hey, I've been using Keras for a couple of weeks now and I'd like to contribute in some way! I'd love to take on some sort of NLP-related example task. Also, this'd be my first open source project.

farizrahman4u commented 8 years ago

@Anmol6 Try adding multiple hops to the memory network example, as described in the paper. That should be a nice start.

Anmol6 commented 8 years ago

@farizrahman4u which paper? And do you mean this example: https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py?

farizrahman4u commented 8 years ago

Yes, that one. But as you can see, there is only one memory hop, so it will work only for bAbI task 1. If you do multiple hops (at least 3), you can do this:

[image: babi]

You can get Theano code from https://github.com/npow/MemN2N

Anmol6 commented 8 years ago

I see, I'll try that out. Thanks!

Anmol6 commented 8 years ago

Hey, so I'm working on getting the multiple hops done. I'm having trouble figuring out how the code at https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py employs this step outlined in the paper (if at all):

[image: equation from the paper]

If that's not being used, could you explain the logic behind the model in the code? Thanks!

farizrahman4u commented 8 years ago

It's actually easier than you think. In memory hop 1, the output is a function of the question and the story; this is already done in the Keras example. In memory hop 2, the output is a function of the question, the story, and the output of hop 1.
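
In plain NumPy terms, the hop loop looks something like this (an illustrative sketch with made-up names, not the example's code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def multi_hop(mem_a, mem_c, u, n_hops=3):
    # Each hop attends over the story conditioned on the question
    # plus the previous hop's output.
    for _ in range(n_hops):
        p = softmax(mem_a.dot(u))  # attention over memory slots
        o = mem_c.T.dot(p)         # weighted sum of output memories
        u = u + o                  # next hop sees question + previous output
    return u

mem_a = np.random.rand(10, 64)  # 10 story sentences, embedding A
mem_c = np.random.rand(10, 64)  # embedding C
u0 = np.random.rand(64)         # encoded question
print(multi_hop(mem_a, mem_c, u0).shape)  # (64,)
```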

pasky commented 8 years ago

@farizrahman4u Maybe this should move into a more specific issue, but I was also confused by the bAbI example; it's not really obvious to me that it implements memory networks.

The match seems to correspond to the pre-softmax p vector, but I don't think there's any weighted sum going on, unless I'm confused by the embedding of memories into query_maxlen-dimensional space, which I didn't really understand.

The way I'd reproduce the MemN2N construction in the current framework would be to add a softmax activation to match, embed input_encoder_c to 64d, and compute the match-weighted sum of the input_encoder_c elements by (a) RepeatVector(64)-ing the match so that it can be dot-producted, and (b) dot-producting the match with input_encoder_c. There shouldn't be any place for an LSTM to enter at this point, as the shape there is just (batch, 64), right? Does that make sense? If the current construction is somehow equivalent to that, sorry for the noise; it's lost on me though.

However, this wouldn't really reproduce MemN2N anyway, since it treats memories at the word level, picking relevant words rather than relevant sentences, and that story-to-memory segmentation is what memory networks use. For that, we'd have to bump up the dimensionality of the input and put each memory in a separate 2D tensor, then use either averaging or RNNs to get memory embeddings (which might be possible with the very latest git, I guess?).

(P.S.: I work on a bunch of related Keras models that model sentence similarity (at their core, that's what MemNNs do too), e.g. https://github.com/brmson/dataset-sts/blob/master/examples/anssel_kst1503.py, but I already have some much more complicated ones (e.g. almost reproducing arXiv:1511.04108) in my notebooks that I hope to tweak and publish soon. Once my deadlines pass in February, I'll be happy to clean them up and contribute them to Keras as examples.)

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

shwetgarg commented 6 years ago

@pasky Though I am very late to this thread, I completely agree that the current babi_memnn.py implementation does not treat memory at the sentence level. I am trying to implement end-to-end memory networks and would appreciate it if you could share the code you wrote for this.