nicholas-leonard / dp

A deep learning library for streamlining research and development using the Torch7 distribution.

Recurrent Model #13

Closed nicholas-leonard closed 9 years ago

nicholas-leonard commented 10 years ago
daydreamt commented 10 years ago

Hey there, any progress on that?

Otherwise, I have great interest in making it happen, but I would probably need some guidance on making it extensible, cleanly designed, and able to support most recurrent models (I have only had experience with BPTT and Elman networks, so I will need to do some reading too).

nicholas-leonard commented 10 years ago

I haven't actually done anything in this respect, but I would be really glad if you took charge of this. I can help make it extensible. Do you have an idea in mind?

daydreamt commented 10 years ago

That's great, I'm on it!

Nothing definite yet. I am doing some reading until Sunday (Saturday night will be spent at the airport), and will then implement a standard RNN and an LSTM and see how it goes.

I strongly suspect the current Layer abstractions, which have inputs, outputs, and gradients passing through them, will be enough (or go a very long way) as a building block for most models, and that I will just need to keep multiple of them in history for recurrent networks (so I guess I would need to write the analogue of Sequential to handle it), but I am waiting to find out whether that is indeed the case.
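
For concreteness, here is a rough sketch of the keep-the-steps-in-history idea, written with plain nn modules rather than dp's Layer class (the sizes, the nn.Linear stand-in for a step module, and the single end-of-sequence gradient are all just placeholders):

```lua
require 'nn'

-- placeholder sizes and a stand-in step module (an nn.Linear over [x_t, h_{t-1}])
local inputSize, hiddenSize, seqLen = 4, 8, 3
local step = nn.Linear(inputSize + hiddenSize, hiddenSize)

-- one clone per time step, all sharing the same weights and gradient buffers
local clones, inputs = {}, {}
for t = 1, seqLen do
   clones[t] = step:clone('weight', 'bias', 'gradWeight', 'gradBias')
end

-- forward: feed [x_t, h_{t-1}] through clone t, keeping every input in history
local x = torch.randn(seqLen, inputSize)
local h = torch.zeros(hiddenSize)
for t = 1, seqLen do
   inputs[t] = torch.cat(x[t], h)
   h = clones[t]:forward(inputs[t])
end

-- backward (BPTT): walk the history in reverse, passing the gradient
-- w.r.t. h_{t-1} back to the previous time step; shared gradWeight/gradBias
-- accumulate the contributions of every step
local gradH = torch.ones(hiddenSize) -- pretend gradient from some loss at the last step
for t = seqLen, 1, -1 do
   local gradInput = clones[t]:backward(inputs[t], gradH)
   gradH = gradInput:narrow(1, inputSize + 1, hiddenSize)
end
```

The open question is whether that per-step history lives inside the Layer itself or in a Sequential-like container around it.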

I am at this point unsure whether a specialized data class will be needed/helpful.

Will post updates!

nicholas-leonard commented 10 years ago

How is this coming? Do you need some help?

daydreamt commented 10 years ago

Hi there, I have been meaning to post (sorry about the delay).

I was held back a bit because I wasn't sure how to have recurrent connections between layers that aren't consecutive.

So I am working on an arbitrary* network constructor class now that additionally takes a list of 'connections' (this requires naming the models) along with whether each one is recurrent, and then takes care of the forward and backward passes for the whole network. Ideally, the layer class should keep its previous parameters in itself when it is recurrent (but this can only happen if the forward functions are only ever called for good reason).
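
To make that concrete, here is a very rough sketch of such a constructor (the Graph name, the topological-order assumption, and the summing of multiple incoming connections are all just illustrative choices, and only the forward pass is shown):

```lua
require 'nn'

-- hypothetical 'Graph' constructor: named nn modules plus a list of
-- (from, to, recurrent) connections; nodes are assumed to be added in
-- topological order, and multiple incoming connections are simply summed
local Graph = {}
Graph.__index = Graph

function Graph.new()
   return setmetatable({nodes = {}, order = {}, edges = {}, prev = {}}, Graph)
end

function Graph:add(name, module)
   self.nodes[name] = module
   table.insert(self.order, name)
end

function Graph:connect(from, to, recurrent)
   table.insert(self.edges, {from = from, to = to, recurrent = recurrent or false})
end

-- forward pass only; backward would mirror this in reverse order
function Graph:forward(externalInputs)
   local outputs = {}
   for _, name in ipairs(self.order) do
      local input = externalInputs and externalInputs[name]
      for _, e in ipairs(self.edges) do
         if e.to == name then
            local src
            if e.recurrent then
               src = self.prev[e.from] -- previous time step (nil on the first step)
            else
               src = outputs[e.from] -- current time step
            end
            if src then
               input = input and input + src or src
            end
         end
      end
      if input then
         outputs[name] = self.nodes[name]:forward(input):clone()
      end
   end
   self.prev = outputs -- remembered for the recurrent connections of the next step
   return outputs
end

-- usage: a small Elman-style loop, xin -> h -> y with h fed back through hrec
local g = Graph.new()
g:add('xin', nn.Linear(10, 20))
g:add('hrec', nn.Linear(20, 20))
g:add('h', nn.Sigmoid())
g:add('y', nn.Linear(20, 5))
g:connect('h', 'hrec', true) -- recurrent: hrec reads h from the previous step
g:connect('xin', 'h')
g:connect('hrec', 'h')
g:connect('h', 'y')
for t = 1, 3 do
   local out = g:forward{xin = torch.randn(10)}
   print(out.y)
end
```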

Does this approach seem sound to you?

The multiple-inputs-per-model part will require some sort of ListView with named parameters to tell the input models apart. I hope to have this done in a few days (*2), but I wanted to post now to show some signs of life :-).

*Well, as long as the whole network structure is a DAG and can be nicely traversed, all should be well. (*2) It will probably take longer.

nicholas-leonard commented 10 years ago

I like your second asterisk note.

I don't have much experience with RNNs, but I am thinking that there are two ways to implement it. The first involves keeping it within its own Layer, encapsulating different nn.Modules. The second is where the RNN Model is a Container of Models. In the first case, I think the input will be a SequenceView (the output of a Dictionary Layer); in the second case, a ListView. So I am guessing you are looking to implement the second approach, where the RNN would be a Container of Models.

Myself, I was looking into the first approach, as it seems much easier. The Recurrent Layer would implement the Elman network used by Mikolov, using two nn.Modules (one for feeding its previous state back into itself, the other for the current state) encapsulated in an nn.Sequential (nn.ConcatTable -> nn.CAddTable -> nn.Sigmoid). A loop would feed each subtensor along the w dimension to the nn.Sequential. The two nn.Modules (which could each be an nn.Linear) could be specified in the Recurrent Layer's constructor.
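
Roughly something like this (a sketch only, with made-up sizes; I use nn.ParallelTable rather than nn.ConcatTable so the two modules receive the current input and the previous state as separate table entries):

```lua
require 'nn'

-- made-up sizes
local inputSize, hiddenSize = 10, 20

-- one module for the current input x_t, one for the previous state h_{t-1}
local inputModule = nn.Linear(inputSize, hiddenSize)
local recurrentModule = nn.Linear(hiddenSize, hiddenSize)

-- h_t = sigmoid(W_x x_t + W_h h_{t-1})
local step = nn.Sequential()
   :add(nn.ParallelTable():add(inputModule):add(recurrentModule))
   :add(nn.CAddTable())
   :add(nn.Sigmoid())

-- loop over the w (sequence) dimension of a seqLen x inputSize tensor
local seqLen = 5
local input = torch.randn(seqLen, inputSize)
local h = torch.zeros(hiddenSize)
for t = 1, seqLen do
   -- clone so the next forward call does not overwrite the stored state
   h = step:forward{input[t], h}:clone()
end
print(h)
```

Training would still need the history/BPTT mechanics you described earlier, since this only does the forward unroll.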

On the other hand, this may not be what you are looking for...

nicholas-leonard commented 10 years ago

Simple RNN described in a blog post: http://www.nehalemlabs.net/prototype/blog/2013/10/10/implementing-a-recurrent-neural-network-in-python/

Code (very simple! easy to extend): https://gist.github.com/tmramalho/5e8fda10f99233b2370f

This code/post is based on what appears to be a precursor to GroundHog by Razvan: https://github.com/pascanur/trainingRNNs

Alex Graves' paper on speech recognition has a nice straightforward formulation of LSTM: http://www.cs.toronto.edu/~fritz/absps/RNN13.pdf

Breze also has implementations of many of these things (including LSTM) in Theano: https://github.com/breze-no-salt/breze/blob/master/breze/arch/model/sequential/rnn.py

nicholas-leonard commented 10 years ago

https://groups.google.com/forum/embed/?place=forum/torch7#!topic/torch7/ZY4hX4qOvjk

nicholas-leonard commented 9 years ago

Took me 2 weeks, but I finally implemented an nn.Recurrent module: https://github.com/clementfarabet/lua---nnx/pull/20