lisa-lab / pylearn2

Warning: This project does not have any current developer. See below.
BSD 3-Clause "New" or "Revised" License

Variable-length sequence spaces support #909

Open dwf opened 10 years ago

dwf commented 10 years ago

I think we need to centralize efforts on variable-length sequences in pylearn2 and really get our act together. I know a lot of people (@bartvm @vdumoulin @laurent-dinh @pascanur) have thought about this problem, and I think we need a centralized record of that thinking.

Obviously mini-batches where examples are of variable length are an issue. You can really only address this with lists of Theano variables at present.

Since one can get a lot done in sequential modelling with batch size 1 (provided your sequence is long), I would suggest structuring a first pass to be sufficiently general to support batch size > 1, while actually only implementing as much as is needed to support batch size = 1.

I therefore propose using this ticket as that collaboration space.

vdumoulin commented 10 years ago

Here's my view on it:

I wrote two new Space subclasses called VectorSequenceSpace and IndexSequenceSpace (see here) which represent single sequences as matrices whose first dimension is time.
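For concreteness, here is a minimal NumPy sketch of what single sequences in these spaces look like (shapes are illustrative, not taken from the pylearn2 source):

```python
import numpy as np

# VectorSequenceSpace: one sequence is a (time, dim) float matrix.
# Here, five steps of 3-dimensional feature vectors.
vector_seq = np.random.randn(5, 3)

# IndexSequenceSpace: one sequence is a (time, 1) integer matrix,
# e.g. a sequence of word or phone indices.
index_seq = np.array([[2], [7], [7], [1], [4]])

# Another sequence from the same dataset may be longer or shorter;
# only the first (time) dimension varies.
longer_seq = np.random.randn(9, 3)
```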

For now minibatches aren't supported because of a Theano limitation: lists of variable-length arrays have no corresponding data structure in Theano. @nouiz is aware of the issue, and in a discussion with him today I learned he even has an intern currently working on fixing it.

These spaces could probably be extended to support minibatches of variable-length sequences, as long as all sequences within a minibatch have the same length.

There is also a toy-ish implementation of an RNN which I wrote for Yoshua's class and which is used by Junyoung for speech synthesis. It uses XSequenceSpace for its input space, and the TIMIT dataset it is trained on (a class called TIMITSequences) also uses XSequenceSpace. With this setup SGD didn't need any modification. The only thing that might require changing SGD is gradient clipping and gradient norm monitoring, and even that could be implemented as a wrapper Cost.
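Not the class used for the TIMIT experiments, but a minimal sketch of a batch-size-1 RNN in Theano that consumes such a (time, dim) matrix (weight names and shapes are made up for illustration):

```python
import numpy as np
import theano
import theano.tensor as T

dim, hid = 3, 4
W_in = theano.shared(0.1 * np.random.randn(dim, hid))
W_rec = theano.shared(0.1 * np.random.randn(hid, hid))

x = T.matrix('x')  # one sequence: (time, dim)

def step(x_t, h_prev):
    # Vanilla recurrence: h_t = tanh(x_t W_in + h_{t-1} W_rec)
    return T.tanh(T.dot(x_t, W_in) + T.dot(h_prev, W_rec))

# scan iterates over the first (time) axis, so sequences of any
# length work without changes to the training loop.
h, _ = theano.scan(step, sequences=x, outputs_info=T.zeros((hid,)))

encode = theano.function([x], h[-1])
print(encode(np.random.randn(7, dim)))  # works for any sequence length
```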

dwf commented 10 years ago

> With this setup SGD didn't need any modification.

That is so, so good to hear.

> The only thing that might require changing SGD is gradient clipping and gradient norm monitoring, and even that could be implemented as a wrapper Cost.

Yeah, I could see both Cost and LearningRule having a role in implementing RNN tricks, depending. The situation seems less dire than it sounded earlier today. It'd be good to gather thoughts from @vdumoulin and @laurent-dinh regarding current blockers, etc.
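For instance, gradient clipping by global norm is easy to express as a transformation of the gradient expressions, which is the kind of thing either wrapper could do. A generic sketch, not an existing pylearn2 API:

```python
import theano.tensor as T

def clip_by_global_norm(grads, max_norm):
    # Rescale a list of gradient expressions so that their joint L2
    # norm does not exceed max_norm; the norm itself can be monitored.
    global_norm = T.sqrt(sum(T.sum(g ** 2) for g in grads))
    scale = T.minimum(1.0, max_norm / (global_norm + 1e-7))
    return [g * scale for g in grads], global_norm
```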

dwf commented 10 years ago

Er, by that I mean @bartvm and @laurent-dinh. I thought that comment was by Bart (I'm really tired).

bartvm commented 10 years ago

I had another quick look, and it looks like the problem I had with batch_size = 1 has been fixed. VectorSpace and IndexSpace will return a row rather than a full matrix in this case, which should lead to Theano using GEMV instead of GEMM. This formerly caused a bug in monitor.py, so the batch_size information isn't used there anymore (https://groups.google.com/forum/#!topic/pylearn-dev/M-f2YyGxS8c and ticket #634), but at training time things should work fine.
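The distinction is easy to check on a toy graph; inspecting the compiled functions shows which BLAS-style dot each one was optimized to. A sketch, not the monitor.py code:

```python
import numpy as np
import theano
import theano.tensor as T

W = theano.shared(np.random.randn(3, 4))

v = T.vector('v')   # batch_size = 1 as a vector / broadcastable row
m = T.matrix('m')   # the old behaviour: a general batch x n matrix

f_vec = theano.function([v], T.dot(v, W))
f_mat = theano.function([m], T.dot(m, W))

# theano.printing.debugprint(f_vec) and theano.printing.debugprint(f_mat)
# show the difference in the optimized graphs.
```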

I haven't used @vdumoulin's spaces, but I guess they can be used quite flexibly. If you set the number of columns (dim) to 1, you could simply pass your sequence as a column vector, correct?

In the long term I would love to see a solution that allows a batch size greater than 1, though. In my case, something I would like to do is create a fixed-length continuous representation of variable-length sequences and then run that through a normal MLP, which isn't very efficient to do one sentence at a time.

Lists of Theano variables are one option, although I have no idea what the performance is like when you have e.g. 250 different Theano variables as inputs and need to concatenate them in one of your MLP layers. Another option, although I have no idea if this is feasible at all, is to run the variable-length part of the network first, collect the outputs, and then run the fixed-length part of the network in batch (see the sketch below). I'd also be interested to hear what @nouiz and his intern have in mind, and on what timetable. It would be a shame if we started to implement something and then needed to rewrite everything once Theano support for variable-length batches arrives.
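That second option might look roughly like this, as a NumPy-level sketch with a made-up encode function standing in for the variable-length part:

```python
import numpy as np

def encode(seq):
    # Stand-in for the variable-length part of the network: map a
    # (time, dim) sequence to a fixed-length vector. A real model
    # would use an RNN here; a mean over time keeps the sketch short.
    return seq.mean(axis=0)

sequences = [np.random.randn(t, 3) for t in (5, 9, 2)]

# Phase 1: run the variable-length part one sequence at a time.
codes = np.array([encode(s) for s in sequences])   # (batch, 3)

# Phase 2: the fixed-length part of the network runs as a normal batch.
W = np.random.randn(3, 4)
hidden = np.tanh(codes.dot(W))
```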

vdumoulin commented 10 years ago

@bartvm right now XSequenceSpace always expects matrices for sequences, but this is a constraint that could very easily be lifted, as well as the minibatch constraint (in this case, all sequences in a minibatch would need to have the same length, but this fixed length could vary across minibatches).

bartvm commented 10 years ago

Well, I meant you could simply pass an n × 1 matrix.

I'm not very clear on what allowing mini-batches would entail, though. I guess you'd have to be careful, because you would be removing some of the stochasticity from the iterators, i.e. sequences with the same length will always be part of the same batch. Also, I guess you would have to fix the mini-batch size to min_L |{s in S : |s| = L}| (the size of the smallest group of equal-length sequences in S), or otherwise scale your cost function depending on the mini-batch size?
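A bucketing iterator along those lines might look like this (a hypothetical sketch, not pylearn2 code):

```python
import random
from collections import defaultdict

def length_bucketed_batches(sequences, batch_size, rng=random):
    # Group sequences by length so every minibatch is rectangular.
    buckets = defaultdict(list)
    for seq in sequences:
        buckets[len(seq)].append(seq)
    batches = []
    for group in buckets.values():
        rng.shuffle(group)  # shuffle within a length group...
        for i in range(0, len(group), batch_size):
            batches.append(group[i:i + batch_size])
    rng.shuffle(batches)    # ...and across groups, for some stochasticity
    return batches
```

Same-length sequences still always share a batch, and the last batch of each bucket can be smaller, which is where the cost rescaling mentioned above would come in.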

nouiz commented 10 years ago

I think we will merge the list type next week, without scan support or C code support.

Fred


nouiz commented 10 years ago

I was wrong. It is now merged!
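For reference, basic usage of the merged typed lists looks roughly like this (a sketch based on the theano.typed_list module around the time of the merge; exact names may have changed since):

```python
import numpy as np
import theano
import theano.tensor as T
import theano.typed_list

# A typed list holds tensors that agree in dtype and ndim but may
# differ in shape, e.g. variable-length (time, dim) sequences.
seq_list = theano.typed_list.TypedListType(T.dmatrix)()

n_seqs = theano.typed_list.length(seq_list)
f = theano.function([seq_list], n_seqs)

print(f([np.zeros((5, 3)), np.zeros((9, 3))]))  # 2 sequences, different lengths
```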


bartvm commented 10 years ago

Pull request #1021 implements batches of variable-length sequences using padding and a mask. I think this is a better solution than typed lists because it allows us to loop over the time axis of a whole batch.
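In outline, the padding-and-mask idea looks like this (a NumPy sketch with an illustrative (time, batch, dim) layout; the PR's actual conventions may differ):

```python
import numpy as np

def pad_batch(sequences):
    # Pad a list of (time, dim) sequences into one (max_time, batch, dim)
    # array plus a (max_time, batch) mask marking the real time steps.
    max_len = max(len(s) for s in sequences)
    dim = sequences[0].shape[1]
    data = np.zeros((max_len, len(sequences), dim))
    mask = np.zeros((max_len, len(sequences)))
    for i, s in enumerate(sequences):
        data[:len(s), i] = s
        mask[:len(s), i] = 1.0
    return data, mask

data, mask = pad_batch([np.random.randn(t, 3) for t in (5, 9, 2)])
# A single scan over the first axis of `data` now loops over time for
# the whole batch; multiplying by `mask` zeroes out the padded steps.
```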

nouiz commented 10 years ago

Your way will be faster if the difference in lengths isn't too big. But when the difference is "big" ("big" still needs to be defined and will depend on the overhead), the typed-list approach will be faster.

Currently scan has a big overhead, so the threshold for "big" is probably larger than typical sentence lengths. This also depends on the model.
