Open varisd opened 6 years ago
I think it is possible with the current implementation. The recurrent encoder can have multiple input sequences (called factors in our code), which must be sequences of the same length. They are concatenated and fed into an RNN. Is that what you need? Factored decoding is not implemented.
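As a rough illustration of what factor concatenation means here (a minimal NumPy sketch with made-up shapes and random values, not the actual Neural Monkey implementation):

```python
import numpy as np

# Two factor sequences of the same length (e.g. tokens and POS tags),
# each already mapped to its own embedding space.
batch_size, max_seq_len = 2, 5
emb_tokens = np.random.rand(batch_size, max_seq_len, 8)  # token embeddings
emb_tags = np.random.rand(batch_size, max_seq_len, 4)    # tag embeddings

# The factors are concatenated along the embedding dimension; the result
# is what the RNN encoder consumes at each time step.
rnn_input = np.concatenate([emb_tokens, emb_tags], axis=-1)
print(rnn_input.shape)  # (2, 5, 12)
```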
If you implement the character-level representation layer inheriting from TemporalStateful, you can plug it in as an additional factor in the encoder. Recently, I did something similar for a different project, so I can help you with that.
Here's an idea that has been on my mind for some time:
Currently NMonkey supports encoding and decoding of sequences of tokens of shape [batch_size, max_seq_len], which are then represented in embedding space as [batch_size, max_seq_len, emb_size]. Let's call these 1D sequences.
However, it would be nice to be able to represent "multidimensional" sequences.
Example: I want to encode a sentence where each word embedding is created from its embedded characters using a separate encoder (the embedding of the word is the final state of the encoder output). The input in this case is a matrix of size [batch_size, max_seq_len, max_word_len] (max_seq_len is the length of the sentence measured in number of words). We then create a representation in two steps: first, a character-level encoder turns each word's character embeddings into a word embedding; second, a word-level encoder processes the resulting sequence of word embeddings as usual.
Let's call this a 2D sequence. This can of course be expanded to nD sequences.
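The two-step 2D encoding above can be sketched in NumPy as follows (a toy mean-over-time function stands in for a learned RNN encoder; all names and shapes are illustrative, not Neural Monkey code):

```python
import numpy as np

# Toy stand-in encoder: the mean over the time axis, in place of an RNN
# whose final state would serve as the learned representation.
def encode(seq):  # seq: [n, time, emb] -> [n, emb]
    return seq.mean(axis=1)

batch_size, max_seq_len, max_word_len, char_emb = 2, 4, 6, 8
# Character-embedded input: one row of character embeddings per word.
chars = np.random.rand(batch_size, max_seq_len, max_word_len, char_emb)

# Step 1: fold batch and sentence dims together, encode each word's
# characters, then unfold back into word embeddings.
flat = chars.reshape(batch_size * max_seq_len, max_word_len, char_emb)
word_embs = encode(flat).reshape(batch_size, max_seq_len, char_emb)

# Step 2: [batch_size, max_seq_len, emb] is now an ordinary 1D sequence
# and can be fed to a sentence-level encoder in the classic way.
sentence_states = encode(word_embs)
print(sentence_states.shape)  # (2, 8)
```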
Technically, this can also be expanded to decoding (first we decode word representations and then decode sequences of characters based on these representations). However, I think this should be left as a separate issue.
What I would like to discuss here is a reasonable approach towards implementing this. My ideas so far:
A class derived from Vocabulary (e.g. MultiDimVocabulary):
-- creates the multidimensional representations of the sentences (based on the input separators)
-- handles padding in each dimension
A Sequence class derivation (e.g. RecursiveSequence):
-- has an encoder as an argument (probably a different encoder/learnable encoder parameters for each layer)
-- in each "representation layer", reshapes [batch_size, dim1, dim2, ..., dimN, emb_size] to [batch_size x dim1 x dim2 x ... x dimN-1, dimN, emb_size1], computes embeddings over the N-th dimension, and returns the reshaped [batch_size, dim1, dim2, ..., dimN-1, emb_size2]; emb_size1 and emb_size2 may differ. The recursion stops when we get to the level of [batch_size, dim1, emb_size]; then we can use an encoder in the classic way.
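One such representation layer could be sketched like this (a hypothetical NumPy sketch; `encode_innermost` is a made-up name, and a mean over time again stands in for a learned encoder):

```python
import numpy as np

# One "representation layer": fold all but the last sequence dimension
# into the batch, encode over the last dimension, and unfold again.
def encode_innermost(x, encoder=lambda s: s.mean(axis=1)):
    # x: [batch, dim1, ..., dimN, emb1] -> [batch, dim1, ..., dimN-1, emb2]
    *outer, dim_n, emb = x.shape
    flat = x.reshape(-1, dim_n, emb)  # [batch*dim1*...*dimN-1, dimN, emb1]
    enc = encoder(flat)               # [batch*dim1*...*dimN-1, emb2]
    return enc.reshape(*outer, enc.shape[-1])

# A 3D sequence: [batch_size, dim1, dim2, dim3, emb_size].
x = np.random.rand(2, 3, 4, 5, 8)
while x.ndim > 3:          # recursion stops at [batch_size, dim1, emb_size]
    x = encode_innermost(x)
print(x.shape)  # (2, 3, 8)
```

In a real implementation each layer would carry its own parameters rather than share one `encoder` callable, as suggested above.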
I am currently not sure whether to perform the "recursive" encoding in the sequence itself or rather define a separate encoder class for this. I am also aware that some changes to the sentence reader will be necessary.
Before I start working on this, I would like to have a discussion on the topic to get some insights. Feel free to contribute any ideas.