uyaseen / theano-recurrence

Recurrent Neural Networks (RNN, GRU, LSTM) and their Bidirectional versions (BiRNN, BiGRU, BiLSTM) for word & character level language modelling in Theano
MIT License
43 stars · 24 forks

number of LSTM blocks and cells #1

Open xy0806 opened 8 years ago

xy0806 commented 8 years ago

Dear Yaseen, thanks for your clean code. As you know, there are two concepts: the 'LSTM block' and the 'LSTM cell'. But in a lot of LSTM example code, including yours, no attention seems to be paid to this distinction; the code only creates cells, not blocks. After reading and thinking about this, I have come to the conclusion that an LSTM with m blocks of n cells each and an LSTM with one block of m*n cells are actually the same. What do you think about this, and could you give me any hints on the issue?

Thanks, Xin Yang

uyaseen commented 8 years ago

Hi Yang,

Glad to know that you found the code helpful.

The distinction between cells and blocks has eroded over time; most modern LSTM architectures have one cell per block (which, in my opinion, is simpler). Regarding your question of why not much attention is paid to this difference, I am afraid I don't have a very clear answer, but I would say:

-> Do we have any empirical evidence suggesting that LSTMs with multiple cells per block work better than the "one cell per block" architecture? I am not aware of any such evidence, and without it people will prefer the less cumbersome model. The same applies to peephole connections: some people don't use them because they don't find them very helpful.
-> Why not increase the memory (capacity) of a cell instead of adding more cells to a block? I would prefer increasing the memory; it's simpler and more interpretable, and to me the two look equivalent anyway, since both architectures use the same gates ("cells within a block share the same gates"). A small sketch of this "one cell per block" view follows after this list.
-> Simple is always better (GRUs are a simpler version of LSTMs with almost equivalent performance on many tasks, and are therefore very popular these days).
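To make the "same gates" point concrete, here is a small, self-contained sketch of a standard one-cell-per-block LSTM step in Theano. It is illustrative only; the sizes and names are hypothetical and it is not the repository's exact code. The block is defined entirely by its three gates and one cell state, so capacity is increased simply by making n_h larger:

```python
# Illustrative sketch of a "one cell per block" LSTM step in Theano.
# Not the repository's code; sizes and names are hypothetical.
import numpy as np
import theano
import theano.tensor as T

n_in, n_h = 75, 100          # e.g. vocab size and hidden/cell size
rng = np.random.RandomState(0)

def shared_uniform(n_rows, n_cols, name):
    # small uniform initialisation, stored as a Theano shared variable
    w = rng.uniform(-0.01, 0.01, (n_rows, n_cols)).astype(theano.config.floatX)
    return theano.shared(w, name=name)

# One weight matrix per gate plus the candidate; biases omitted for brevity.
W_i, U_i = shared_uniform(n_in, n_h, 'W_i'), shared_uniform(n_h, n_h, 'U_i')
W_f, U_f = shared_uniform(n_in, n_h, 'W_f'), shared_uniform(n_h, n_h, 'U_f')
W_o, U_o = shared_uniform(n_in, n_h, 'W_o'), shared_uniform(n_h, n_h, 'U_o')
W_c, U_c = shared_uniform(n_in, n_h, 'W_c'), shared_uniform(n_h, n_h, 'U_c')

def lstm_step(x_t, h_tm1, c_tm1):
    i_t = T.nnet.sigmoid(T.dot(x_t, W_i) + T.dot(h_tm1, U_i))   # input gate
    f_t = T.nnet.sigmoid(T.dot(x_t, W_f) + T.dot(h_tm1, U_f))   # forget gate
    o_t = T.nnet.sigmoid(T.dot(x_t, W_o) + T.dot(h_tm1, U_o))   # output gate
    c_hat = T.tanh(T.dot(x_t, W_c) + T.dot(h_tm1, U_c))         # candidate cell
    c_t = f_t * c_tm1 + i_t * c_hat                             # new cell state
    h_t = o_t * T.tanh(c_t)                                     # new hidden state
    return h_t, c_t

# Unroll the step over a sequence of input vectors with theano.scan.
x_seq = T.matrix('x_seq')                                  # (timesteps, n_in)
h0 = T.zeros((n_h,), dtype=theano.config.floatX)
c0 = T.zeros((n_h,), dtype=theano.config.floatX)
[h_seq, c_seq], _ = theano.scan(fn=lstm_step, sequences=x_seq, outputs_info=[h0, c0])
```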

I hope the above explanation helps a bit; [1] explains the differences between various LSTM architectures.

[1] LSTM: A Search Space Odyssey, http://arxiv.org/pdf/1503.04069v1.pdf

xy0806 commented 8 years ago

Dear Yaseen,

Thanks for the quick and informative reply. I need to ask one more key question, which is closely related to my thoughts and really confuses me: if I want to implement an LSTM in which each block contains multiple cells, how should I modify your code? Could you teach me something about how multi-cell blocks are created?

Thanks, Xin Yang

uyaseen commented 8 years ago

I have to look at a few papers again to make sure I don't miss anything, but these days I am travelling and don't even have access to my laptop, so you will have to wait at least one week for a reply (I am sorry it cannot be earlier than that :/).

xy0806 commented 8 years ago

OK, I can wait for that. I can play with the simplest one in the meantime. ^_^

Best, Xin Yang

son20112074 commented 8 years ago

Thank you, Yaseen and Xin Yang. This was also my problem, and now I understand it better.

DongGuangchang commented 7 years ago

Dear Yaseen, I am a rookie in deep learning and encountered some difficulties when debugging your program on recurrent neural networks. First, I get an error when running sample.py:

ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.
data size: 49388, vocab size: 75
train(..)
load_data(..)
[Train] # of rows: 987
... transferring data to the GPU
Traceback (most recent call last):
... building the model
  File "F:/DL-File/RNN/theano-recurrence-b9b8a82410be005d5a3121345e8d62c5ca547982/train.py", line 145, in
    n_h=100, use_existing_model=True, n_epochs=600)
  File "F:/DL-File/RNN/theano-recurrence-b9b8a82410be005d5a3121345e8d62c5ca547982/train.py", line 50, in train
    rec_params = pkl.load(f)
EOFError
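An EOFError from pkl.load typically means the checkpoint file that use_existing_model=True tries to read is missing, empty, or truncated. A defensive load along the lines below (a sketch only, with hypothetical names, not the repository's exact code) falls back to fresh parameters instead of crashing:

```python
# Sketch: guard the use_existing_model=True path so an empty or missing
# pickle file falls back to fresh parameters instead of raising EOFError.
# Hypothetical names; not the repository's exact code.
import os
import pickle as pkl   # the repo uses cPickle as pkl; standard pickle behaves the same here

def load_params_if_available(model_path):
    """Return saved parameters, or None if no usable checkpoint exists."""
    if os.path.exists(model_path) and os.path.getsize(model_path) > 0:
        with open(model_path, 'rb') as f:
            return pkl.load(f)
    return None   # caller should then initialise parameters from scratch

# usage (hypothetical path and helper):
# rec_params = load_params_if_available('data/models/lstm-best_model.pkl')
# if rec_params is None:
#     rec_params = initialize_parameters()
```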


Second, I get an error when running train.py:

ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 2) has dtype int32, while the result of the inner function (fn) has dtype int64. This can happen if the inner function of scan results in an upcast or downcast.

I hope you can help explain the reasons for the above errors. Thank you very much!

Thanks, Liang Dong
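For reference, this scan error appears when the initial state passed via outputs_info has a different dtype than what the step function returns (here int32 vs int64). A minimal sketch, not taken from the repository, showing the two dtypes made to agree:

```python
# Sketch: the initial state given to theano.scan via outputs_info must have
# the same dtype as the value the step function returns. Hypothetical example.
import numpy as np
import theano
import theano.tensor as T

idxs = T.ivector('idxs')                        # int32 vector of token indices

def step(x_t, acc_tm1):
    # cast explicitly so the inner result stays int64, matching the seed below
    return T.cast(x_t + acc_tm1, 'int64')

init = theano.shared(np.int64(0), name='init')  # initial state deliberately int64
result, _ = theano.scan(fn=step, sequences=idxs, outputs_info=init)

f = theano.function([idxs], result[-1])
print(f(np.arange(5, dtype='int32')))           # running sum -> 10
```

If the seed were int32 instead (e.g. np.int32(0)) while the step returned int64, scan would raise exactly the ValueError quoted above; casting one side to match the other removes it.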