google / trax

Trax — Deep Learning with Clear Code and Speed

Bidirectional RNN #1078

Open jonatasgrosman opened 3 years ago

jonatasgrosman commented 3 years ago

Is there currently a way to train a bidirectional RNN (such as an LSTM or GRU) in Trax?

lukaszkaiser commented 3 years ago

I believe they should be easy to implement. We don't have them by default yet - a PR adding them would be welcome!
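
For illustration, here is a minimal sketch of such a wrapper built from existing Trax combinators (the name `Bidirectional` and the flip helper are illustrative, not part of the current library): it runs a deep copy of the forward layer over the time-reversed sequence and concatenates the two outputs along the feature axis.

```python
import copy

from trax import layers as tl
from trax.fastmath import numpy as jnp


def Bidirectional(forward_layer, axis=1, merge_layer=tl.Concatenate()):
  """Runs an independent copy of `forward_layer` over the time-reversed
  sequence and merges the forward and backward outputs."""
  backward_layer = copy.deepcopy(forward_layer)
  flip = tl.Fn('_FlipAlongTimeAxis', lambda x: jnp.flip(x, axis=axis))
  backward = tl.Serial(flip, backward_layer, flip)
  return tl.Serial(
      tl.Branch(forward_layer, backward),
      merge_layer,
  )


# Example: the output feature depth is twice the LSTM's n_units, since the
# forward and backward sequences are concatenated along the last axis.
birnn = Bidirectional(tl.LSTM(n_units=8))
```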

narayanacharya6 commented 3 years ago

Hey @lukaszkaiser mind if I take a stab at this?

zvikinoza commented 3 years ago

If nobody is currently working on this, I'll submit a PR. (@narayanacharya6)

narayanacharya6 commented 3 years ago

I haven't started yet, so go for it @zvikinoza

manifest commented 3 years ago

I've just made a PR for the issue.

I wasn't sure where to place it, so I just added it to trax.layers.combinators :-) If it should be in trax.layers.rnn, I can move it.

manifest commented 3 years ago

In the PR, I use copy.deepcopy to create backward_layer as a copy of forward_layer. Is that an appropriate way to copy layer instances?
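
For reference, a minimal sketch of that approach (assuming the layer is a plain trax.layers object that has not yet been initialized):

```python
import copy

from trax import layers as tl

forward_layer = tl.GRU(n_units=8)
# copy.deepcopy yields an independent layer object; once each copy is
# initialized it holds its own weights, so nothing is shared or tied.
backward_layer = copy.deepcopy(forward_layer)
```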

manifest commented 3 years ago

I also have a question about the RNN implementation in Trax. Why do we initialize the hidden state of the GRU and LSTM layers proportionally to the feature dimension of their inputs? Shouldn't we pass n_units to MakeZeroState and get (batch_size, n_units) as the shape of their hidden states?
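
For readers following along, a rough sketch of the behavior in question, assuming MakeZeroState derives the state shape from the input's feature depth (the function names below are illustrative, not the actual Trax source):

```python
from trax.fastmath import numpy as jnp

# Behavior as described above: the zero state is shaped from the input's
# feature depth, so the hidden state comes out as
# (batch_size, depth_multiplier * d_feature) rather than (batch_size, n_units).
def zero_state_from_input_depth(x, depth_multiplier=1):
  batch_size, _, d_feature = x.shape  # x: (batch_size, seq_len, d_feature)
  return jnp.zeros((batch_size, depth_multiplier * d_feature), dtype=jnp.float32)


# The alternative suggested in the question: pass n_units explicitly and get
# a state of shape (batch_size, depth_multiplier * n_units).
def zero_state_from_n_units(x, n_units, depth_multiplier=1):
  return jnp.zeros((x.shape[0], depth_multiplier * n_units), dtype=jnp.float32)
```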