**Open** · jonatasgrosman opened this issue 4 years ago
I believe bidirectional RNN layers should be easy to implement. We don't have them by default yet - a PR adding them would be welcome!
Hey @lukaszkaiser, mind if I take a stab at this?
If nobody is currently working on this, I'll submit a PR. (@narayanacharya6)
I haven't started yet, so go for it, @zvikinoza.
I've just made a PR for the issue. I wasn't sure where to place it, so I just added it to `trax.layers.combinators` :-) If it should be in `trax.layers.rnn`, I can move it.
In the PR I use `copy.deepcopy` to create a `backward_layer` as a copy of the `forward_layer`. Is that an appropriate way to copy layer instances?
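For readers following along, here is a minimal sketch of what such a combinator could look like, built from existing Trax pieces (`tl.Branch`, `tl.Serial`, `tl.Concatenate`, `tl.Fn`). The function name and the flip-then-run-then-flip approach are my illustration of the idea, not a verbatim copy of the PR:

```python
import copy

from trax import layers as tl
from trax.fastmath import numpy as jnp

def Bidirectional(forward_layer, axis=1, merge_layer=None):
  """Sketch: run a copy of the layer over the time-reversed input,
  then merge the forward and backward outputs (concatenation by default)."""
  backward_layer = copy.deepcopy(forward_layer)
  # Reverse along the time axis, run the copied layer, reverse back.
  flip = tl.Fn('_FlipAlongTimeAxis', lambda x: jnp.flip(x, axis=axis))
  backward = tl.Serial(flip, backward_layer, flip)
  merge = merge_layer if merge_layer is not None else tl.Concatenate()
  # Branch feeds the same input to both directions; merge joins the outputs.
  return tl.Serial(tl.Branch(forward_layer, backward), merge)
```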
I also have a question about the RNN implementation in Trax. Why do we initialize the hidden state of the GRU and LSTM layers proportionally to the dimension of their inputs? Shouldn't we pass `n_units` to `MakeZeroState` and get `(batch_size, n_units)` as the shape of their hidden states?
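To make the question concrete, here is a simplified sketch of the two behaviours being contrasted; this is a paraphrase of the idea, not the actual Trax source:

```python
import jax.numpy as jnp

def zero_state_from_input(x, depth_multiplier=1):
  # Current behaviour (simplified): the state width tracks the input's
  # feature dimension, so the hidden size is tied to d_feature.
  batch_size, d_feature = x.shape[0], x.shape[-1]
  return jnp.zeros((batch_size, depth_multiplier * d_feature))

def zero_state_from_n_units(x, n_units):
  # Proposed alternative: the state width is set explicitly, giving a
  # (batch_size, n_units) hidden state independent of the input width.
  return jnp.zeros((x.shape[0], n_units))
```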
Is this issue still relevant?
Is there a way to train a bidirectional RNN (like an LSTM or GRU) in Trax nowadays?
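Not an authoritative answer, but assuming a recent Trax release exposes a `tl.Bidirectional` wrapper along the lines of the PR discussed above, using it in a model would look roughly like this (the surrounding classifier is purely illustrative):

```python
from trax import layers as tl

# Illustrative model; tl.Bidirectional is assumed here, so check that your
# Trax version actually provides it before relying on this.
model = tl.Serial(
    tl.Embedding(vocab_size=8192, d_feature=256),
    tl.Bidirectional(tl.GRU(n_units=256)),  # forward + backward, concatenated
    tl.Mean(axis=1),                        # average over the time axis
    tl.Dense(2),
    tl.LogSoftmax(),
)
```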