google / trax

Trax — Deep Learning with Clear Code and Speed
Apache License 2.0
8.04k stars 814 forks

[Documentation] Hidden units in GRU Layer #1054

Closed Bharathi-A-7 closed 3 years ago

Bharathi-A-7 commented 3 years ago

I am currently doing the NLP Specialization on Coursera from deeplearning.ai, which introduced me to the Trax library. The documentation provided there says a GRU layer in Trax can only accept a number of hidden-state units equal to the number of elements in the embeddings (the embedding size) of the input words.

Having studied vanilla RNNs and traditional implementations of the GRU, my understanding was that the number of GRU units in a layer must equal the input dimension, i.e., the number of words/characters in a given input sentence at each time step, so as to propagate information from the first word up to the last.

My question about Trax's GRU layer is: how does setting the number of units equal to the embedding size take care of the recurrence over the input in the Scan() layer? How is the recurrence handled if the number of hidden units equals only the number of elements in the embeddings and not the input dimension (the number of words at each time step)?

I am currently studying Trax and have trouble comprehending this part. Could someone please help?

lukaszkaiser commented 3 years ago

The recurrence in RNNs is over the sentence length, so it can be arbitrary with the same weights. I'm happy to explain more in other places, like our chat; let's reserve this issue tracker for bugs in code.
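To make the point concrete, here is a minimal numpy sketch of a GRU cell (not Trax's actual implementation; the weight names `Wu`, `Wr`, `Wc` and the dimension `d` are illustrative). All weight shapes depend only on the embedding/hidden size `d`; the sentence length only determines how many times the same cell is applied, so one set of weights handles sentences of any length:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 4  # embedding size == hidden size (illustrative value)

rng = np.random.default_rng(0)
# Each gate acts on the concatenation [h, x], so every weight matrix
# has shape (2*d, d) -- nothing here depends on sentence length.
Wu = rng.normal(size=(2 * d, d)); bu = np.zeros(d)  # update gate
Wr = rng.normal(size=(2 * d, d)); br = np.zeros(d)  # reset gate
Wc = rng.normal(size=(2 * d, d)); bc = np.zeros(d)  # candidate state

def gru_cell(h, x):
    """One GRU step: maps (hidden state, token embedding) -> new hidden state."""
    hx = np.concatenate([h, x])
    u = sigmoid(hx @ Wu + bu)                              # update gate
    r = sigmoid(hx @ Wr + br)                              # reset gate
    c = np.tanh(np.concatenate([r * h, x]) @ Wc + bc)      # candidate
    return (1.0 - u) * h + u * c

def gru_scan(xs):
    """Apply the same cell weights across the time axis (the recurrence)."""
    h = np.zeros(d)
    for x in xs:  # recurrence runs over sentence length, whatever it is
        h = gru_cell(h, x)
    return h

# Identical weights process a 3-token and an 11-token sentence alike.
short = rng.normal(size=(3, d))
long_ = rng.normal(size=(11, d))
print(gru_scan(short).shape, gru_scan(long_).shape)  # (4,) (4,)
```

So the "number of units" fixes the width of the hidden state (here tied to the embedding size), while the Scan()-style loop supplies the recurrence: the cell is simply reapplied once per token, carrying the hidden state forward.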