I am currently doing the NLP Specialization on Coursera from deeplearning.ai, which introduced me to the Trax library.
The documentation provided there says a GRU layer in Trax can only accept a number of hidden state units equal to the number of elements in the embeddings (the embedding size) of the input words.
Having studied vanilla RNNs and traditional GRU implementations, my understanding is that the number of GRU units in a layer must equal the input dimension, i.e., the number of words/characters in a given input sentence at each time step, so that information propagates from the first word up to the last.
My question about Trax's GRU layer is: how does setting the number of units equal to the embedding size take care of the recurrence over the input in the Scan() layer? How is the recurrence handled if the number of hidden units equals only the number of elements in the embeddings and not the input dimension (the number of words at each time step)?
I am currently studying Trax and have trouble understanding this part. Could someone please help?
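The recurrence in an RNN runs over the sentence length: the same weights are applied at every time step, so the sentence can be arbitrarily long, and the number of hidden units does not depend on the number of words. I'm happy to explain more in other places, like our chat; let's reserve this for bugs in the code.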
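To make the point about recurrence concrete, here is a minimal sketch in plain Python. It is not Trax's implementation: the names `D_MODEL`, `make_gru_weights`, `gru_step` and `run_gru` are made up for illustration, biases are left out, and the weights are random. It only shows that one set of weights, sized by the embedding dimension, is reused for every word, so sentences of any length give a hidden state of the same size.

```python
import math
import random

D_MODEL = 4  # hypothetical embedding size, also used as the hidden state size


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_gru_weights(d, seed=0):
    """One weight matrix per gate; each gate sees [state, input], so 2*d columns."""
    rng = random.Random(seed)
    return {name: make_matrix(d, 2 * d, rng)
            for name in ("update", "reset", "candidate")}


def gru_step(weights, h, x):
    """One time step: combine the previous state h with one word embedding x."""
    hx = h + x  # list concatenation: [h, x]
    u = [sigmoid(a) for a in matvec(weights["update"], hx)]
    r = [sigmoid(a) for a in matvec(weights["reset"], hx)]
    reset_hx = [ri * hi for ri, hi in zip(r, h)] + x
    c = [math.tanh(a) for a in matvec(weights["candidate"], reset_hx)]
    # New state mixes the old state and the candidate, gated by u.
    return [ui * hi + (1.0 - ui) * ci for ui, hi, ci in zip(u, h, c)]


def run_gru(weights, embeddings):
    """Apply the same gru_step, with the same weights, once per word."""
    h = [0.0] * D_MODEL
    for x in embeddings:  # the loop over time steps is the recurrence
        h = gru_step(weights, h, x)
    return h


weights = make_gru_weights(D_MODEL)
rng = random.Random(1)
short_sentence = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(3)]
long_sentence = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(9)]

h_short = run_gru(weights, short_sentence)
h_long = run_gru(weights, long_sentence)
print(len(h_short), len(h_long))  # prints: 4 4
```

The sentence length only changes how many times the loop in `run_gru` runs. The weight shapes depend on `D_MODEL` alone, which is why the number of units can match the embedding size without limiting how many words are processed.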