Question about X_{train, dev} shape and content

Hi @talhanai,

The input tensor to the LSTM model has shape [Nexamples, Ntimesteps, Nfeatures]. Nfeatures is the feature dimension (audio = 279, text = 100), but how should I interpret Nexamples and Ntimesteps? My guess is that they depend on the timestep and stride parameters mentioned in the paper (audio used 30 timesteps with stride 1; text used 7 timesteps with stride 3). I would appreciate it if you could explain how you used these (timestep, stride) parameters to reshape the feature set into the LSTM input tensor.
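To make my guess concrete, here is a minimal Python/NumPy sketch of what I assume the reshaping looks like. It is only my assumption, not necessarily how the repository does it: each interview is stored as a [Nsteps, Nfeatures] array with one row per response, a window of `timesteps` rows slides forward by `stride` rows, and every window from every interview is stacked into one tensor. The helper names `make_windows` and `build_tensor` are hypothetical.

```python
import numpy as np


def make_windows(seq, timesteps, stride):
    """Slice one interview's [Nsteps, Nfeatures] array into sliding windows.

    Assumption: one row per response. Returns [Nwindows, timesteps, Nfeatures]
    with Nwindows = (Nsteps - timesteps) // stride + 1, or 0 windows if the
    interview is shorter than one window.
    """
    n_steps, n_feat = seq.shape
    if n_steps < timesteps:
        return np.empty((0, timesteps, n_feat), dtype=seq.dtype)
    # Start index of every full window, moving forward by `stride` rows.
    starts = range(0, n_steps - timesteps + 1, stride)
    return np.stack([seq[s:s + timesteps] for s in starts])


def build_tensor(interviews, timesteps, stride):
    """Stack windows from all interviews into [Nexamples, Ntimesteps, Nfeatures]."""
    windows = [make_windows(x, timesteps, stride) for x in interviews]
    return np.concatenate(windows, axis=0)
```

With two hypothetical interviews of 60 and 45 responses, the audio settings (30 timesteps, stride 1) give 31 + 16 = 47 windows, while the text settings (7 timesteps, stride 3) give 18 + 13 = 31, so under this reading Nexamples would differ between the two modalities.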

Initially, I thought Nexamples referred to the number of responses, e.g. 8050 (from Sections 4.1.2 and 4.1.3), in which case the number of audio responses and the number of text responses should be the same. But this passage in Section 4.3.2 confused me: "The audio and text inputs for each LSTM branch had different strides and timesteps yielding a different number of training (and development) examples, therefore we needed to equalize the number of examples (Audio was 30 timesteps, with stride 1. Text was 7 timesteps, and stride 3). This step was performed by padding the number of training examples in the smaller set (text) to match that larger set (audio) by mapping examples together that appeared in the same window of the interview."
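Here is one possible reading of "mapping examples together that appeared in the same window", again only a hypothetical sketch and not necessarily what the paper did. It assumes audio and text rows are aligned by response index within an interview, and pairs each audio window with the latest text window that ends at or before the audio window's last step. Text windows are therefore repeated (padded) until both sets have the same length. `window_starts` and `align_text_to_audio` are illustrative names I made up.

```python
import numpy as np


def window_starts(n_steps, timesteps, stride):
    """Start indices of all full sliding windows over n_steps rows."""
    return np.arange(0, max(n_steps - timesteps + 1, 0), stride)


def align_text_to_audio(n_steps, audio_ts=30, audio_stride=1,
                        text_ts=7, text_stride=3):
    """Return one text-window index per audio window for a single interview.

    Hypothetical pairing rule (my assumption, not confirmed by the paper):
    text window j ends at row j * text_stride + text_ts - 1; pick the largest
    j whose end is at or before the audio window's last row. Repeated indices
    act as padding of the smaller (text) set.
    """
    audio_ends = window_starts(n_steps, audio_ts, audio_stride) + audio_ts - 1
    n_text = len(window_starts(n_steps, text_ts, text_stride))
    idx = (audio_ends - text_ts + 1) // text_stride
    # Keep indices inside the range of existing text windows.
    return np.clip(idx, 0, n_text - 1)
```

For a 45-response interview this returns 16 indices, one per audio window, so indexing that interview's text windows with them (`text_windows[idx]`) would give as many text examples as audio examples. Under this rule the earliest text windows (indices 0 to 6) are never used, which is one reason I am unsure it matches your approach.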

Thanks in advance!

(talhanai/redbud-tree-depression, issue #10)