jefflai108 / Contrastive-Predictive-Coding-PyTorch

Contrastive Predictive Coding for Automatic Speaker Verification

Feed entire input to encoder?? #17

Open NeteeraAF opened 3 years ago

NeteeraAF commented 3 years ago

I see in your implementation that you feed the entire signal into the encoder, while the paper notes that each timestep should be fed in separately. When you feed the entire signal into the encoder, adjacent output features are computed from overlapping input windows of the conv kernel (except in the case where the stride equals the kernel size).
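For illustration, a minimal sketch of that overlap (toy layer sizes, not this repo's actual encoder):

```python
# Minimal sketch (toy sizes, not this repo's encoder): with kernel_size > stride,
# adjacent Conv1d output frames are computed from overlapping input samples
# when the whole signal is encoded in one pass.
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=10, stride=5)

signal = torch.randn(1, 1, 100)   # (batch, channels, samples)
z = conv(signal)                  # (1, 8, 19) output frames

# Output frame t covers input samples [t*stride, t*stride + kernel_size),
# so frames 0 and 1 share samples 5..9; there is no overlap only when
# stride == kernel_size.
print(z.shape)                    # torch.Size([1, 8, 19])
```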

Why did you implement it like that? Do you think it does not matter?

Thanks!

altomanscott commented 2 years ago

I have this doubt as well. I notice that in the paper the training inputs are segmented into small chunks, with each chunk fed into the encoder to get the feature representation z_t, which is then fed into g_ar (GRU). The context c_t from g_ar is then used to predict the feature representations z_{t+k} from future time frames. I don't know if I have the correct understanding of the paper or not.
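For concreteness, a toy sketch of that chunk-wise reading (the window size and the simple linear encoder are my own placeholders, not from the paper or the repo):

```python
# Toy sketch of the chunk-wise reading: cut the waveform into
# non-overlapping windows and encode each window independently into one z_t.
import torch
import torch.nn as nn

window = 160                           # samples per chunk (placeholder value)
enc = nn.Sequential(nn.Linear(window, 256), nn.ReLU())

x = torch.randn(1, 20480)              # raw waveform
chunks = x.unfold(1, window, window)   # (1, 128, 160): step == size, so no overlap
z = enc(chunks)                        # (1, 128, 256): one z_t per chunk
```

Because each chunk is encoded in isolation here, neighbouring z_t share no input samples, which is exactly the overlap question raised above.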

In this implementation, I think the entire signal is fed into the encoder; the produced feature representations are split into two parts, the first part is fed to g_ar (GRU), and then g_ar learns to predict the second part of the representation features.
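A rough sketch of that flow as I understand it (layer sizes and names are illustrative, not copied from the repo):

```python
# Rough sketch of the whole-signal flow: encode the full waveform, split the
# latent sequence in time, run the GRU over the first part, and predict the
# second part with one linear head per future step.
import torch
import torch.nn as nn

K = 12                                     # number of future steps to predict

encoder = nn.Sequential(                   # strided conv encoder over raw audio
    nn.Conv1d(1, 256, kernel_size=10, stride=5), nn.ReLU(),
    nn.Conv1d(256, 256, kernel_size=8, stride=4), nn.ReLU(),
)
g_ar = nn.GRU(input_size=256, hidden_size=256, batch_first=True)
predictors = nn.ModuleList([nn.Linear(256, 256) for _ in range(K)])  # one W_k per step

x = torch.randn(1, 1, 20480)               # raw waveform, batch of 1
z = encoder(x).transpose(1, 2)             # (batch, T, 256) latent frames

t = z.size(1) - K                          # split point in the latent sequence
c, _ = g_ar(z[:, :t])                      # context over the first part
c_t = c[:, -1]                             # context vector at the split point

preds = [w(c_t) for w in predictors]       # predictions for z_{t+1} .. z_{t+K}
# each prediction is scored against the true future z with the InfoNCE loss
```

In this reading, the targets z_{t+k} come from the same full-signal encoding, so neighbouring targets share input samples through the conv receptive fields.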

I believe these are two different models and concepts which could bring different results. I really hope that the author could elaborate on this point.

Thanks!