wilson1yan / VideoGPT


Question about data dependent initialization #18

Closed sukun1045 closed 3 years ago

sukun1045 commented 3 years ago

Hi, thanks for sharing this nice repo. I have a question about the codebook initialization. If you re-initialize the codebook only once, using the encoder outputs of a training batch at the beginning of training, do you get a very low commit loss afterwards? Does it affect generalization to test data? In my case, I want to borrow the data-dependent re-initialization technique and apply it to my own project. I find that it improves code usage and prevents codebook collapse, but it fails to generalize to test data (I get high reconstruction and commit losses on the test set). Do you have any insight into this? Thank you!
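
For concreteness, this is roughly what I mean by data-dependent re-initialization, written as a standalone sketch (the function name and shapes are mine, not from this repo):

```python
import torch

def init_codebook_from_encoder(z_e, n_codes):
    """Seed the codebook from a batch of encoder outputs.

    z_e: [N, D] flattened encoder outputs from one training batch (hypothetical shape).
    Returns an [n_codes, D] tensor to use as the initial codebook.
    """
    n, d = z_e.shape
    if n < n_codes:
        # Not enough encoder vectors: tile them and add small noise so every
        # code still gets a distinct starting point.
        reps = (n_codes + n - 1) // n
        z_e = z_e.repeat(reps, 1) + 0.01 * torch.randn(reps * n, d)
    # Pick n_codes random encoder outputs as the initial code vectors.
    idx = torch.randperm(z_e.shape[0])[:n_codes]
    return z_e[idx].clone()
```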

wilson1yan commented 3 years ago

In general, I haven't observed any significant gap between train and test performance on the different video datasets I've run. You probably don't need the data-dependent initialization; it would most likely learn fine without it. The primary mechanism that improves code usage / prevents codebook collapse is probably the part of the code that keeps track of average code usage and replaces unused codes with a random encoder output, found here. I haven't seen that have much of an effect on the train / test gap either.
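
Roughly, that mechanism looks something like this (an illustrative sketch, not the repo's exact code; see the link above for the real version, and the decay/threshold values here are made up):

```python
import torch

class CodebookWithRestart(torch.nn.Module):
    """Track per-code usage with an EMA and re-seed rarely used codes
    from random encoder outputs."""

    def __init__(self, n_codes, dim, usage_decay=0.99, usage_threshold=1e-3):
        super().__init__()
        self.embedding = torch.nn.Parameter(torch.randn(n_codes, dim))
        self.register_buffer("usage", torch.ones(n_codes))
        self.usage_decay = usage_decay
        self.usage_threshold = usage_threshold

    @torch.no_grad()
    def restart_dead_codes(self, z_e_flat, code_idx):
        # z_e_flat: [N, D] encoder outputs; code_idx: [N] codes chosen this batch.
        counts = torch.bincount(code_idx, minlength=self.embedding.shape[0]).float()
        # EMA of "was this code used at all this step?"
        self.usage.mul_(self.usage_decay).add_((1 - self.usage_decay) * (counts > 0).float())
        # Replace codes whose usage fell below the threshold with random encoder outputs.
        dead = self.usage < self.usage_threshold
        if dead.any():
            rand_idx = torch.randint(0, z_e_flat.shape[0], (int(dead.sum()),))
            self.embedding.data[dead] = z_e_flat[rand_idx]
            self.usage[dead] = 1.0
```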

sukun1045 commented 3 years ago

I see. Thanks a lot!

sukun1045 commented 3 years ago

Just a follow-up comment. I realized that the issue comes from self.init_embeddings: the flag self._need_init fails to be set to False, so during training it keeps using the encoder output to re-initialize the embeddings. Anyway, thanks again for your help.
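
For anyone hitting the same thing, the intended pattern is roughly the following (a minimal hypothetical sketch in my own code, not the actual VideoGPT module): the initialization should run exactly once and then flip the flag.

```python
import torch

class VQEmbedding(torch.nn.Module):
    """Minimal sketch of the one-time data-dependent init pattern."""

    def __init__(self, n_codes, dim):
        super().__init__()
        self.embedding = torch.nn.Parameter(torch.randn(n_codes, dim))
        self._need_init = True  # plain attribute, flipped after the first batch

    @torch.no_grad()
    def _init_embeddings(self, z_e_flat):
        # Assumes the batch provides at least n_codes encoder vectors.
        idx = torch.randperm(z_e_flat.shape[0])[: self.embedding.shape[0]]
        self.embedding.data.copy_(z_e_flat[idx])
        self._need_init = False  # forgetting this line re-runs the init every step

    def forward(self, z_e_flat):
        if self._need_init and self.training:
            self._init_embeddings(z_e_flat)
        # Nearest-code lookup on the (now fixed) codebook.
        dists = torch.cdist(z_e_flat, self.embedding)
        return dists.argmin(dim=1)
```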