affromero closed this issue 7 years ago
Hello. I did not try to train on the full CLEVR dataset: it is quite large, and I did not have time to implement the CLEVR task when I wrote this repo. What I implemented is the Sort-of-CLEVR task described in the paper, a much simplified version of CLEVR (no natural-language processing, and simpler images). However, I expect that implementing the CLEVR task is not difficult: it should be achievable by adding LSTMs to the code and training the whole system end-to-end.
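As a rough illustration of "adding LSTMs and training end-to-end", here is a minimal sketch, assuming PyTorch. It is not the code in this repo: `QuestionLSTM`, `TinyRN`, `train_step`, and every layer size are hypothetical names and values chosen for the example, and object coordinate tagging is left out for brevity. The point is only that one optimizer over one loss updates the LSTM, the conv layers, and the relation layers together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuestionLSTM(nn.Module):
    """Hypothetical question encoder: token ids -> fixed-size vector."""

    def __init__(self, vocab_size=32, embed_dim=16, hidden_dim=24):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):
        # h_n has shape (num_layers, batch, hidden_dim); keep the last layer.
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return h_n[-1]


class TinyRN(nn.Module):
    """Hypothetical small relation network conditioned on an LSTM question code."""

    def __init__(self, n_answers=10, channels=8, hidden_dim=24):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.question = QuestionLSTM(hidden_dim=hidden_dim)
        self.g = nn.Sequential(nn.Linear(2 * channels + hidden_dim, 32), nn.ReLU())
        self.f = nn.Linear(32, n_answers)

    def forward(self, img, tokens):
        feat = self.conv(img)                       # (B, C, H, W)
        b, c = feat.shape[:2]
        objs = feat.flatten(2).transpose(1, 2)      # (B, N, C): one "object" per cell
        n = objs.size(1)
        q = self.question(tokens)                   # (B, hidden_dim)
        # Build all N x N object pairs and append the question code to each pair.
        o_i = objs.unsqueeze(2).expand(b, n, n, c)
        o_j = objs.unsqueeze(1).expand(b, n, n, c)
        q_rep = q.view(b, 1, 1, -1).expand(b, n, n, q.size(1))
        pairs = torch.cat([o_i, o_j, q_rep], dim=-1)
        relations = self.g(pairs).sum(dim=(1, 2))   # sum over all pairs
        return self.f(relations)                    # answer logits


def train_step(model, optimizer, img, tokens, answers):
    """One end-to-end update: a single loss backpropagates into every module."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(img, tokens), answers)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyRN()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    img = torch.randn(4, 3, 16, 16)          # random stand-in images
    tokens = torch.randint(0, 32, (4, 6))    # random stand-in question tokens
    answers = torch.randint(0, 10, (4,))     # random stand-in answer labels
    print(train_step(model, optimizer, img, tokens, answers))
```

Because `optimizer` is built from `model.parameters()`, the LSTM weights receive gradients from the same answer loss as the conv and relation layers, which is what distinguishes end-to-end training from training in stages.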
As for the second question, coord_oi and coord_oj seem to be leftovers from old code. I will delete them soon!
Regarding the training procedure for the full CLEVR task, how did you train the pixel and state-description versions? That is, did you train the whole system end-to-end (LSTMs, ConvInputModel, and RN), or in stages?
Another, off-topic question: what is the purpose of coord_oi and coord_oj?
Thank you! Great implementation by the way.