v-iashin / MDVC

PyTorch implementation of Multi-modal Dense Video Captioning (CVPR 2020 Workshops)
https://v-iashin.github.io/mdvc

Order of training the captioning module vs. the proposal module (and whether training is end-to-end?) #2

Closed amanchadha closed 4 years ago

amanchadha commented 4 years ago

Hi Vladimir,

First, thanks for the great codebase - everything is neatly organized in the source files - a nice departure from what AI codebases released with papers usually look like :)

Some questions:

> **Train and Predict**
> Run the training and prediction script. It will, first, train the captioning model and, then, evaluate the predictions of the best model in the learned proposal setting.

From your comments on training, it is clear that the captioning module is trained first (on GT proposals?). However, it is not very clear when the proposal module is trained. Is the training end-to-end as in Zhou et al. [59], where both modules are trained in unison (so that the captioning module can influence the event proposal mechanism)? Could you explain this sequence clearly (maybe, for everyone's sake, by updating the README)? Thanks!

v-iashin commented 4 years ago

Hi! Thanks for the positive words and feedback.

Regarding your questions: no, training is not end-to-end as in Zhou et al. The captioning module is trained on ground-truth (GT) proposals, and the proposal module is not trained in this codebase at all; at evaluation time, the model is fed proposals produced by the pre-trained model of Wang et al. (2018).

Therefore, the sequence of training (a sketch of this flow follows the list below):

  1. Training the captioning module on GT proposals (it will evaluate the model on GT automatically)
  2. Evaluating the captioning module with the proposals from Wang et al. (2018)
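
For completeness, here is a minimal, self-contained sketch of that two-stage flow. Every name in it (`run_two_stage`, `train_epoch_on_gt`, `score_on_gt`, `score_on_learned`) is a hypothetical placeholder for illustration, not part of the MDVC codebase:

```python
import copy

import torch.nn as nn


def run_two_stage(captioner: nn.Module,
                  train_epoch_on_gt,    # callable: one training pass over GT-proposal batches
                  score_on_gt,          # callable: validation metric on GT proposals (e.g. METEOR)
                  score_on_learned,     # callable: the same metric, but with learned proposals
                  num_epochs: int = 30) -> float:
    """Stage 1: train the captioner on GT proposals, keeping the best
    checkpoint according to the GT-validation score. Stage 2: evaluate
    that checkpoint once in the learned-proposal setting. The proposal
    module itself is never trained here: its outputs come from a
    separate, pre-trained model (a stand-in for Wang et al. 2018)."""
    best_score = float('-inf')
    best_state = copy.deepcopy(captioner.state_dict())

    # Stage 1: training and model selection, both on GT proposals only.
    for _ in range(num_epochs):
        train_epoch_on_gt(captioner)
        score = score_on_gt(captioner)
        if score > best_score:
            best_score = score
            best_state = copy.deepcopy(captioner.state_dict())

    # Stage 2: evaluation only -- no gradients, no proposal training.
    captioner.load_state_dict(best_state)
    return score_on_learned(captioner)
```

The design point to take away: no gradient ever flows from the captioning module back into the proposal generator, which is exactly what distinguishes this setup from the end-to-end training of Zhou et al.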
amanchadha commented 4 years ago

Thank you - you've been very helpful!