jialuli-luka / VLN-SIG

Difficulty replicating the results #8

Open harshraj172 opened 1 year ago

harshraj172 commented 1 year ago

Hey, thanks for your great work!

There are a few clarifications I need, as I am having some difficulty replicating the results. It would be very kind if you could help:

  1. In the implementation section of the paper, you say you use a mixing ratio of 3 for the MTM task and 1 for all other tasks, but pretrain_r2r.json uses different mixing ratios. Can you please specify the correct mixing ratios? (My current understanding of how the ratios are used is sketched after this list.)
  2. You mention here that the model checkpoint for R2R is taken at 440,000 steps. Could you please share the log file? I am curious about the loss and scores of the 9 tasks used for R2R, and would be glad if you could share the same for the other downstream tasks.
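For reference, here is how I currently understand the mixing ratios to be used, as a minimal sketch under my own assumptions (the task names and sampling logic below are illustrative, not taken from your code): each task is drawn with probability proportional to its ratio, so MTM with ratio 3 would be sampled roughly three times as often as a task with ratio 1.

```python
import random

# Hypothetical per-task mixing ratios (illustrative task names only).
task_mix_ratios = {"mlm": 1, "mrc": 1, "sap": 1, "mtm": 3}

def sample_task(ratios):
    # Sample the next pre-training task with probability proportional to its ratio.
    tasks, weights = zip(*ratios.items())
    return random.choices(tasks, weights=weights, k=1)[0]

# Roughly half of the batches would then come from "mtm" (3 out of 6 total weight).
counts = {t: 0 for t in task_mix_ratios}
for _ in range(10_000):
    counts[sample_task(task_mix_ratios)] += 1
print(counts)
```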

Thank You

jialuli-luka commented 1 year ago

Hi, thanks for pointing this out! I just updated the pre-training and fine-tuning configs, and also uploaded the pre-training and fine-tuning logs here: https://www.dropbox.com/sh/k8bn3zz5jkq4nt6/AABYXkl1vRrgWOr8X5gsb6W0a?dl=0

harshraj172 commented 1 year ago

Hello, thank you for your kind response. I noticed that the training_args.json file you provided references a r2r_model_clip_config.json configuration file, which seems to differ from the r2r_model_config.json file available in your GitHub repository.

In the r2r_model_clip_config.json file, there are additional parameters such as:

clip_image_resolution: 224
clip_vision_patch_size: 16
clip_vision_width: 768
clip_vision_layers: 12
clip_vision_heads: 12
clip_embed_dim: 512

However, I couldn't find these parameters in the code in your repository. Could you kindly share the r2r_model_clip_config.json configuration file, along with the code changes you made to incorporate these parameters?
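For context, these values look to me like the hyperparameters of a standard CLIP ViT-B/16 vision tower; the sketch below is only my assumption of what they describe, not code from your repository.

```python
# Assumed meaning of the config keys (CLIP ViT-B/16-style vision encoder):
cfg = {
    "clip_image_resolution": 224,   # input image size in pixels
    "clip_vision_patch_size": 16,   # ViT patch size
    "clip_vision_width": 768,       # transformer hidden size
    "clip_vision_layers": 12,       # number of transformer layers
    "clip_vision_heads": 12,        # attention heads per layer
    "clip_embed_dim": 512,          # joint image-text embedding dimension
}

num_patches = (cfg["clip_image_resolution"] // cfg["clip_vision_patch_size"]) ** 2
head_dim = cfg["clip_vision_width"] // cfg["clip_vision_heads"]
print(num_patches, head_dim)  # 196 patch tokens per image, 64-dim attention heads
```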

I would greatly appreciate your help. Thank you.

jialuli-luka commented 1 year ago

Hi,

These parameters are specified in the config but not actually used during pre-training, so they should not influence pre-training performance.

Could you share the pre-training and fine-tuning logs?

harshraj172 commented 1 year ago

Sure, they are added here: https://www.dropbox.com/scl/fo/y3coyfbccyg9q993953k6/h?rlkey=yopp23vslpe7p6q0gnal7aecw&dl=0. Due to compute constraints, we trained the model for 30,000 steps and took the checkpoint there.

jialuli-luka commented 1 year ago

From what you've shared here, maybe it's related to learning-rate scheduling during training? Training for 30k steps and then restarting training without loading the optimizer state, or without adjusting the schedule accordingly, will affect performance.
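As an illustration of what I mean, here is a minimal PyTorch sketch (assumed, not the exact trainer code in this repo) of resuming without breaking the schedule: the optimizer and LR-scheduler state should be saved and restored together with the model weights, otherwise a restart begins the warmup/decay schedule from step 0 and changes the effective learning rate.

```python
import torch

def save_checkpoint(path, step, model, optimizer, scheduler):
    torch.save({
        "step": step,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),    # Adam moments, etc.
        "scheduler": scheduler.state_dict(),    # position in the warmup/decay schedule
    }, path)

def load_checkpoint(path, model, optimizer, scheduler):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    scheduler.load_state_dict(ckpt["scheduler"])
    return ckpt["step"]  # continue training from this step
```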

Besides, empirically I didn't pick the model checkpoints based on these proxy-task scores; I mainly focused on evaluating the model's fine-tuning performance on navigation tasks (e.g., R2R).