xjli / r2r_vln

Room-to-Room (R2R) vision-and-language navigation

Finetuning does not work #4

Closed guhur closed 3 years ago

guhur commented 4 years ago

Hi,

Thanks for sharing your code :)

While the first set of parameters works fine for me, the second one (see below) does not seem to converge: after a few iterations, the encoder outputs NaN values.

python ./tasks/R2R/train.py --feedback_method teacher --dropout_ratio 0.4 --dec_h_type vc --optm Adamax --schedule_ratio 0.2 --att_ctx_merge mean --clip_gradient_norm 0 --clip_gradient 0.1 --log_every 32 --action_space -1 --n_iters 34000 --train_score_name sr_unseen --enc_hidden_size 1024 --hidden_size 1024 --result_dir ./base/results/ --snapshot_dir ./base/snapshots/ --plot_dir ./base/plots/ --n_iters_resume N --ss_n_iters N+10000 --save_ckpt 512 --bidirectional True --encoder_type bert --top_lstm True --transformer_update True --batch_size 16 --learning_rate 5e-5

Do you know what is going on?
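For reference, NaNs appearing after a few iterations often point to exploding gradients, and the command above passes `--clip_gradient_norm 0`, which I assume disables global-norm clipping. Below is a minimal pure-Python sketch of what global-norm gradient clipping does, in the spirit of `torch.nn.utils.clip_grad_norm_` (the function name and the "0 disables it" convention are my assumptions, not necessarily this repo's semantics):

```python
import math

def clip_gradient_norm(grads, max_norm):
    """Scale gradients so their global L2 norm is at most max_norm.

    grads: list of per-parameter gradient lists (plain floats here for
    illustration; real code would operate on tensors in place).
    max_norm <= 0 is treated as "clipping disabled".
    Returns the (possibly rescaled) gradients and the pre-clip norm.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if max_norm > 0 and total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)  # epsilon guards division
        grads = [[g * scale for g in grad] for grad in grads]
    return grads, total_norm
```

With a gradient of `[3.0, 4.0]` (norm 5.0) and `max_norm=1.0`, the result is rescaled to roughly `[0.6, 0.8]`; with `max_norm=0` it is returned unchanged.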

amoudgl commented 3 years ago

> While the first set of parameters work fine with me

@guhur Were you able to reproduce val unseen SR 58 with the first command mentioned in this section?

# For example, best val_unseen spl = 0.53, best val_unseen sr = 0.58
CUDA_VISIBLE_DEVICES=0 python ./tasks/R2R/train.py --feedback_method teacher --bidirectional True --encoder_type bert --top_lstm True --transformer_update False --batch_size 20 --log_every 40 --pretrain_n_sentences 6 --pretrain_splits bi_12700_seed10-60_literal_speaker_data_aug_paths_unk --save_ckpt 10000 --ss_n_pretrain_iters 50000 --pretrain_n_iters 60000 --ss_n_iters 60000 --n_iters 70000 --dropout_ratio 0.4 --dec_h_type vc --schedule_ratio 0.4 --optm Adamax --att_ctx_merge mean --clip_gradient_norm 0 --clip_gradient 0.1 --use_pretrain --action_space -1 --pretrain_score_name sr_unseen --train_score_name sr_unseen --enc_hidden_size 1024 --hidden_size 1024 --result_dir ./base/results/ --snapshot_dir ./base/snapshots/ --plot_dir ./base/plots/
guhur commented 3 years ago

Yeah, actually my issue came from the PyTorch version pinned in the dependency requirements.

amoudgl commented 3 years ago

I see. I couldn't get val unseen SR 58 with the pre-training command mentioned above, hence I was curious whether you ran into similar issues. Below is the val unseen SR plot for pretraining; the best val unseen SR was ~45:

[plot: val unseen SR during pretraining (press_pretrain)]

amoudgl commented 3 years ago

@guhur Did you get a similar performance curve to the one shown above?