Closed guhur closed 3 years ago
While the first set of parameters works fine for me
@guhur Were you able to reproduce val unseen SR 58 with the first command mentioned in this section?
```bash
# For example, best val_unseen spl = 0.53, best val_unseen sr = 0.58
CUDA_VISIBLE_DEVICES=0 python ./tasks/R2R/train.py \
    --feedback_method teacher --bidirectional True --encoder_type bert \
    --top_lstm True --transformer_update False --batch_size 20 --log_every 40 \
    --pretrain_n_sentences 6 \
    --pretrain_splits bi_12700_seed10-60_literal_speaker_data_aug_paths_unk \
    --save_ckpt 10000 --ss_n_pretrain_iters 50000 --pretrain_n_iters 60000 \
    --ss_n_iters 60000 --n_iters 70000 --dropout_ratio 0.4 --dec_h_type vc \
    --schedule_ratio 0.4 --optm Adamax --att_ctx_merge mean \
    --clip_gradient_norm 0 --clip_gradient 0.1 --use_pretrain --action_space -1 \
    --pretrain_score_name sr_unseen --train_score_name sr_unseen \
    --enc_hidden_size 1024 --hidden_size 1024 \
    --result_dir ./base/results/ --snapshot_dir ./base/snapshots/ \
    --plot_dir ./base/plots/
```
Yeah, actually my issue came from the PyTorch version pinned in the dependency requirements.
I see. I couldn't get val unseen SR 58 with the pre-training command mentioned above, hence I was curious to know whether you ran into similar issues as well. Following is the val unseen SR plot for pretraining; the best val unseen SR was ~45:
@guhur Did you get similar performance curve as shown above?
Hi,
Thanks for sharing your code :)
While the first set of parameters works fine for me, the second one (see below) does not seem to converge. After a few iterations, the encoder outputs `nan` values. Do you know what is going on?
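In case it helps anyone debugging a similar divergence: a common first step is to check tensors for NaNs as soon as they appear (in PyTorch, `torch.isnan(t).any()` or `torch.autograd.set_detect_anomaly(True)` are the usual tools). Below is a minimal, framework-free sketch of the failure mode and a detection helper; the `has_nan` function and the overflow example are illustrative only, not part of this repository.

```python
import math

def has_nan(t):
    """Recursively check a nested list of floats for NaN values."""
    if isinstance(t, list):
        return any(has_nan(x) for x in t)
    return math.isnan(t)

# Exploding activations overflow to inf; inf - inf then yields NaN,
# which is one way an encoder can start emitting NaNs after a few
# iterations when gradients are not clipped tightly enough.
x = 1e308
x = x * 10          # overflow -> inf
y = x - x           # inf - inf -> nan
print(has_nan([[1.0, y], [0.5]]))  # True
```

With real tensors one would guard the training loop the same way, e.g. skip or log a batch when `torch.isnan(loss)` fires, or lower `--clip_gradient` / the learning rate if NaNs appear consistently.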