youthHan closed this issue 6 months ago.
It seems the results come from the validation split, right? Could you provide more details about your implementation? Did you change any hyper-parameters or turn off some subtasks? And how do you choose checkpoints for testing?
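One way to answer the hyper-parameter question is to diff the argument dumps printed at the start of two training logs. The sketch below assumes each entry sits on its own line as `<date> <time> INFO <name> <value>` (or `<name>: <value>` for the `cfg.*` entries), as in the log pasted in this thread. `parse_args` and `diff_args` are hypothetical helpers, not part of the repository, and they are meant for the start-up block only.

```python
import re

# "<date> <time> INFO <name> <value>" or "... INFO cfg.X.y: <value>".
# Section banners such as "----------- Feature -----------" are skipped
# because the name must start with a letter or underscore.
ARG_LINE = re.compile(r"^\S+ \S+ INFO ([A-Za-z_][\w.]*):? (.+)$")


def parse_args(log_text: str) -> dict[str, str]:
    """Collect the 'name value' pairs logged at start-up (hypothetical helper)."""
    args: dict[str, str] = {}
    for line in log_text.splitlines():
        match = ARG_LINE.match(line.strip())
        if match:
            args[match.group(1)] = match.group(2).strip()
    return args


def diff_args(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (value_in_a, value_in_b)} for every name whose value differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}
```

Running `diff_args(parse_args(log_a), parse_args(log_b))` on the first few hundred lines of each log lists only the settings that changed (for example `lr`, `world_size`, or `cfg.Multi.Ratio`).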
Yes, the results are from the validation split. I used the provided scripts for training with on-the-fly testing, and the script uses the best model for testing on validation.
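Judging from the log below (`max_saved_checkpoints 1`, the `Best Score` lines, and `Remove Checkpoint at Epoch N...`), the training script appears to keep only the best-scoring checkpoint seen so far. A minimal sketch of that bookkeeping, assuming a strictly higher score is needed to replace the best one; `BestCheckpointKeeper` is hypothetical, and the scores are copied from the logged `Current Score` values for epochs 0 to 12 (how that score is computed is not shown here, so it is taken as-is).

```python
from dataclasses import dataclass, field
from typing import Optional

# "Current Score" values for epochs 0..12, copied from the first trial's log.
LOGGED_SCORES = [
    0.5061338690938055, 0.8180526293345514, 0.7799466357240358,
    1.1081138416634404, 1.1334468161685174, 1.670203482569839,
    1.7621165890141546, 2.102380853998675, 2.3681853570720754,
    2.2958890340663176, 2.417490289485414, 2.564540383734105,
    2.573191832547898,
]


@dataclass
class BestCheckpointKeeper:
    """Track the single best checkpoint (a sketch of max_saved_checkpoints=1)."""
    best_score: float = float("-inf")
    best_epoch: Optional[int] = None
    removed: list[int] = field(default_factory=list)

    def update(self, epoch: int, score: float) -> bool:
        """Record one validation score; return True if this epoch is the new best."""
        if score <= self.best_score:
            return False  # worse or equal: keep the current best checkpoint
        if self.best_epoch is not None:
            # The previous best is deleted, like "Remove Checkpoint at Epoch N..."
            self.removed.append(self.best_epoch)
        self.best_score, self.best_epoch = score, epoch
        return True


if __name__ == "__main__":
    keeper = BestCheckpointKeeper()
    for epoch, score in enumerate(LOGGED_SCORES):
        keeper.update(epoch, score)
    # Matches the "Remove Checkpoint" lines in the log: [0, 1, 3, 4, 5, 6, 7, 8, 10, 11]
    print(keeper.removed, keeper.best_epoch)
```

Replaying the logged scores reproduces every `Remove Checkpoint` entry, including the skipped epochs 2 and 9 where the score dropped, which is consistent with the "best model" selection described above.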
As for the code, the only modification was adapting the LMDB loading to work on my cluster: I moved the creation of self.sim into each rank, so that every rank creates it during its own data fetching.
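The change described above follows a common pattern for per-process resources such as LMDB environments: create the handle lazily in the process that uses it, instead of in `__init__` on the launching process, since such handles are generally not safe to share across forked processes. A minimal sketch, where `LazySimDataset` and the `make_sim` factory are hypothetical stand-ins for the repository's dataset class and whatever actually builds `self.sim`:

```python
import os
from typing import Any, Callable, Optional


class LazySimDataset:
    """Dataset that builds its simulator inside the process that reads from it."""

    def __init__(self, items: list, make_sim: Callable[[], Any]):
        self.items = items
        self.make_sim = make_sim        # hypothetical factory (e.g. opens LMDB)
        self._sim: Optional[Any] = None
        self._owner_pid: Optional[int] = None

    @property
    def sim(self) -> Any:
        # Build on first use, and rebuild if this object was copied into a
        # different (forked) process, so no handle is shared across ranks.
        if self._sim is None or self._owner_pid != os.getpid():
            self._sim = self.make_sim()
            self._owner_pid = os.getpid()
        return self._sim

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> tuple:
        # The first fetch in each process triggers creation of the simulator.
        return self.items[idx], self.sim
```

Because nothing is opened in `__init__`, constructing the dataset on the launcher is cheap, and each rank (or dataloader worker) pays the setup cost once, on its first fetch.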
That looks a bit odd. I have run this script twice, and the results were relatively stable across both trials (as shown in the following figures). Could you share your training log file with me?
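To compare trials epoch by epoch, the `[Eval]` records can be extracted from each log. A sketch, assuming the `[Eval] dataset=[...]` / `[Eval] ||| ...` layout shown in the log below; `parse_eval` is a hypothetical helper. It splits on the `[Eval]` tag rather than on newlines, so it also copes with a log whose line breaks were lost when pasting.

```python
import re
from collections import defaultdict

# Patterns for the record layout shown in this thread's log (an assumption).
EPOCH = re.compile(r"val_unseen epoch (\d+)")
DATASET = re.compile(r"dataset=\[(\w+)\]")
# "name: number" pairs; names may contain spaces or dashes
# (e.g. "oracle path_success_rate", "bleu-4").
METRIC = re.compile(r"([A-Za-z][\w -]*?):\s*(-?\d+(?:\.\d+)?)")
# Start of the next timestamped record, used to cut off trailing text.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} ")


def parse_eval(log_text: str) -> dict:
    """Return {epoch: {dataset: {metric: value}}} from the [Eval] records."""
    results: dict = defaultdict(lambda: defaultdict(dict))
    epoch, dataset = None, None
    for chunk in log_text.split("[Eval]")[1:]:
        # Keep only this record: stop at a line break or the next timestamp.
        chunk = chunk.split("\n")[0]
        cut = TIMESTAMP.search(chunk)
        if cut:
            chunk = chunk[: cut.start()]
        found_epoch = EPOCH.search(chunk)
        if found_epoch:
            epoch, dataset = int(found_epoch.group(1)), None
            continue
        found_dataset = DATASET.search(chunk)
        if found_dataset:
            dataset = found_dataset.group(1)
        if epoch is None or dataset is None:
            continue
        # "|||" records carry no dataset tag; they extend the previous one.
        for name, value in METRIC.findall(chunk):
            results[epoch][dataset][name.strip()] = float(value)
    return results
```

Comparing, say, `parse_eval(log_a)[e]["R2R"]["spl"]` with the same entry from a second log gives a per-epoch view of run-to-run variance without relying on screenshots.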
My two trials:
Detailed log of first trial:
2024-03-29 19:32:14,996 INFO **Start logging**
2024-03-29 19:32:14,996 INFO CUDA_VISIBLE_DEVICES=ALL
2024-03-29 19:32:14,996 INFO data_dir /home/tiger/VLN
2024-03-29 19:32:14,996 INFO cfg_file configs/multi.yaml
2024-03-29 19:32:14,996 INFO pretrained_model_name_or_path /home/tiger/VLN/models/Vicuna-7B
2024-03-29 19:32:14,996 INFO off_batch_task False
2024-03-29 19:32:14,996 INFO debug False
2024-03-29 19:32:14,996 INFO seed 0
2024-03-29 19:32:14,996 INFO num_epochs 30
2024-03-29 19:32:14,996 INFO resume_from_checkpoint None
2024-03-29 19:32:14,996 INFO from_scratch False
2024-03-29 19:32:14,996 INFO batch_size 1
2024-03-29 19:32:14,996 INFO val_batch_size 2
2024-03-29 19:32:14,996 INFO lr 3e-05
2024-03-29 19:32:14,996 INFO feat_dropout 0.4
2024-03-29 19:32:14,996 INFO num_warmup_steps 0
2024-03-29 19:32:14,996 INFO num_steps_per_epoch 2000
2024-03-29 19:32:14,996 INFO gradient_accumulation_step 8
2024-03-29 19:32:14,996 INFO precision amp_bf16
2024-03-29 19:32:14,996 INFO workers 0
2024-03-29 19:32:14,996 INFO world_size 8
2024-03-29 19:32:14,996 INFO local_rank 0
2024-03-29 19:32:14,996 INFO dist_url env://
2024-03-29 19:32:14,996 INFO dist_backend nccl
2024-03-29 19:32:14,996 INFO horovod False
2024-03-29 19:32:14,997 INFO no_set_device_rank False
2024-03-29 19:32:14,997 INFO output_dir output/multi_wo_pretrain
2024-03-29 19:32:14,997 INFO max_saved_checkpoints 1
2024-03-29 19:32:14,997 INFO save_ckpt_per_epochs 10
2024-03-29 19:32:14,997 INFO save_latest_states False
2024-03-29 19:32:14,997 INFO save_pred_results False
2024-03-29 19:32:14,997 INFO save_detail_results False
2024-03-29 19:32:14,997 INFO mode train
2024-03-29 19:32:14,997 INFO stage multi
2024-03-29 19:32:14,997 INFO ignoreid -100
2024-03-29 19:32:14,997 INFO enable_og True
2024-03-29 19:32:14,997 INFO enable_summarize True
2024-03-29 19:32:14,997 INFO enable_fgr2r True
2024-03-29 19:32:14,997 INFO gen_loss_coef 1.0
2024-03-29 19:32:14,997 INFO obj_loss_coef 1.0
2024-03-29 19:32:14,997 INFO teacher_forcing_coef 1.0
2024-03-29 19:32:14,997 INFO fuse_obj False
2024-03-29 19:32:14,997 INFO multi_endpoints 1
2024-03-29 19:32:14,997 INFO path_type trusted_path
2024-03-29 19:32:14,997 INFO test_datasets ['CVDN', 'SOON', 'R2R', 'REVERIE', 'ScanQA']
2024-03-29 19:32:14,997 INFO validation_split val_unseen
2024-03-29 19:32:14,997 INFO do_sample False
2024-03-29 19:32:14,997 INFO temperature 1.0
2024-03-29 19:32:14,997 INFO max_datapoints None
2024-03-29 19:32:14,997 INFO rank 0
2024-03-29 19:32:14,997 INFO distributed True
2024-03-29 19:32:14,997 INFO device cuda:0
2024-03-29 19:32:14,997 INFO image_feat_size 1024
2024-03-29 19:32:14,997 INFO obj_feat_size 768
2024-03-29 19:32:14,997 INFO angle_feat_size 4
2024-03-29 19:32:14,997 INFO enc_full_graph True
2024-03-29 19:32:14,997 INFO expert_policy spl
2024-03-29 19:32:14,997 INFO num_pano_layers 2
2024-03-29 19:32:14,997 INFO ----------- Feature -----------
2024-03-29 19:32:14,997 INFO cfg.Feature.object_feature_type:
2024-03-29 19:32:14,997 INFO cfg.Feature.angle_feat_size: 4
2024-03-29 19:32:14,997 INFO cfg.Feature.max_objects: 70
2024-03-29 19:32:14,997 INFO cfg.Feature.image_feat_size: 1024
2024-03-29 19:32:14,998 INFO ----------- feature_database -----------
2024-03-29 19:32:14,998 INFO cfg.Feature.feature_database.mp3d: eva_features/mp3d_EVA02-CLIP-L-14-336.hdf5
2024-03-29 19:32:14,998 INFO cfg.Feature.feature_database.scan_qa: eva_features/scanqa_EVA02-CLIP-L-14-336.hdf5
2024-03-29 19:32:14,998 INFO cfg.Feature.feature_database.coco: eva_features/coco_EVA02-CLIP-L-14-336.hdf5
2024-03-29 19:32:14,998 INFO cfg.Feature.obj_feat_size: 768
2024-03-29 19:32:14,998 INFO ----------- object_database -----------
2024-03-29 19:32:14,998 INFO cfg.Feature.object_database.reverie: obj_features/reverie_obj_feat
2024-03-29 19:32:14,998 INFO cfg.Feature.object_database.soon: obj_features/soon_obj_feat
2024-03-29 19:32:14,998 INFO ----------- Dataset -----------
2024-03-29 19:32:14,998 INFO ----------- R2R -----------
2024-03-29 19:32:14,998 INFO cfg.Dataset.R2R.DIR: R2R
2024-03-29 19:32:14,998 INFO ----------- SPLIT -----------
2024-03-29 19:32:14,998 INFO cfg.Dataset.R2R.SPLIT.train: FGR2R_train.json
2024-03-29 19:32:14,998 INFO cfg.Dataset.R2R.SPLIT.val_seen: R2R_val_seen_enc.json
2024-03-29 19:32:14,998 INFO cfg.Dataset.R2R.SPLIT.val_unseen: R2R_val_unseen_enc.json
2024-03-29 19:32:14,998 INFO cfg.Dataset.R2R.SPLIT.test: R2R_test_enc.json
2024-03-29 19:32:14,998 INFO ----------- REVERIE -----------
2024-03-29 19:32:14,998 INFO cfg.Dataset.REVERIE.DIR: REVERIE
2024-03-29 19:32:14,998 INFO cfg.Dataset.REVERIE.bbox_file: BBoxes.json
2024-03-29 19:32:14,998 INFO ----------- SPLIT -----------
2024-03-29 19:32:14,998 INFO cfg.Dataset.REVERIE.SPLIT.train: REVERIE_train_enc.json
2024-03-29 19:32:14,998 INFO cfg.Dataset.REVERIE.SPLIT.val_seen: REVERIE_val_seen_enc.json
2024-03-29 19:32:14,998 INFO cfg.Dataset.REVERIE.SPLIT.val_unseen: REVERIE_val_unseen_enc.json
2024-03-29 19:32:14,998 INFO cfg.Dataset.REVERIE.SPLIT.test: REVERIE_test_enc.json
2024-03-29 19:32:14,998 INFO ----------- CVDN -----------
2024-03-29 19:32:14,998 INFO cfg.Dataset.CVDN.DIR: CVDN
2024-03-29 19:32:14,998 INFO ----------- SPLIT -----------
2024-03-29 19:32:14,998 INFO cfg.Dataset.CVDN.SPLIT.train: train.json
2024-03-29 19:32:14,998 INFO cfg.Dataset.CVDN.SPLIT.val_seen: val_seen.json
2024-03-29 19:32:14,998 INFO cfg.Dataset.CVDN.SPLIT.val_unseen: val_unseen.json
2024-03-29 19:32:14,998 INFO cfg.Dataset.CVDN.SPLIT.test: test_cleaned.json
2024-03-29 19:32:14,998 INFO ----------- SOON -----------
2024-03-29 19:32:14,998 INFO cfg.Dataset.SOON.DIR: SOON
2024-03-29 19:32:14,998 INFO ----------- SPLIT -----------
2024-03-29 19:32:14,998 INFO cfg.Dataset.SOON.SPLIT.train: train_enc_pseudo_obj_ade30k_label.jsonl
2024-03-29 19:32:14,998 INFO cfg.Dataset.SOON.SPLIT.val_seen: val_unseen_instrs_enc_pseudo_obj_ade30k_label.jsonl
2024-03-29 19:32:14,998 INFO cfg.Dataset.SOON.SPLIT.val_unseen: val_unseen_house_enc_pseudo_obj_ade30k_label.jsonl
2024-03-29 19:32:14,998 INFO cfg.Dataset.SOON.SPLIT.test: test_v2_enc.jsonl
2024-03-29 19:32:14,998 INFO ----------- ScanQA -----------
2024-03-29 19:32:14,998 INFO cfg.Dataset.ScanQA.DIR: ScanQA
2024-03-29 19:32:14,998 INFO ----------- SPLIT -----------
2024-03-29 19:32:14,999 INFO cfg.Dataset.ScanQA.SPLIT.train: ScanQA_v1.0_train_reformat.json
2024-03-29 19:32:14,999 INFO cfg.Dataset.ScanQA.SPLIT.val_unseen: ScanQA_v1.0_val_reformat.json
2024-03-29 19:32:14,999 INFO cfg.Dataset.ScanQA.SPLIT.test_wo_obj: ScanQA_v1.0_test_wo_obj_reformat.json
2024-03-29 19:32:14,999 INFO cfg.Dataset.ScanQA.SPLIT.test_w_obj: ScanQA_v1.0_test_w_obj_reformat.json
2024-03-29 19:32:14,999 INFO ----------- EQA -----------
2024-03-29 19:32:14,999 INFO cfg.Dataset.EQA.DIR: EQA_MP3D
2024-03-29 19:32:14,999 INFO ----------- SPLIT -----------
2024-03-29 19:32:14,999 INFO cfg.Dataset.EQA.SPLIT.val_unseen: eqa_val_enc.json
2024-03-29 19:32:14,999 INFO cfg.Dataset.EQA.ANSWER_VOCAB: eqa_answer_vocab.json
2024-03-29 19:32:14,999 INFO ----------- R2R_AUG -----------
2024-03-29 19:32:14,999 INFO cfg.Dataset.R2R_AUG.DIR: R2R
2024-03-29 19:32:14,999 INFO ----------- SPLIT -----------
2024-03-29 19:32:14,999 INFO cfg.Dataset.R2R_AUG.SPLIT.train: R2R_prevalent_aug_train_enc.jsonl
2024-03-29 19:32:14,999 INFO ----------- REVERIE_AUG -----------
2024-03-29 19:32:14,999 INFO cfg.Dataset.REVERIE_AUG.DIR: REVERIE
2024-03-29 19:32:14,999 INFO cfg.Dataset.REVERIE_AUG.bbox_file: BBoxes.json
2024-03-29 19:32:14,999 INFO ----------- SPLIT -----------
2024-03-29 19:32:14,999 INFO cfg.Dataset.REVERIE_AUG.SPLIT.train: REVERIE_speaker_aug_enc.jsonl
2024-03-29 19:32:14,999 INFO ----------- LLaVA -----------
2024-03-29 19:32:14,999 INFO cfg.Dataset.LLaVA.DIR: LLaVA
2024-03-29 19:32:14,999 INFO ----------- SPLIT -----------
2024-03-29 19:32:14,999 INFO cfg.Dataset.LLaVA.SPLIT.train: detail_23k.json
2024-03-29 19:32:14,999 INFO ----------- Pretrain -----------
2024-03-29 19:32:14,999 INFO cfg.Pretrain.SOURCE: ['R2R_AUG', 'REVERIE_AUG', 'R2R', 'REVERIE', 'SOON', 'CVDN', 'ScanQA']
2024-03-29 19:32:14,999 INFO cfg.Pretrain.Ratio: [20, 2, 1, 1, 1, 1, 1]
2024-03-29 19:32:14,999 INFO ----------- LOSS_COEF -----------
2024-03-29 19:32:14,999 INFO cfg.Pretrain.LOSS_COEF.R2R_AUG: 1
2024-03-29 19:32:14,999 INFO cfg.Pretrain.LOSS_COEF.REVERIE_AUG: 1
2024-03-29 19:32:14,999 INFO ----------- Multi -----------
2024-03-29 19:32:14,999 INFO cfg.Multi.SOURCE: ['R2R', 'REVERIE', 'CVDN', 'SOON', 'ScanQA', 'LLaVA']
2024-03-29 19:32:14,999 INFO cfg.Multi.Ratio: [20, 5, 1, 5, 5, 5]
2024-03-29 19:32:14,999 INFO ----------- LOSS_COEF -----------
2024-03-29 19:32:14,999 INFO ----------- Model -----------
2024-03-29 19:32:14,999 INFO cfg.Model.num_l_layers: 9
2024-03-29 19:32:14,999 INFO cfg.Model.num_pano_layers: 2
2024-03-29 19:32:14,999 INFO cfg.Model.num_x_layers: 4
2024-03-29 19:32:14,999 INFO cfg.Model.graph_sprels: True
2024-03-29 19:32:14,999 INFO cfg.Model.fusion: dynamic
2024-03-29 19:32:14,999 INFO cfg.Model.enc_full_graph: True
2024-03-29 19:32:15,000 INFO cfg.Model.expert_policy: spl
2024-03-29 19:32:15,000 INFO ----------- Optim -----------
2024-03-29 19:32:15,000 INFO ----------- val_max_action_len -----------
2024-03-29 19:32:15,000 INFO cfg.Optim.val_max_action_len.R2R: 15
2024-03-29 19:32:15,000 INFO cfg.Optim.val_max_action_len.REVERIE: 15
2024-03-29 19:32:15,000 INFO cfg.Optim.val_max_action_len.CVDN: 30
2024-03-29 19:32:15,000 INFO cfg.Optim.val_max_action_len.SOON: 20
2024-03-29 19:32:15,000 INFO cfg.Optim.val_max_action_len.EQA: 15
2024-03-29 19:32:15,000 INFO ----------- train_max_action_len -----------
2024-03-29 19:32:15,000 INFO cfg.Optim.train_max_action_len.R2R: 15
2024-03-29 19:32:15,000 INFO cfg.Optim.train_max_action_len.REVERIE: 15
2024-03-29 19:32:15,000 INFO cfg.Optim.train_max_action_len.CVDN: 15
2024-03-29 19:32:15,000 INFO cfg.Optim.train_max_action_len.SOON: 15
2024-03-29 19:32:15,000 INFO cfg.Optim.train_max_action_len.EQA: 15
2024-03-29 19:32:15,000 INFO cfg.Optim.train_max_action_len.R2R_AUG: 15
2024-03-29 19:32:15,000 INFO cfg.Optim.train_max_action_len.REVERIE_AUG: 15
2024-03-29 19:32:25,842 INFO [INFO] R2RDataset loaded with 14039 instructions, using splits: train
2024-03-29 19:32:25,842 INFO
2024-03-29 22:14:24,994 INFO validate val_unseen split on CVDN task
2024-03-29 22:16:45,885 INFO eval 912 predictions
2024-03-29 22:16:45,964 INFO validate val_unseen split on SOON task
2024-03-29 22:21:41,445 INFO eval 3392 predictions
2024-03-29 22:21:41,552 INFO validate val_unseen split on R2R task
2024-03-29 22:23:40,166 INFO eval 2352 predictions
2024-03-29 22:23:40,461 INFO validate val_unseen split on REVERIE task
2024-03-29 22:26:44,143 INFO eval 3528 predictions
2024-03-29 22:26:44,273 INFO validate val_unseen split on ScanQA task
2024-03-29 22:28:54,640 INFO
[Eval] val_unseen epoch 0
[Eval] dataset=[CVDN] , lengths: 87.69, nav_error: 17.68, oracle_sr: 44.08
[Eval] ||| sr: 6.25, spl: 1.99, oracle path_success_rate: 80.48, dist_to_end_reduction: 1.78
[Eval] dataset=[SOON] , action_steps: 11.41, steps: 16.67, lengths: 34.29, nav_error: 13.44, oracle_error: 8.80
[Eval] ||| sr: 6.63, oracle_sr: 14.03, spl: 4.32, det_sr: 0.15, det_spl: 0.11
[Eval] dataset=[R2R] , action_steps: 7.36, steps: 10.02, lengths: 20.78, nav_error: 9.29, oracle_error: 4.79
[Eval] ||| sr: 19.47, oracle_sr: 40.26, spl: 14.11
[Eval] dataset=[REVERIE] , action_steps: 6.86, steps: 9.01, lengths: 18.31, nav_error: 10.25, oracle_error: 5.40
[Eval] ||| sr: 5.24, oracle_sr: 14.26, spl: 3.98, rgs: 0.82, rgspl: 0.71
[Eval] dataset=[ScanQA] , bleu-1: 32.58, bleu-2: 19.55, bleu-3: 13.17, bleu-4: 8.35, rouge: 32.37, cider: 60.33, meteor: 12.73, exact_match: 18.03
2024-03-29 22:28:54,647 INFO Current Score: 0.5061338690938055
2024-03-29 22:28:54,647 INFO Best Score: 0.5061338690938055
2024-03-29 23:41:20,958 INFO train [1] epoch
2024-03-29 23:41:20,963 INFO Loss: 15.15 Instr_pred: 1.63 R2R: 19.32 REVERIE: 17.95 CVDN: 17.15 SOON: 21.33 ScanQA: 1.47 LLaVA: 1.48
2024-03-29 23:41:20,965 INFO validate val_unseen split on CVDN task
2024-03-29 23:43:04,558 INFO eval 912 predictions
2024-03-29 23:43:04,610 INFO validate val_unseen split on SOON task
2024-03-29 23:47:27,417 INFO eval 3392 predictions
2024-03-29 23:47:27,507 INFO validate val_unseen split on R2R task
2024-03-29 23:49:09,221 INFO eval 2352 predictions
2024-03-29 23:49:09,281 INFO validate val_unseen split on REVERIE task
2024-03-29 23:52:01,577 INFO eval 3528 predictions
2024-03-29 23:52:01,698 INFO validate val_unseen split on ScanQA task
2024-03-29 23:54:12,030 INFO
[Eval] val_unseen epoch 1
[Eval] dataset=[CVDN] , lengths: 33.80, nav_error: 17.96, oracle_sr: 37.94
[Eval] ||| sr: 6.14, spl: 4.07, oracle path_success_rate: 69.41, dist_to_end_reduction: 1.57
[Eval] dataset=[SOON] , action_steps: 10.25, steps: 11.26, lengths: 20.79, nav_error: 12.68, oracle_error: 7.83
[Eval] ||| sr: 10.11, oracle_sr: 18.13, spl: 8.77, det_sr: 0.29, det_spl: 0.24
[Eval] dataset=[R2R] , action_steps: 6.44, steps: 7.34, lengths: 14.24, nav_error: 8.97, oracle_error: 4.50
[Eval] ||| sr: 24.91, oracle_sr: 43.07, spl: 21.57
[Eval] dataset=[REVERIE] , action_steps: 6.60, steps: 7.42, lengths: 14.52, nav_error: 10.62, oracle_error: 5.09
[Eval] ||| sr: 5.44, oracle_sr: 12.90, spl: 4.71, rgs: 1.53, rgspl: 1.29
[Eval] dataset=[ScanQA] , bleu-1: 30.83, bleu-2: 17.39, bleu-3: 12.21, bleu-4: 8.45, rouge: 32.12, cider: 58.20, meteor: 12.36, exact_match: 18.42
2024-03-29 23:54:12,037 INFO Current Score: 0.8180526293345514
2024-03-29 23:54:12,037 INFO Best Score: 0.8180526293345514
2024-03-29 23:54:12,924 INFO Remove Checkpoint at Epoch 0...
2024-03-30 01:06:17,466 INFO train [2] epoch
2024-03-30 01:06:17,471 INFO Loss: 14.45 Instr_pred: 1.58 R2R: 18.20 REVERIE: 17.89 CVDN: 14.19 SOON: 21.49 ScanQA: 1.41 LLaVA: 1.45
2024-03-30 01:06:17,476 INFO validate val_unseen split on CVDN task
2024-03-30 01:08:45,543 INFO eval 912 predictions
2024-03-30 01:08:45,659 INFO validate val_unseen split on SOON task
2024-03-30 01:14:03,992 INFO eval 3392 predictions
2024-03-30 01:14:04,092 INFO validate val_unseen split on R2R task
2024-03-30 01:16:20,116 INFO eval 2352 predictions
2024-03-30 01:16:20,177 INFO validate val_unseen split on REVERIE task
2024-03-30 01:19:38,830 INFO eval 3528 predictions
2024-03-30 01:19:38,956 INFO validate val_unseen split on ScanQA task
2024-03-30 01:21:46,126 INFO
[Eval] val_unseen epoch 2
[Eval] dataset=[CVDN] , lengths: 57.11, nav_error: 17.82, oracle_sr: 49.89
[Eval] ||| sr: 6.47, spl: 2.51, oracle path_success_rate: 78.29, dist_to_end_reduction: 1.57
[Eval] dataset=[SOON] , action_steps: 12.97, steps: 16.00, lengths: 28.81, nav_error: 13.37, oracle_error: 6.81
[Eval] ||| sr: 10.97, oracle_sr: 26.42, spl: 8.99, det_sr: 0.27, det_spl: 0.20
[Eval] dataset=[R2R] , action_steps: 8.59, steps: 11.29, lengths: 21.47, nav_error: 9.71, oracle_error: 3.77
[Eval] ||| sr: 21.60, oracle_sr: 53.10, spl: 16.72
[Eval] dataset=[REVERIE] , action_steps: 7.84, steps: 10.10, lengths: 19.68, nav_error: 10.69, oracle_error: 4.56
[Eval] ||| sr: 7.28, oracle_sr: 20.07, spl: 5.98, rgs: 1.50, rgspl: 1.22
[Eval] dataset=[ScanQA] , bleu-1: 29.09, bleu-2: 17.97, bleu-3: 13.36, bleu-4: 9.18, rouge: 31.92, cider: 59.50, meteor: 12.22, exact_match: 19.64
2024-03-30 01:21:46,132 INFO Current Score: 0.7799466357240358
2024-03-30 01:21:46,132 INFO Best Score: 0.8180526293345514
2024-03-30 02:33:57,583 INFO train [3] epoch
2024-03-30 02:33:57,588 INFO Loss: 13.50 Instr_pred: 1.50 R2R: 17.22 REVERIE: 15.93 CVDN: 14.43 SOON: 19.24 ScanQA: 1.34 LLaVA: 1.45
2024-03-30 02:33:57,594 INFO validate val_unseen split on CVDN task
2024-03-30 02:36:16,959 INFO eval 912 predictions
2024-03-30 02:36:17,023 INFO validate val_unseen split on SOON task
2024-03-30 02:41:36,695 INFO eval 3392 predictions
2024-03-30 02:41:36,802 INFO validate val_unseen split on R2R task
2024-03-30 02:43:33,279 INFO eval 2352 predictions
2024-03-30 02:43:33,339 INFO validate val_unseen split on REVERIE task
2024-03-30 02:46:42,237 INFO eval 3528 predictions
2024-03-30 02:46:42,370 INFO validate val_unseen split on ScanQA task
2024-03-30 02:48:53,602 INFO
[Eval] val_unseen epoch 3
[Eval] dataset=[CVDN] , lengths: 51.31, nav_error: 18.19, oracle_sr: 46.82
[Eval] ||| sr: 6.25, spl: 2.84, oracle path_success_rate: 75.11, dist_to_end_reduction: 1.26
[Eval] dataset=[SOON] , action_steps: 12.86, steps: 15.46, lengths: 27.94, nav_error: 12.90, oracle_error: 6.50
[Eval] ||| sr: 12.79, oracle_sr: 30.84, spl: 10.61, det_sr: 0.50, det_spl: 0.45
[Eval] dataset=[R2R] , action_steps: 7.24, steps: 9.80, lengths: 18.32, nav_error: 8.04, oracle_error: 3.70
[Eval] ||| sr: 30.40, oracle_sr: 53.23, spl: 25.64
[Eval] dataset=[REVERIE] , action_steps: 7.37, steps: 9.33, lengths: 18.07, nav_error: 9.35, oracle_error: 4.34
[Eval] ||| sr: 12.36, oracle_sr: 24.32, spl: 10.31, rgs: 2.47, rgspl: 2.00
[Eval] dataset=[ScanQA] , bleu-1: 32.88, bleu-2: 21.24, bleu-3: 15.65, bleu-4: 10.78, rouge: 34.05, cider: 65.66, meteor: 13.12, exact_match: 20.56
2024-03-30 02:48:53,608 INFO Current Score: 1.1081138416634404
2024-03-30 02:48:53,608 INFO Best Score: 1.1081138416634404
2024-03-30 02:48:54,624 INFO Remove Checkpoint at Epoch 1...
2024-03-30 04:00:31,778 INFO train [4] epoch
2024-03-30 04:00:31,782 INFO Loss: 12.21 Instr_pred: 1.51 R2R: 15.64 REVERIE: 13.64 CVDN: 13.82 SOON: 18.33 ScanQA: 1.34 LLaVA: 1.43
2024-03-30 04:00:31,784 INFO validate val_unseen split on CVDN task
2024-03-30 04:03:18,865 INFO eval 912 predictions
2024-03-30 04:03:18,939 INFO validate val_unseen split on SOON task
2024-03-30 04:08:59,016 INFO eval 3392 predictions
2024-03-30 04:08:59,122 INFO validate val_unseen split on R2R task
2024-03-30 04:11:13,652 INFO eval 2352 predictions
2024-03-30 04:11:13,714 INFO validate val_unseen split on REVERIE task
2024-03-30 04:14:45,240 INFO eval 3528 predictions
2024-03-30 04:14:45,365 INFO validate val_unseen split on ScanQA task
2024-03-30 04:16:56,008 INFO
[Eval] val_unseen epoch 4
[Eval] dataset=[CVDN] , lengths: 69.94, nav_error: 16.49, oracle_sr: 58.33
[Eval] ||| sr: 8.33, spl: 2.62, oracle path_success_rate: 83.77, dist_to_end_reduction: 2.93
[Eval] dataset=[SOON] , action_steps: 13.80, steps: 17.46, lengths: 32.88, nav_error: 11.95, oracle_error: 5.92
[Eval] ||| sr: 13.89, oracle_sr: 33.90, spl: 11.01, det_sr: 0.35, det_spl: 0.27
[Eval] dataset=[R2R] , action_steps: 8.77, steps: 11.95, lengths: 23.84, nav_error: 7.40, oracle_error: 2.68
[Eval] ||| sr: 31.21, oracle_sr: 66.45, spl: 24.26
[Eval] dataset=[REVERIE] , action_steps: 8.37, steps: 11.13, lengths: 22.42, nav_error: 9.09, oracle_error: 3.81
[Eval] ||| sr: 14.88, oracle_sr: 33.11, spl: 11.53, rgs: 2.66, rgspl: 2.03
[Eval] dataset=[ScanQA] , bleu-1: 32.59, bleu-2: 20.01, bleu-3: 13.25, bleu-4: 8.58, rouge: 33.96, cider: 62.81, meteor: 13.11, exact_match: 19.96
2024-03-30 04:16:56,014 INFO Current Score: 1.1334468161685174
2024-03-30 04:16:56,014 INFO Best Score: 1.1334468161685174
2024-03-30 04:16:56,934 INFO Remove Checkpoint at Epoch 3...
2024-03-30 05:28:45,884 INFO train [5] epoch
2024-03-30 05:28:45,887 INFO Loss: 11.38 Instr_pred: 1.48 R2R: 13.99 REVERIE: 13.98 CVDN: 15.35 SOON: 17.06 ScanQA: 1.35 LLaVA: 1.44
2024-03-30 05:28:45,895 INFO validate val_unseen split on CVDN task
2024-03-30 05:31:21,934 INFO eval 912 predictions
2024-03-30 05:31:22,008 INFO validate val_unseen split on SOON task
2024-03-30 05:36:44,011 INFO eval 3392 predictions
2024-03-30 05:36:44,121 INFO validate val_unseen split on R2R task
2024-03-30 05:38:39,008 INFO eval 2352 predictions
2024-03-30 05:38:39,068 INFO validate val_unseen split on REVERIE task
2024-03-30 05:41:56,488 INFO eval 3528 predictions
2024-03-30 05:41:56,614 INFO validate val_unseen split on ScanQA task
2024-03-30 05:44:09,239 INFO
[Eval] val_unseen epoch 5
[Eval] dataset=[CVDN] , lengths: 67.66, nav_error: 15.94, oracle_sr: 56.14
[Eval] ||| sr: 7.24, spl: 2.76, oracle path_success_rate: 85.75, dist_to_end_reduction: 3.59
[Eval] dataset=[SOON] , action_steps: 12.86, steps: 17.04, lengths: 31.63, nav_error: 10.97, oracle_error: 5.86
[Eval] ||| sr: 19.13, oracle_sr: 35.82, spl: 15.74, det_sr: 1.15, det_spl: 0.95
[Eval] dataset=[R2R] , action_steps: 7.36, steps: 9.13, lengths: 17.64, nav_error: 6.20, oracle_error: 2.57
[Eval] ||| sr: 42.43, oracle_sr: 69.69, spl: 34.82
[Eval] dataset=[REVERIE] , action_steps: 7.79, steps: 9.96, lengths: 19.49, nav_error: 7.56, oracle_error: 3.30
[Eval] ||| sr: 23.13, oracle_sr: 40.99, spl: 18.23, rgs: 3.97, rgspl: 3.12
[Eval] dataset=[ScanQA] , bleu-1: 34.88, bleu-2: 21.84, bleu-3: 15.66, bleu-4: 10.10, rouge: 34.48, cider: 65.54, meteor: 13.81, exact_match: 18.95
2024-03-30 05:44:09,246 INFO Current Score: 1.670203482569839
2024-03-30 05:44:09,246 INFO Best Score: 1.670203482569839
2024-03-30 05:44:10,146 INFO Remove Checkpoint at Epoch 4...
2024-03-30 06:56:00,217 INFO train [6] epoch
2024-03-30 06:56:00,222 INFO Loss: 10.78 Instr_pred: 1.44 R2R: 13.07 REVERIE: 13.13 CVDN: 15.47 SOON: 15.88 ScanQA: 1.26 LLaVA: 1.39
2024-03-30 06:56:00,224 INFO validate val_unseen split on CVDN task
2024-03-30 06:58:26,996 INFO eval 912 predictions
2024-03-30 06:58:27,061 INFO validate val_unseen split on SOON task
2024-03-30 07:04:02,394 INFO eval 3392 predictions
2024-03-30 07:04:02,510 INFO validate val_unseen split on R2R task
2024-03-30 07:06:09,416 INFO eval 2352 predictions
2024-03-30 07:06:09,477 INFO validate val_unseen split on REVERIE task
2024-03-30 07:09:46,120 INFO eval 3528 predictions
2024-03-30 07:09:46,246 INFO validate val_unseen split on ScanQA task
2024-03-30 07:11:54,574 INFO
[Eval] val_unseen epoch 6
[Eval] dataset=[CVDN] , lengths: 54.48, nav_error: 15.49, oracle_sr: 55.04
[Eval] ||| sr: 7.89, spl: 4.10, oracle path_success_rate: 84.98, dist_to_end_reduction: 4.18
[Eval] dataset=[SOON] , action_steps: 13.91, steps: 16.97, lengths: 32.18, nav_error: 10.09, oracle_error: 4.91
[Eval] ||| sr: 21.29, oracle_sr: 48.41, spl: 17.03, det_sr: 0.91, det_spl: 0.74
[Eval] dataset=[R2R] , action_steps: 8.00, steps: 9.94, lengths: 19.55, nav_error: 6.00, oracle_error: 2.19
[Eval] ||| sr: 45.54, oracle_sr: 75.72, spl: 37.19
[Eval] dataset=[REVERIE] , action_steps: 8.65, steps: 11.11, lengths: 22.03, nav_error: 7.24, oracle_error: 2.79
[Eval] ||| sr: 23.07, oracle_sr: 50.17, spl: 18.37, rgs: 4.37, rgspl: 3.31
[Eval] dataset=[ScanQA] , bleu-1: 33.52, bleu-2: 22.18, bleu-3: 16.79, bleu-4: 12.11, rouge: 35.48, cider: 67.86, meteor: 13.73, exact_match: 22.03
2024-03-30 07:11:54,581 INFO Current Score: 1.7621165890141546
2024-03-30 07:11:54,581 INFO Best Score: 1.7621165890141546
2024-03-30 07:11:55,498 INFO Remove Checkpoint at Epoch 5...
2024-03-30 08:22:09,807 INFO train [7] epoch
2024-03-30 08:22:09,812 INFO Loss: 9.78 Instr_pred: 1.38 R2R: 11.89 REVERIE: 11.27 CVDN: 11.81 SOON: 16.48 ScanQA: 1.22 LLaVA: 1.42
2024-03-30 08:22:09,814 INFO validate val_unseen split on CVDN task
2024-03-30 08:24:27,507 INFO eval 912 predictions
2024-03-30 08:24:27,566 INFO validate val_unseen split on SOON task
2024-03-30 08:29:43,408 INFO eval 3392 predictions
2024-03-30 08:29:43,529 INFO validate val_unseen split on R2R task
2024-03-30 08:31:34,179 INFO eval 2352 predictions
2024-03-30 08:31:34,236 INFO validate val_unseen split on REVERIE task
2024-03-30 08:34:37,918 INFO eval 3528 predictions
2024-03-30 08:34:38,035 INFO validate val_unseen split on ScanQA task
2024-03-30 08:36:51,931 INFO
[Eval] val_unseen epoch 7
[Eval] dataset=[CVDN] , lengths: 43.69, nav_error: 15.72, oracle_sr: 54.71
[Eval] ||| sr: 9.43, spl: 6.22, oracle path_success_rate: 82.13, dist_to_end_reduction: 4.01
[Eval] dataset=[SOON] , action_steps: 12.80, steps: 15.97, lengths: 30.96, nav_error: 9.49, oracle_error: 5.14
[Eval] ||| sr: 25.94, oracle_sr: 46.40, spl: 20.34, det_sr: 0.97, det_spl: 0.81
[Eval] dataset=[R2R] , action_steps: 6.93, steps: 8.25, lengths: 16.06, nav_error: 5.02, oracle_error: 2.24
[Eval] ||| sr: 54.17, oracle_sr: 74.15, spl: 44.97
[Eval] dataset=[REVERIE] , action_steps: 7.21, steps: 8.36, lengths: 16.46, nav_error: 6.91, oracle_error: 3.14
[Eval] ||| sr: 26.87, oracle_sr: 46.71, spl: 21.52, rgs: 4.79, rgspl: 3.74
[Eval] dataset=[ScanQA] , bleu-1: 39.60, bleu-2: 25.39, bleu-3: 17.94, bleu-4: 12.04, rouge: 38.13, cider: 74.44, meteor: 15.39, exact_match: 20.94
2024-03-30 08:36:51,937 INFO Current Score: 2.102380853998675
2024-03-30 08:36:51,937 INFO Best Score: 2.102380853998675
2024-03-30 08:36:52,826 INFO Remove Checkpoint at Epoch 6...
2024-03-30 09:47:24,932 INFO train [8] epoch
2024-03-30 09:47:24,937 INFO Loss: 9.60 Instr_pred: 1.42 R2R: 11.64 REVERIE: 11.16 CVDN: 12.95 SOON: 15.18 ScanQA: 1.28 LLaVA: 1.41
2024-03-30 09:47:24,939 INFO validate val_unseen split on CVDN task
2024-03-30 09:49:31,976 INFO eval 912 predictions
2024-03-30 09:49:32,034 INFO validate val_unseen split on SOON task
2024-03-30 09:54:24,467 INFO eval 3392 predictions
2024-03-30 09:54:24,587 INFO validate val_unseen split on R2R task
2024-03-30 09:56:08,383 INFO eval 2352 predictions
2024-03-30 09:56:08,441 INFO validate val_unseen split on REVERIE task
2024-03-30 09:59:07,674 INFO eval 3528 predictions
2024-03-30 09:59:07,796 INFO validate val_unseen split on ScanQA task
2024-03-30 10:01:17,603 INFO
[Eval] val_unseen epoch 8
[Eval] dataset=[CVDN] , lengths: 42.97, nav_error: 14.52, oracle_sr: 52.19
[Eval] ||| sr: 9.65, spl: 6.12, oracle path_success_rate: 82.13, dist_to_end_reduction: 5.03
[Eval] dataset=[SOON] , action_steps: 11.66, steps: 13.84, lengths: 26.50, nav_error: 9.30, oracle_error: 5.42
[Eval] ||| sr: 27.21, oracle_sr: 40.60, spl: 22.58, det_sr: 1.47, det_spl: 1.28
[Eval] dataset=[R2R] , action_steps: 6.37, steps: 7.36, lengths: 14.34, nav_error: 4.59, oracle_error: 2.28
[Eval] ||| sr: 57.57, oracle_sr: 73.30, spl: 49.05
[Eval] dataset=[REVERIE] , action_steps: 6.92, steps: 8.39, lengths: 16.29, nav_error: 6.35, oracle_error: 3.08
[Eval] ||| sr: 31.15, oracle_sr: 45.52, spl: 25.68, rgs: 5.61, rgspl: 4.57
[Eval] dataset=[ScanQA] , bleu-1: 36.06, bleu-2: 24.08, bleu-3: 17.04, bleu-4: 12.14, rouge: 36.73, cider: 71.39, meteor: 14.48, exact_match: 21.88
2024-03-30 10:01:17,609 INFO Current Score: 2.3681853570720754
2024-03-30 10:01:17,609 INFO Best Score: 2.3681853570720754
2024-03-30 10:01:18,500 INFO Remove Checkpoint at Epoch 7...
2024-03-30 11:10:58,760 INFO train [9] epoch
2024-03-30 11:10:58,765 INFO Loss: 8.51 Instr_pred: 1.35 R2R: 10.31 REVERIE: 10.53 CVDN: 8.20 SOON: 13.53 ScanQA: 1.28 LLaVA: 1.39
2024-03-30 11:10:58,770 INFO validate val_unseen split on CVDN task
2024-03-30 11:13:02,004 INFO eval 912 predictions
2024-03-30 11:13:02,064 INFO validate val_unseen split on SOON task
2024-03-30 11:18:28,237 INFO eval 3392 predictions
2024-03-30 11:18:28,361 INFO validate val_unseen split on R2R task
2024-03-30 11:20:24,192 INFO eval 2352 predictions
2024-03-30 11:20:24,251 INFO validate val_unseen split on REVERIE task
2024-03-30 11:23:38,026 INFO eval 3528 predictions
2024-03-30 11:23:38,149 INFO validate val_unseen split on ScanQA task
2024-03-30 11:25:49,157 INFO
[Eval] val_unseen epoch 9
[Eval] dataset=[CVDN] , lengths: 43.94, nav_error: 14.00, oracle_sr: 54.50
[Eval] ||| sr: 10.20, spl: 7.23, oracle path_success_rate: 86.18, dist_to_end_reduction: 5.63
[Eval] dataset=[SOON] , action_steps: 13.29, steps: 17.12, lengths: 32.65, nav_error: 9.21, oracle_error: 4.89
[Eval] ||| sr: 28.15, oracle_sr: 46.93, spl: 22.66, det_sr: 1.68, det_spl: 1.38
[Eval] dataset=[R2R] , action_steps: 6.99, steps: 8.58, lengths: 16.85, nav_error: 5.15, oracle_error: 2.23
[Eval] ||| sr: 55.44, oracle_sr: 74.57, spl: 46.63
[Eval] dataset=[REVERIE] , action_steps: 7.51, steps: 9.10, lengths: 17.88, nav_error: 6.88, oracle_error: 2.93
[Eval] ||| sr: 29.20, oracle_sr: 44.95, spl: 24.40, rgs: 5.47, rgspl: 4.42
[Eval] dataset=[ScanQA] , bleu-1: 34.00, bleu-2: 21.88, bleu-3: 16.26, bleu-4: 11.34, rouge: 35.53, cider: 67.66, meteor: 13.69, exact_match: 21.69
2024-03-30 11:25:49,163 INFO Current Score: 2.2958890340663176
2024-03-30 11:25:49,164 INFO Best Score: 2.3681853570720754
2024-03-30 12:34:11,924 INFO train [10] epoch
2024-03-30 12:34:11,929 INFO Loss: 8.32 Instr_pred: 1.37 R2R: 10.09 REVERIE: 9.18 CVDN: 11.79 SOON: 13.88 ScanQA: 1.20 LLaVA: 1.38
2024-03-30 12:34:11,931 INFO validate val_unseen split on CVDN task
2024-03-30 12:36:02,917 INFO eval 912 predictions
2024-03-30 12:36:02,967 INFO validate val_unseen split on SOON task
2024-03-30 12:41:44,886 INFO eval 3392 predictions
2024-03-30 12:41:45,017 INFO validate val_unseen split on R2R task
2024-03-30 12:43:42,177 INFO eval 2352 predictions
2024-03-30 12:43:42,235 INFO validate val_unseen split on REVERIE task
2024-03-30 12:46:58,270 INFO eval 3528 predictions
2024-03-30 12:46:58,859 INFO validate val_unseen split on ScanQA task
2024-03-30 12:49:09,075 INFO
[Eval] val_unseen epoch 10
[Eval] dataset=[CVDN] , lengths: 35.86, nav_error: 14.60, oracle_sr: 49.45
[Eval] ||| sr: 10.96, spl: 8.06, oracle path_success_rate: 81.69, dist_to_end_reduction: 5.00
[Eval] dataset=[SOON] , action_steps: 14.25, steps: 20.35, lengths: 39.57, nav_error: 8.66, oracle_error: 4.32
[Eval] ||| sr: 28.95, oracle_sr: 52.56, spl: 21.04, det_sr: 1.65, det_spl: 1.26
[Eval] dataset=[R2R] , action_steps: 7.08, steps: 8.60, lengths: 17.06, nav_error: 4.63, oracle_error: 1.92
[Eval] ||| sr: 58.42, oracle_sr: 77.38, spl: 47.96
[Eval] dataset=[REVERIE] , action_steps: 7.45, steps: 9.73, lengths: 19.14, nav_error: 5.85, oracle_error: 2.64
[Eval] ||| sr: 37.22, oracle_sr: 53.83, spl: 30.28, rgs: 8.19, rgspl: 6.62
[Eval] dataset=[ScanQA] , bleu-1: 35.20, bleu-2: 22.36, bleu-3: 15.88, bleu-4: 10.82, rouge: 36.23, cider: 70.12, meteor: 14.15, exact_match: 21.60
2024-03-30 12:49:09,081 INFO Current Score: 2.417490289485414
2024-03-30 12:49:09,081 INFO Best Score: 2.417490289485414
2024-03-30 12:49:09,992 INFO Remove Checkpoint at Epoch 8...
2024-03-30 13:57:33,398 INFO train [11] epoch
2024-03-30 13:57:33,403 INFO Loss: 7.73 Instr_pred: 1.31 R2R: 9.25 REVERIE: 9.17 CVDN: 11.33 SOON: 12.19 ScanQA: 1.23 LLaVA: 1.39
2024-03-30 13:57:33,405 INFO validate val_unseen split on CVDN task
2024-03-30 13:59:33,332 INFO eval 912 predictions
2024-03-30 13:59:33,388 INFO validate val_unseen split on SOON task
2024-03-30 14:04:31,109 INFO eval 3392 predictions
2024-03-30 14:04:31,237 INFO validate val_unseen split on R2R task
2024-03-30 14:06:16,776 INFO eval 2352 predictions
2024-03-30 14:06:16,835 INFO validate val_unseen split on REVERIE task
2024-03-30 14:09:11,378 INFO eval 3528 predictions
2024-03-30 14:09:11,498 INFO validate val_unseen split on ScanQA task
2024-03-30 14:11:20,798 INFO
[Eval] val_unseen epoch 11
[Eval] dataset=[CVDN] , lengths: 36.61, nav_error: 15.37, oracle_sr: 48.36
[Eval] ||| sr: 10.31, spl: 7.08, oracle path_success_rate: 80.15, dist_to_end_reduction: 4.32
[Eval] dataset=[SOON] , action_steps: 11.68, steps: 14.16, lengths: 27.67, nav_error: 8.80, oracle_error: 5.16
[Eval] ||| sr: 27.86, oracle_sr: 44.10, spl: 22.54, det_sr: 2.15, det_spl: 1.80
[Eval] dataset=[R2R] , action_steps: 6.44, steps: 7.67, lengths: 15.04, nav_error: 4.81, oracle_error: 2.29
[Eval] ||| sr: 58.59, oracle_sr: 73.47, spl: 49.77
[Eval] dataset=[REVERIE] , action_steps: 6.70, steps: 7.90, lengths: 15.43, nav_error: 6.19, oracle_error: 2.94
[Eval] ||| sr: 38.32, oracle_sr: 51.25, spl: 32.49, rgs: 10.91, rgspl: 9.21
[Eval] dataset=[ScanQA] , bleu-1: 34.14, bleu-2: 21.65, bleu-3: 15.13, bleu-4: 9.23, rouge: 35.58, cider: 67.54, meteor: 13.99, exact_match: 21.67
2024-03-30 14:11:20,805 INFO Current Score: 2.564540383734105
2024-03-30 14:11:20,807 INFO Best Score: 2.564540383734105
2024-03-30 14:11:21,760 INFO Remove Checkpoint at Epoch 10...
2024-03-30 15:19:45,118 INFO train [12] epoch
2024-03-30 15:19:45,123 INFO Loss: 7.40 Instr_pred: 1.28 R2R: 8.62 REVERIE: 8.63 CVDN: 10.51 SOON: 12.50 ScanQA: 1.35 LLaVA: 1.38
2024-03-30 15:19:45,125 INFO validate val_unseen split on CVDN task
2024-03-30 15:21:56,546 INFO eval 912 predictions
2024-03-30 15:21:56,605 INFO validate val_unseen split on SOON task
2024-03-30 15:27:05,935 INFO eval 3392 predictions
2024-03-30 15:27:06,067 INFO validate val_unseen split on R2R task
2024-03-30 15:29:03,798 INFO eval 2352 predictions
2024-03-30 15:29:03,858 INFO validate val_unseen split on REVERIE task
2024-03-30 15:32:00,503 INFO eval 3528 predictions
2024-03-30 15:32:00,623 INFO validate val_unseen split on ScanQA task
2024-03-30 15:34:11,544 INFO
[Eval] val_unseen epoch 12
[Eval] dataset=[CVDN] , lengths: 43.24, nav_error: 14.90, oracle_sr: 53.07
[Eval] ||| sr: 12.06, spl: 8.36, oracle path_success_rate: 84.21, dist_to_end_reduction: 4.63
[Eval] dataset=[SOON] , action_steps: 12.59, steps: 16.27, lengths: 30.69, nav_error: 8.87, oracle_error: 4.72
[Eval] ||| sr: 31.43, oracle_sr: 49.41, spl: 24.67, det_sr: 2.18, det_spl: 1.68
[Eval] dataset=[R2R] , action_steps: 7.18, steps: 8.95, lengths: 17.55, nav_error: 4.70, oracle_error: 1.84
[Eval] ||| sr: 57.57, oracle_sr: 78.02, spl: 47.18
[Eval] dataset=[REVERIE] , action_steps: 6.92, steps: 8.50, lengths: 16.40, nav_error: 5.73, oracle_error: 2.62
[Eval] ||| sr: 37.59, oracle_sr: 54.42, spl: 31.45, rgs: 14.68, rgspl: 12.35
[Eval] dataset=[ScanQA] , bleu-1: 36.33, bleu-2: 23.55, bleu-3: 16.60, bleu-4: 11.22, rouge: 37.42, cider: 71.92, meteor: 14.71, exact_match: 22.05
2024-03-30 15:34:11,550 INFO Current Score: 2.573191832547898
2024-03-30 15:34:11,550 INFO Best Score: 2.573191832547898
2024-03-30 15:34:12,480 INFO Remove Checkpoint at Epoch 11...
2024-03-30 16:41:41,163 INFO train [13] epoch
2024-03-30 16:41:41,168 INFO Loss: 6.84 Instr_pred: 1.24 R2R: 7.83 REVERIE: 8.74 CVDN: 13.35 SOON: 11.21 ScanQA: 1.18 LLaVA: 1.37
2024-03-30 16:41:41,170 INFO validate val_unseen split on CVDN task
2024-03-30 16:44:15,595 INFO eval 912 predictions
2024-03-30 16:44:15,660 INFO validate val_unseen split on SOON task
2024-03-30 16:49:31,334 INFO eval 3392 predictions
2024-03-30 16:49:31,469 INFO validate val_unseen split on R2R task
2024-03-30 16:51:34,423 INFO eval 2352 predictions
2024-03-30 16:51:34,495 INFO validate val_unseen split on REVERIE task
2024-03-30 16:54:41,757 INFO eval 3528 predictions
2024-03-30 16:54:41,879 INFO validate val_unseen split on ScanQA task
2024-03-30 16:56:59,030 INFO
[Eval] val_unseen epoch 13
[Eval] dataset=[CVDN] , lengths: 60.77, nav_error: 14.49, oracle_sr: 60.09
[Eval] ||| sr: 11.40, spl: 6.27, oracle path_success_rate: 87.06, dist_to_end_reduction: 5.01
[Eval] dataset=[SOON] , action_steps: 12.63, steps: 15.51, lengths: 30.09, nav_error: 8.52, oracle_error: 4.63
[Eval] ||| sr: 30.69, oracle_sr: 49.68, spl: 23.88, det_sr: 2.03, det_spl: 1.71
[Eval] dataset=[R2R] , action_steps: 7.33, steps: 8.69, lengths: 17.26, nav_error: 4.96, oracle_error: 1.92
[Eval] ||| sr: 55.82, oracle_sr: 78.02, spl: 45.34
[Eval] dataset=[REVERIE] , action_steps: 7.32, steps: 8.50, lengths: 16.72, nav_error: 6.68, oracle_error: 2.93
[Eval] ||| sr: 32.17, oracle_sr: 49.32, spl: 27.05, rgs: 14.71, rgspl: 12.41
[Eval] dataset=[ScanQA] , bleu-1: 34.09, bleu-2: 21.39, bleu-3: 15.35, bleu-4: 9.88, rouge: 36.09, cider: 69.09, meteor: 13.96, exact_match: 22.07
2024-03-30 16:56:59,037 INFO Current Score: 2.392523098466528
2024-03-30 16:56:59,037 INFO Best Score: 2.573191832547898
2024-03-30 18:05:40,942 INFO train [14] epoch
2024-03-30 18:05:40,947 INFO Loss: 6.39 Instr_pred: 1.19 R2R: 7.45 REVERIE: 6.61 CVDN: 11.05 SOON: 11.46 ScanQA: 1.14 LLaVA: 1.38
2024-03-30 18:05:40,953 INFO validate val_unseen split on CVDN task
2024-03-30 18:08:05,191 INFO eval 912 predictions
2024-03-30 18:08:05,257 INFO validate val_unseen split on SOON task
2024-03-30 18:13:28,990 INFO eval 3392 predictions
2024-03-30 18:13:29,168 INFO validate val_unseen split on R2R task
2024-03-30 18:15:24,642 INFO eval 2352 predictions
2024-03-30 18:15:24,702 INFO validate val_unseen split on REVERIE task
2024-03-30 18:18:36,252 INFO eval 3528 predictions
2024-03-30 18:18:36,372 INFO validate val_unseen split on ScanQA task
2024-03-30 18:20:51,629 INFO
[Eval] val_unseen epoch 14
[Eval] dataset=[CVDN] , lengths: 50.99, nav_error: 13.85, oracle_sr: 58.99
[Eval] ||| sr: 12.17, spl: 7.16, oracle path_success_rate: 86.51, dist_to_end_reduction: 5.71
[Eval] dataset=[SOON] , action_steps: 13.13, steps: 17.16, lengths: 33.33, nav_error: 8.37, oracle_error: 4.50
[Eval] ||| sr: 31.25, oracle_sr: 49.91, spl: 23.36, det_sr: 3.01, det_spl: 2.46
[Eval] dataset=[R2R] , action_steps: 7.06, steps: 8.50, lengths: 16.72, nav_error: 4.57, oracle_error: 1.96
[Eval] ||| sr: 59.27, oracle_sr: 77.25, spl: 48.33
[Eval] dataset=[REVERIE] , action_steps: 7.37, steps: 9.23, lengths: 18.00, nav_error: 5.85, oracle_error: 2.66
[Eval] ||| sr: 39.17, oracle_sr: 52.61, spl: 31.76, rgs: 18.48, rgspl: 15.08
[Eval] dataset=[ScanQA] , bleu-1: 34.10, bleu-2: 22.38, bleu-3: 15.95, bleu-4: 9.89, rouge: 36.05, cider: 69.88, meteor: 13.96, exact_match: 22.14
2024-03-30 18:20:51,635 INFO Current Score: 2.551344273997394
2024-03-30 18:20:51,635 INFO Best Score: 2.573191832547898
2024-03-30 19:29:54,030 INFO train [15] epoch
2024-03-30 19:29:54,035 INFO Loss: 6.06 Instr_pred: 1.20 R2R: 6.96 REVERIE: 7.25 CVDN: 7.29 SOON: 10.30 ScanQA: 1.16 LLaVA: 1.33
2024-03-30 19:29:54,037 INFO validate val_unseen split on CVDN task
2024-03-30 19:31:56,298 INFO eval 912 predictions
2024-03-30 19:31:56,353 INFO validate val_unseen split on SOON task
2024-03-30 19:36:30,459 INFO eval 3392 predictions
2024-03-30 19:36:30,588 INFO validate val_unseen split on R2R task
2024-03-30 19:38:05,687 INFO eval 2352 predictions
2024-03-30 19:38:05,745 INFO validate val_unseen split on REVERIE task
2024-03-30 19:40:45,366 INFO eval 3528 predictions
2024-03-30 19:40:45,482 INFO validate val_unseen split on ScanQA task
2024-03-30 19:43:03,096 INFO
[Eval] val_unseen epoch 15
[Eval] dataset=[CVDN] , lengths: 36.86, nav_error: 13.89, oracle_sr: 49.56
[Eval] ||| sr: 13.05, spl: 9.26, oracle path_success_rate: 79.71, dist_to_end_reduction: 5.70
[Eval] dataset=[SOON] , action_steps: 10.82, steps: 12.17, lengths: 22.64, nav_error: 8.82, oracle_error: 5.60
[Eval] ||| sr: 28.86, oracle_sr: 38.83, spl: 24.37, det_sr: 2.27, det_spl: 1.93
[Eval] dataset=[R2R] , action_steps: 5.78, steps: 6.32, lengths: 12.36, nav_error: 4.20, oracle_error: 2.45
[Eval] ||| sr: 61.90, oracle_sr: 70.75, spl: 54.44
[Eval] dataset=[REVERIE] , action_steps: 6.00, steps: 6.70, lengths: 13.16, nav_error: 6.14, oracle_error: 3.43
[Eval] ||| sr: 38.18, oracle_sr: 43.76, spl: 33.66, rgs: 18.74, rgspl: 16.40
[Eval] dataset=[ScanQA] , bleu-1: 35.44, bleu-2: 22.13, bleu-3: 15.40, bleu-4: 9.64, rouge: 36.90, cider: 70.74, meteor: 14.38, exact_match: 22.12
2024-03-30 19:43:03,103 INFO Current Score: 2.743111539802042
2024-03-30 19:43:03,103 INFO Best Score: 2.743111539802042
2024-03-30 19:43:03,114 INFO Remove Checkpoint at Epoch 12...
2024-03-30 20:50:56,869 INFO train [16] epoch
2024-03-30 20:50:56,874 INFO Loss: 5.54 Instr_pred: 1.13 R2R: 6.18 REVERIE: 6.80 CVDN: 11.22 SOON: 9.58 ScanQA: 1.12 LLaVA: 1.36
2024-03-30 20:50:56,876 INFO validate val_unseen split on CVDN task
2024-03-30 20:53:14,769 INFO eval 912 predictions
2024-03-30 20:53:14,830 INFO validate val_unseen split on SOON task
2024-03-30 20:58:32,975 INFO eval 3392 predictions
2024-03-30 20:58:33,121 INFO validate val_unseen split on R2R task
2024-03-30 21:00:20,890 INFO eval 2352 predictions
2024-03-30 21:00:20,950 INFO validate val_unseen split on REVERIE task
2024-03-30 21:03:23,945 INFO eval 3528 predictions
2024-03-30 21:03:24,070 INFO validate val_unseen split on ScanQA task
2024-03-30 21:05:42,933 INFO
[Eval] val_unseen epoch 16
[Eval] dataset=[CVDN] , lengths: 47.12, nav_error: 14.21, oracle_sr: 57.35
[Eval] ||| sr: 15.02, spl: 9.58, oracle path_success_rate: 87.50, dist_to_end_reduction: 5.36
[Eval] dataset=[SOON] , action_steps: 12.72, steps: 16.06, lengths: 30.60, nav_error: 8.67, oracle_error: 4.80
[Eval] ||| sr: 30.78, oracle_sr: 48.50, spl: 23.69, det_sr: 2.27, det_spl: 1.78
[Eval] dataset=[R2R] , action_steps: 6.63, steps: 7.66, lengths: 15.02, nav_error: 4.20, oracle_error: 1.83
[Eval] ||| sr: 62.76, oracle_sr: 79.59, spl: 52.73
[Eval] dataset=[REVERIE] , action_steps: 7.11, steps: 8.65, lengths: 17.06, nav_error: 6.56, oracle_error: 2.88
[Eval] ||| sr: 38.49, oracle_sr: 51.93, spl: 32.47, rgs: 18.96, rgspl: 16.04
[Eval] dataset=[ScanQA] , bleu-1: 37.65, bleu-2: 24.35, bleu-3: 17.66, bleu-4: 12.17, rouge: 37.87, cider: 74.59, meteor: 15.22, exact_match: 21.30
2024-03-30 21:05:42,940 INFO Current Score: 2.6563927389774324
2024-03-30 21:05:42,940 INFO Best Score: 2.743111539802042
2024-03-30 22:12:26,215 INFO train [17] epoch
2024-03-30 22:12:26,220 INFO Loss: 5.33 Instr_pred: 1.12 R2R: 6.08 REVERIE: 5.98 CVDN: 11.03 SOON: 9.10 ScanQA: 1.13 LLaVA: 1.34
2024-03-30 22:12:26,222 INFO validate val_unseen split on CVDN task
2024-03-30 22:14:18,561 INFO eval 912 predictions
2024-03-30 22:14:18,612 INFO validate val_unseen split on SOON task
2024-03-30 22:19:32,207 INFO eval 3392 predictions
2024-03-30 22:19:32,346 INFO validate val_unseen split on R2R task
2024-03-30 22:21:19,998 INFO eval 2352 predictions
2024-03-30 22:21:20,061 INFO validate val_unseen split on REVERIE task
2024-03-30 22:24:17,612 INFO eval 3528 predictions
2024-03-30 22:24:17,730 INFO validate val_unseen split on ScanQA task
2024-03-30 22:26:36,049 INFO
[Eval] val_unseen epoch 17
[Eval] dataset=[CVDN] , lengths: 32.36, nav_error: 13.91, oracle_sr: 50.99
[Eval] ||| sr: 14.14, spl: 10.84, oracle path_success_rate: 82.35, dist_to_end_reduction: 5.72
[Eval] dataset=[SOON] , action_steps: 12.68, steps: 15.89, lengths: 30.33, nav_error: 8.09, oracle_error: 4.46
[Eval] ||| sr: 33.31, oracle_sr: 50.32, spl: 26.31, det_sr: 2.80, det_spl: 2.26
[Eval] dataset=[R2R] , action_steps: 6.57, steps: 7.54, lengths: 14.72, nav_error: 4.42, oracle_error: 2.06
[Eval] ||| sr: 60.67, oracle_sr: 76.40, spl: 51.40
[Eval] dataset=[REVERIE] , action_steps: 6.61, steps: 7.67, lengths: 14.89, nav_error: 5.94, oracle_error: 3.05
[Eval] ||| sr: 38.55, oracle_sr: 48.36, spl: 33.21, rgs: 18.54, rgspl: 16.03
[Eval] dataset=[ScanQA] , bleu-1: 36.25, bleu-2: 23.66, bleu-3: 17.59, bleu-4: 12.31, rouge: 37.51, cider: 73.62, meteor: 14.91, exact_match: 22.86
2024-03-30 22:26:36,056 INFO Current Score: 2.752998403668071
2024-03-30 22:26:36,056 INFO Best Score: 2.752998403668071
2024-03-30 22:26:37,057 INFO Remove Checkpoint at Epoch 15...
2024-03-30 23:33:28,404 INFO train [18] epoch
2024-03-30 23:33:28,409 INFO Loss: 5.21 Instr_pred: 1.06 R2R: 5.39 REVERIE: 7.01 CVDN: 10.81 SOON: 9.24 ScanQA: 1.12 LLaVA: 1.35
2024-03-30 23:33:28,411 INFO validate val_unseen split on CVDN task
2024-03-30 23:35:47,108 INFO eval 912 predictions
2024-03-30 23:35:47,168 INFO validate val_unseen split on SOON task
2024-03-30 23:40:47,527 INFO eval 3392 predictions
2024-03-30 23:40:47,665 INFO validate val_unseen split on R2R task
2024-03-30 23:42:26,089 INFO eval 2352 predictions
2024-03-30 23:42:26,148 INFO validate val_unseen split on REVERIE task
2024-03-30 23:45:15,658 INFO eval 3528 predictions
2024-03-30 23:45:15,773 INFO validate val_unseen split on ScanQA task
2024-03-30 23:47:35,142 INFO
[Eval] val_unseen epoch 18
[Eval] dataset=[CVDN] , lengths: 44.43, nav_error: 14.80, oracle_sr: 55.70
[Eval] ||| sr: 12.83, spl: 8.52, oracle path_success_rate: 85.96, dist_to_end_reduction: 4.78
[Eval] dataset=[SOON] , action_steps: 11.99, steps: 13.95, lengths: 26.44, nav_error: 8.36, oracle_error: 4.78
[Eval] ||| sr: 33.90, oracle_sr: 48.97, spl: 26.39, det_sr: 3.27, det_spl: 2.69
[Eval] dataset=[R2R] , action_steps: 6.23, steps: 6.88, lengths: 13.32, nav_error: 4.19, oracle_error: 2.07
[Eval] ||| sr: 61.90, oracle_sr: 75.55, spl: 52.70
[Eval] dataset=[REVERIE] , action_steps: 6.33, steps: 6.96, lengths: 13.31, nav_error: 5.98, oracle_error: 3.08
[Eval] ||| sr: 39.65, oracle_sr: 51.25, spl: 34.54, rgs: 20.72, rgspl: 17.95
[Eval] dataset=[ScanQA] , bleu-1: 37.09, bleu-2: 23.95, bleu-3: 17.05, bleu-4: 11.61, rouge: 37.80, cider: 74.22, meteor: 15.07, exact_match: 22.82
2024-03-30 23:47:35,149 INFO Current Score: 2.814189876757082
2024-03-30 23:47:35,149 INFO Best Score: 2.814189876757082
2024-03-30 23:47:36,177 INFO Remove Checkpoint at Epoch 17...
2024-03-31 00:52:50,422 INFO train [19] epoch
2024-03-31 00:52:50,427 INFO Loss: 4.81 Instr_pred: 1.05 R2R: 5.49 REVERIE: 5.24 CVDN: 12.88 SOON: 8.23 ScanQA: 1.14 LLaVA: 1.36
2024-03-31 00:52:50,428 INFO validate val_unseen split on CVDN task
2024-03-31 00:55:02,053 INFO eval 912 predictions
2024-03-31 00:55:02,112 INFO validate val_unseen split on SOON task
2024-03-31 01:00:01,998 INFO eval 3392 predictions
2024-03-31 01:00:02,136 INFO validate val_unseen split on R2R task
2024-03-31 01:01:47,602 INFO eval 2352 predictions
2024-03-31 01:01:47,662 INFO validate val_unseen split on REVERIE task
2024-03-31 01:04:44,181 INFO eval 3528 predictions
2024-03-31 01:04:44,296 INFO validate val_unseen split on ScanQA task
2024-03-31 01:07:02,835 INFO
[Eval] val_unseen epoch 19
[Eval] dataset=[CVDN] , lengths: 45.17, nav_error: 13.55, oracle_sr: 57.13
[Eval] ||| sr: 17.54, spl: 12.25, oracle path_success_rate: 84.98, dist_to_end_reduction: 5.90
[Eval] dataset=[SOON] , action_steps: 11.92, steps: 14.87, lengths: 28.82, nav_error: 8.34, oracle_error: 4.94
[Eval] ||| sr: 29.75, oracle_sr: 46.37, spl: 22.86, det_sr: 3.10, det_spl: 2.47
[Eval] dataset=[R2R] , action_steps: 6.61, steps: 7.72, lengths: 15.30, nav_error: 4.29, oracle_error: 1.92
[Eval] ||| sr: 60.80, oracle_sr: 76.79, spl: 50.81
[Eval] dataset=[REVERIE] , action_steps: 6.76, steps: 7.93, lengths: 15.30, nav_error: 6.05, oracle_error: 2.94
[Eval] ||| sr: 37.33, oracle_sr: 50.45, spl: 31.04, rgs: 19.53, rgspl: 16.02
[Eval] dataset=[ScanQA] , bleu-1: 35.05, bleu-2: 23.00, bleu-3: 16.92, bleu-4: 11.36, rouge: 36.19, cider: 70.73, meteor: 14.15, exact_match: 22.07
2024-03-31 01:07:02,841 INFO Current Score: 2.5539858823966566
2024-03-31 01:07:02,841 INFO Best Score: 2.814189876757082
2024-03-31 02:12:15,109 INFO train [20] epoch
2024-03-31 02:12:15,112 INFO Loss: 4.55 Instr_pred: 1.00 R2R: 4.78 REVERIE: 6.07 CVDN: 9.94 SOON: 7.83 ScanQA: 1.13 LLaVA: 1.36
2024-03-31 02:12:15,114 INFO validate val_unseen split on CVDN task
2024-03-31 02:14:09,124 INFO eval 912 predictions
2024-03-31 02:14:09,176 INFO validate val_unseen split on SOON task
2024-03-31 02:18:54,788 INFO eval 3392 predictions
2024-03-31 02:18:54,915 INFO validate val_unseen split on R2R task
2024-03-31 02:20:47,041 INFO eval 2352 predictions
2024-03-31 02:20:47,101 INFO validate val_unseen split on REVERIE task
2024-03-31 02:23:46,691 INFO eval 3528 predictions
2024-03-31 02:23:46,820 INFO validate val_unseen split on ScanQA task
2024-03-31 02:26:06,010 INFO
[Eval] val_unseen epoch 20
[Eval] dataset=[CVDN] , lengths: 34.59, nav_error: 14.79, oracle_sr: 48.79
[Eval] ||| sr: 14.69, spl: 11.13, oracle path_success_rate: 81.47, dist_to_end_reduction: 4.85
[Eval] dataset=[SOON] , action_steps: 11.19, steps: 12.95, lengths: 24.90, nav_error: 8.48, oracle_error: 5.36
[Eval] ||| sr: 31.43, oracle_sr: 42.48, spl: 25.15, det_sr: 2.77, det_spl: 2.28
[Eval] dataset=[R2R] , action_steps: 6.76, steps: 7.94, lengths: 15.71, nav_error: 4.33, oracle_error: 1.88
[Eval] ||| sr: 61.56, oracle_sr: 77.85, spl: 51.76
[Eval] dataset=[REVERIE] , action_steps: 6.92, steps: 8.17, lengths: 15.89, nav_error: 5.93, oracle_error: 2.86
[Eval] ||| sr: 39.26, oracle_sr: 54.31, spl: 32.71, rgs: 20.75, rgspl: 17.20
[Eval] dataset=[ScanQA] , bleu-1: 35.84, bleu-2: 23.46, bleu-3: 17.46, bleu-4: 12.23, rouge: 36.43, cider: 72.62, meteor: 14.57, exact_match: 21.82
2024-03-31 02:26:06,016 INFO Current Score: 2.701974743562689
2024-03-31 02:26:06,016 INFO Best Score: 2.814189876757082
2024-03-31 03:32:28,511 INFO train [21] epoch
2024-03-31 03:32:28,516 INFO Loss: 4.56 Instr_pred: 0.99 R2R: 4.91 REVERIE: 4.62 CVDN: 11.92 SOON: 8.01 ScanQA: 1.13 LLaVA: 1.33
2024-03-31 03:32:28,521 INFO validate val_unseen split on CVDN task
2024-03-31 03:34:58,785 INFO eval 912 predictions
2024-03-31 03:34:58,889 INFO validate val_unseen split on SOON task
2024-03-31 03:40:19,122 INFO eval 3392 predictions
2024-03-31 03:40:19,257 INFO validate val_unseen split on R2R task
2024-03-31 03:42:09,664 INFO eval 2352 predictions
2024-03-31 03:42:09,724 INFO validate val_unseen split on REVERIE task
2024-03-31 03:45:21,157 INFO eval 3528 predictions
2024-03-31 03:45:21,279 INFO validate val_unseen split on ScanQA task
2024-03-31 03:47:37,326 INFO
[Eval] val_unseen epoch 21
[Eval] dataset=[CVDN] , lengths: 52.49, nav_error: 13.99, oracle_sr: 59.76
[Eval] ||| sr: 14.69, spl: 9.19, oracle path_success_rate: 86.07, dist_to_end_reduction: 5.54
[Eval] dataset=[SOON] , action_steps: 13.25, steps: 16.71, lengths: 33.26, nav_error: 8.02, oracle_error: 4.28
[Eval] ||| sr: 33.49, oracle_sr: 53.57, spl: 24.32, det_sr: 2.83, det_spl: 2.29
[Eval] dataset=[R2R] , action_steps: 6.76, steps: 7.95, lengths: 15.54, nav_error: 4.15, oracle_error: 1.80
[Eval] ||| sr: 61.99, oracle_sr: 79.17, spl: 51.73
[Eval] dataset=[REVERIE] , action_steps: 7.35, steps: 8.73, lengths: 16.82, nav_error: 6.17, oracle_error: 2.68
[Eval] ||| sr: 35.15, oracle_sr: 54.34, spl: 28.94, rgs: 18.99, rgspl: 15.40
[Eval] dataset=[ScanQA] , bleu-1: 35.04, bleu-2: 22.81, bleu-3: 16.78, bleu-4: 10.76, rouge: 36.55, cider: 70.68, meteor: 14.24, exact_match: 22.01
2024-03-31 03:47:37,332 INFO Current Score: 2.5669870890756346
2024-03-31 03:47:37,333 INFO Best Score: 2.814189876757082
2024-03-31 04:51:45,894 INFO train [22] epoch
2024-03-31 04:51:45,899 INFO Loss: 4.03 Instr_pred: 0.95 R2R: 4.49 REVERIE: 4.87 CVDN: 8.09 SOON: 6.75 ScanQA: 1.04 LLaVA: 1.36
2024-03-31 04:51:45,901 INFO validate val_unseen split on CVDN task
2024-03-31 04:53:37,937 INFO eval 912 predictions
2024-03-31 04:53:37,988 INFO validate val_unseen split on SOON task
2024-03-31 04:58:33,299 INFO eval 3392 predictions
2024-03-31 04:58:33,436 INFO validate val_unseen split on R2R task
2024-03-31 05:00:13,253 INFO eval 2352 predictions
2024-03-31 05:00:13,312 INFO validate val_unseen split on REVERIE task
2024-03-31 05:03:07,999 INFO eval 3528 predictions
2024-03-31 05:03:08,118 INFO validate val_unseen split on ScanQA task
2024-03-31 05:05:22,789 INFO
[Eval] val_unseen epoch 22
[Eval] dataset=[CVDN] , lengths: 31.59, nav_error: 13.44, oracle_sr: 48.57
[Eval] ||| sr: 15.46, spl: 12.12, oracle path_success_rate: 81.03, dist_to_end_reduction: 6.13
[Eval] dataset=[SOON] , action_steps: 11.77, steps: 14.27, lengths: 27.56, nav_error: 8.36, oracle_error: 5.06
[Eval] ||| sr: 31.93, oracle_sr: 45.52, spl: 24.14, det_sr: 2.39, det_spl: 1.86
[Eval] dataset=[R2R] , action_steps: 6.07, steps: 6.98, lengths: 13.70, nav_error: 3.84, oracle_error: 2.02
[Eval] ||| sr: 63.73, oracle_sr: 75.17, spl: 54.26
[Eval] dataset=[REVERIE] , action_steps: 6.68, steps: 8.26, lengths: 15.95, nav_error: 5.84, oracle_error: 2.94
[Eval] ||| sr: 39.46, oracle_sr: 50.17, spl: 33.39, rgs: 20.58, rgspl: 17.19
[Eval] dataset=[ScanQA] , bleu-1: 35.33, bleu-2: 22.98, bleu-3: 16.77, bleu-4: 11.02, rouge: 37.19, cider: 71.79, meteor: 14.44, exact_match: 22.88
2024-03-31 05:05:22,795 INFO Current Score: 2.724060579792985
2024-03-31 05:05:22,796 INFO Best Score: 2.814189876757082
2024-03-31 06:10:33,497 INFO train [23] epoch
2024-03-31 06:10:33,502 INFO Loss: 4.05 Instr_pred: 0.93 R2R: 4.31 REVERIE: 5.04 CVDN: 9.01 SOON: 6.68 ScanQA: 1.04 LLaVA: 1.34
2024-03-31 06:10:33,505 INFO validate val_unseen split on CVDN task
2024-03-31 06:12:43,753 INFO eval 912 predictions
2024-03-31 06:12:43,814 INFO validate val_unseen split on SOON task
2024-03-31 06:17:45,533 INFO eval 3392 predictions
2024-03-31 06:17:45,668 INFO validate val_unseen split on R2R task
2024-03-31 06:19:38,182 INFO eval 2352 predictions
2024-03-31 06:19:38,242 INFO validate val_unseen split on REVERIE task
2024-03-31 06:22:42,856 INFO eval 3528 predictions
2024-03-31 06:22:42,975 INFO validate val_unseen split on ScanQA task
2024-03-31 06:24:59,675 INFO
[Eval] val_unseen epoch 23
[Eval] dataset=[CVDN] , lengths: 45.59, nav_error: 13.10, oracle_sr: 55.37
[Eval] ||| sr: 16.01, spl: 10.66, oracle path_success_rate: 86.95, dist_to_end_reduction: 6.54
[Eval] dataset=[SOON] , action_steps: 12.13, steps: 15.22, lengths: 29.87, nav_error: 8.59, oracle_error: 4.87
[Eval] ||| sr: 30.60, oracle_sr: 46.76, spl: 23.19, det_sr: 3.24, det_spl: 2.57
[Eval] dataset=[R2R] , action_steps: 6.92, steps: 8.23, lengths: 16.23, nav_error: 4.14, oracle_error: 1.66
[Eval] ||| sr: 62.76, oracle_sr: 80.36, spl: 51.49
[Eval] dataset=[REVERIE] , action_steps: 7.13, steps: 8.54, lengths: 16.65, nav_error: 5.82, oracle_error: 2.63
[Eval] ||| sr: 40.28, oracle_sr: 56.01, spl: 33.16, rgs: 20.44, rgspl: 16.71
[Eval] dataset=[ScanQA] , bleu-1: 36.76, bleu-2: 23.99, bleu-3: 17.09, bleu-4: 11.81, rouge: 37.88, cider: 74.28, meteor: 14.89, exact_match: 22.86
2024-03-31 06:24:59,681 INFO Current Score: 2.635875656976935
2024-03-31 06:24:59,682 INFO Best Score: 2.814189876757082
2024-03-31 07:29:52,313 INFO train [24] epoch
2024-03-31 07:29:52,318 INFO Loss: 4.18 Instr_pred: 0.93 R2R: 4.08 REVERIE: 4.99 CVDN: 11.42 SOON: 8.55 ScanQA: 1.13 LLaVA: 1.35
2024-03-31 07:29:52,323 INFO validate val_unseen split on CVDN task
2024-03-31 07:31:44,436 INFO eval 912 predictions
2024-03-31 07:31:44,921 INFO validate val_unseen split on SOON task
2024-03-31 07:36:23,048 INFO eval 3392 predictions
2024-03-31 07:36:23,179 INFO validate val_unseen split on R2R task
2024-03-31 07:38:03,963 INFO eval 2352 predictions
2024-03-31 07:38:04,024 INFO validate val_unseen split on REVERIE task
2024-03-31 07:40:59,372 INFO eval 3528 predictions
2024-03-31 07:40:59,488 INFO validate val_unseen split on ScanQA task
2024-03-31 07:43:18,233 INFO
[Eval] val_unseen epoch 24
[Eval] dataset=[CVDN] , lengths: 34.29, nav_error: 13.09, oracle_sr: 51.64
[Eval] ||| sr: 18.31, spl: 13.69, oracle path_success_rate: 82.68, dist_to_end_reduction: 6.58
[Eval] dataset=[SOON] , action_steps: 11.10, steps: 13.33, lengths: 25.51, nav_error: 8.46, oracle_error: 5.13
[Eval] ||| sr: 30.25, oracle_sr: 44.52, spl: 23.72, det_sr: 2.83, det_spl: 2.30
[Eval] dataset=[R2R] , action_steps: 6.24, steps: 7.12, lengths: 14.00, nav_error: 4.11, oracle_error: 1.98
[Eval] ||| sr: 63.18, oracle_sr: 75.98, spl: 53.59
[Eval] dataset=[REVERIE] , action_steps: 6.59, steps: 7.82, lengths: 15.19, nav_error: 6.26, oracle_error: 3.20
[Eval] ||| sr: 37.13, oracle_sr: 48.33, spl: 30.66, rgs: 19.70, rgspl: 16.03
[Eval] dataset=[ScanQA] , bleu-1: 36.81, bleu-2: 23.98, bleu-3: 17.38, bleu-4: 11.67, rouge: 38.00, cider: 73.55, meteor: 14.92, exact_match: 22.95
2024-03-31 07:43:18,240 INFO Current Score: 2.6223856958669014
2024-03-31 07:43:18,240 INFO Best Score: 2.814189876757082
2024-03-31 08:48:13,912 INFO train [25] epoch
2024-03-31 08:48:13,916 INFO Loss: 3.66 Instr_pred: 0.87 R2R: 3.60 REVERIE: 4.53 CVDN: 8.02 SOON: 6.66 ScanQA: 1.12 LLaVA: 1.31
2024-03-31 08:48:13,919 INFO validate val_unseen split on CVDN task
2024-03-31 08:50:00,514 INFO eval 912 predictions
2024-03-31 08:50:00,572 INFO validate val_unseen split on SOON task
2024-03-31 08:54:33,941 INFO eval 3392 predictions
2024-03-31 08:54:34,076 INFO validate val_unseen split on R2R task
2024-03-31 08:56:08,237 INFO eval 2352 predictions
2024-03-31 08:56:08,306 INFO validate val_unseen split on REVERIE task
2024-03-31 08:58:59,513 INFO eval 3528 predictions
2024-03-31 08:58:59,622 INFO validate val_unseen split on ScanQA task
2024-03-31 09:01:16,565 INFO
[Eval] val_unseen epoch 25
[Eval] dataset=[CVDN] , lengths: 31.15, nav_error: 14.23, oracle_sr: 45.72
[Eval] ||| sr: 14.58, spl: 11.74, oracle path_success_rate: 81.58, dist_to_end_reduction: 5.25
[Eval] dataset=[SOON] , action_steps: 10.80, steps: 12.58, lengths: 24.02, nav_error: 8.41, oracle_error: 5.39
[Eval] ||| sr: 32.25, oracle_sr: 42.81, spl: 25.95, det_sr: 2.30, det_spl: 1.85
[Eval] dataset=[R2R] , action_steps: 5.70, steps: 6.28, lengths: 12.29, nav_error: 4.00, oracle_error: 2.28
[Eval] ||| sr: 63.73, oracle_sr: 71.81, spl: 55.93
[Eval] dataset=[REVERIE] , action_steps: 6.10, steps: 6.97, lengths: 13.48, nav_error: 5.75, oracle_error: 3.17
[Eval] ||| sr: 39.40, oracle_sr: 44.64, spl: 34.10, rgs: 20.72, rgspl: 17.75
[Eval] dataset=[ScanQA] , bleu-1: 37.14, bleu-2: 24.12, bleu-3: 17.58, bleu-4: 12.36, rouge: 38.12, cider: 73.84, meteor: 15.01, exact_match: 22.71
2024-03-31 09:01:16,571 INFO Current Score: 2.8392911746916174
2024-03-31 09:01:16,571 INFO Best Score: 2.8392911746916174
2024-03-31 09:01:17,576 INFO Remove Checkpoint at Epoch 18...
2024-03-31 10:05:51,755 INFO train [26] epoch
2024-03-31 10:05:51,759 INFO Loss: 3.41 Instr_pred: 0.87 R2R: 3.65 REVERIE: 4.19 CVDN: 6.23 SOON: 5.58 ScanQA: 1.08 LLaVA: 1.31
2024-03-31 10:05:51,761 INFO validate val_unseen split on CVDN task
2024-03-31 10:07:51,783 INFO eval 912 predictions
2024-03-31 10:07:51,838 INFO validate val_unseen split on SOON task
2024-03-31 10:13:10,272 INFO eval 3392 predictions
2024-03-31 10:13:10,413 INFO validate val_unseen split on R2R task
2024-03-31 10:15:03,117 INFO eval 2352 predictions
2024-03-31 10:15:03,215 INFO validate val_unseen split on REVERIE task
2024-03-31 10:18:13,141 INFO eval 3528 predictions
2024-03-31 10:18:13,264 INFO validate val_unseen split on ScanQA task
2024-03-31 10:20:31,426 INFO
[Eval] val_unseen epoch 26
[Eval] dataset=[CVDN] , lengths: 36.38, nav_error: 14.02, oracle_sr: 49.56
[Eval] ||| sr: 15.79, spl: 12.03, oracle path_success_rate: 82.24, dist_to_end_reduction: 5.54
[Eval] dataset=[SOON] , action_steps: 12.85, steps: 16.15, lengths: 31.59, nav_error: 8.08, oracle_error: 4.29
[Eval] ||| sr: 33.87, oracle_sr: 53.18, spl: 25.08, det_sr: 2.36, det_spl: 1.87
[Eval] dataset=[R2R] , action_steps: 6.93, steps: 8.42, lengths: 16.71, nav_error: 4.11, oracle_error: 1.75
[Eval] ||| sr: 63.14, oracle_sr: 79.80, spl: 51.97
[Eval] dataset=[REVERIE] , action_steps: 7.29, steps: 9.20, lengths: 17.93, nav_error: 5.87, oracle_error: 2.66
[Eval] ||| sr: 38.72, oracle_sr: 53.20, spl: 32.03, rgs: 20.44, rgspl: 16.68
[Eval] dataset=[ScanQA] , bleu-1: 36.29, bleu-2: 24.08, bleu-3: 17.74, bleu-4: 12.29, rouge: 37.59, cider: 73.78, meteor: 14.93, exact_match: 22.65
2024-03-31 10:20:31,433 INFO Current Score: 2.684101543413253
2024-03-31 10:20:31,433 INFO Best Score: 2.8392911746916174
2024-03-31 11:24:59,666 INFO train [27] epoch
2024-03-31 11:24:59,671 INFO Loss: 3.47 Instr_pred: 0.82 R2R: 3.42 REVERIE: 4.90 CVDN: 7.53 SOON: 6.01 ScanQA: 0.98 LLaVA: 1.33
2024-03-31 11:24:59,673 INFO validate val_unseen split on CVDN task
2024-03-31 11:27:04,928 INFO eval 912 predictions
2024-03-31 11:27:04,986 INFO validate val_unseen split on SOON task
2024-03-31 11:32:06,796 INFO eval 3392 predictions
2024-03-31 11:32:06,931 INFO validate val_unseen split on R2R task
2024-03-31 11:33:46,065 INFO eval 2352 predictions
2024-03-31 11:33:46,152 INFO validate val_unseen split on REVERIE task
2024-03-31 11:36:43,963 INFO eval 3528 predictions
2024-03-31 11:36:44,079 INFO validate val_unseen split on ScanQA task
2024-03-31 11:39:02,277 INFO
[Eval] val_unseen epoch 27
[Eval] dataset=[CVDN] , lengths: 41.00, nav_error: 13.36, oracle_sr: 53.62
[Eval] ||| sr: 15.79, spl: 11.62, oracle path_success_rate: 83.99, dist_to_end_reduction: 6.32
[Eval] dataset=[SOON] , action_steps: 11.95, steps: 15.57, lengths: 30.25, nav_error: 8.44, oracle_error: 5.06
[Eval] ||| sr: 31.52, oracle_sr: 44.69, spl: 23.50, det_sr: 2.98, det_spl: 2.26
[Eval] dataset=[R2R] , action_steps: 6.13, steps: 7.35, lengths: 14.51, nav_error: 3.83, oracle_error: 1.99
[Eval] ||| sr: 65.05, oracle_sr: 76.70, spl: 55.83
[Eval] dataset=[REVERIE] , action_steps: 6.76, steps: 8.65, lengths: 16.82, nav_error: 5.86, oracle_error: 3.03
[Eval] ||| sr: 38.89, oracle_sr: 47.76, spl: 32.05, rgs: 21.06, rgspl: 17.24
[Eval] dataset=[ScanQA] , bleu-1: 35.08, bleu-2: 22.94, bleu-3: 16.88, bleu-4: 11.62, rouge: 36.26, cider: 70.74, meteor: 14.36, exact_match: 21.41
2024-03-31 11:39:02,287 INFO Current Score: 2.68941685344448
2024-03-31 11:39:02,287 INFO Best Score: 2.8392911746916174
2024-03-31 12:43:09,270 INFO train [28] epoch
2024-03-31 12:43:09,275 INFO Loss: 3.41 Instr_pred: 0.81 R2R: 3.36 REVERIE: 4.41 CVDN: 8.58 SOON: 6.36 ScanQA: 0.94 LLaVA: 1.33
2024-03-31 12:43:09,277 INFO validate val_unseen split on CVDN task
2024-03-31 12:45:14,210 INFO eval 912 predictions
2024-03-31 12:45:14,276 INFO validate val_unseen split on SOON task
2024-03-31 12:50:22,940 INFO eval 3392 predictions
2024-03-31 12:50:23,173 INFO validate val_unseen split on R2R task
2024-03-31 12:52:15,264 INFO eval 2352 predictions
2024-03-31 12:52:15,325 INFO validate val_unseen split on REVERIE task
2024-03-31 12:55:18,495 INFO eval 3528 predictions
2024-03-31 12:55:18,620 INFO validate val_unseen split on ScanQA task
2024-03-31 12:57:37,565 INFO
[Eval] val_unseen epoch 28
[Eval] dataset=[CVDN] , lengths: 41.44, nav_error: 13.17, oracle_sr: 53.29
[Eval] ||| sr: 14.36, spl: 9.84, oracle path_success_rate: 83.77, dist_to_end_reduction: 6.47
[Eval] dataset=[SOON] , action_steps: 12.41, steps: 15.72, lengths: 30.50, nav_error: 8.37, oracle_error: 4.85
[Eval] ||| sr: 30.98, oracle_sr: 46.31, spl: 23.37, det_sr: 3.27, det_spl: 2.52
[Eval] dataset=[R2R] , action_steps: 6.87, steps: 8.20, lengths: 16.20, nav_error: 4.28, oracle_error: 1.80
[Eval] ||| sr: 61.44, oracle_sr: 78.87, spl: 50.12
[Eval] dataset=[REVERIE] , action_steps: 6.95, steps: 8.44, lengths: 16.49, nav_error: 5.85, oracle_error: 2.76
[Eval] ||| sr: 39.82, oracle_sr: 53.06, spl: 33.45, rgs: 20.61, rgspl: 17.22
[Eval] dataset=[ScanQA] , bleu-1: 36.94, bleu-2: 24.05, bleu-3: 17.55, bleu-4: 11.69, rouge: 37.11, cider: 72.64, meteor: 14.74, exact_match: 21.30
2024-03-31 12:57:37,571 INFO Current Score: 2.6276661945678432
2024-03-31 12:57:37,571 INFO Best Score: 2.8392911746916174
2024-03-31 14:01:15,761 INFO train [29] epoch
2024-03-31 14:01:15,768 INFO Loss: 3.29 Instr_pred: 0.77 R2R: 3.11 REVERIE: 4.34 CVDN: 8.85 SOON: 6.21 ScanQA: 1.01 LLaVA: 1.32
2024-03-31 14:01:15,778 INFO validate val_unseen split on CVDN task
2024-03-31 14:03:38,026 INFO eval 912 predictions
2024-03-31 14:03:38,092 INFO validate val_unseen split on SOON task
2024-03-31 14:08:57,286 INFO eval 3392 predictions
2024-03-31 14:08:57,424 INFO validate val_unseen split on R2R task
2024-03-31 14:10:43,842 INFO eval 2352 predictions
2024-03-31 14:10:43,902 INFO validate val_unseen split on REVERIE task
2024-03-31 14:13:44,676 INFO eval 3528 predictions
2024-03-31 14:13:44,796 INFO validate val_unseen split on ScanQA task
2024-03-31 14:16:02,551 INFO
[Eval] val_unseen epoch 29
[Eval] dataset=[CVDN] , lengths: 49.74, nav_error: 13.79, oracle_sr: 61.62
[Eval] ||| sr: 15.24, spl: 10.15, oracle path_success_rate: 88.49, dist_to_end_reduction: 5.81
[Eval] dataset=[SOON] , action_steps: 13.00, steps: 16.42, lengths: 32.23, nav_error: 8.42, oracle_error: 4.53
[Eval] ||| sr: 32.10, oracle_sr: 50.83, spl: 23.78, det_sr: 3.74, det_spl: 2.80
[Eval] dataset=[R2R] , action_steps: 6.62, steps: 7.74, lengths: 15.15, nav_error: 4.20, oracle_error: 1.86
[Eval] ||| sr: 61.61, oracle_sr: 78.40, spl: 51.14
[Eval] dataset=[REVERIE] , action_steps: 6.92, steps: 8.15, lengths: 15.79, nav_error: 5.66, oracle_error: 2.75
[Eval] ||| sr: 38.97, oracle_sr: 52.21, spl: 31.73, rgs: 21.57, rgspl: 17.32
[Eval] dataset=[ScanQA] , bleu-1: 35.52, bleu-2: 22.83, bleu-3: 16.46, bleu-4: 10.90, rouge: 37.10, cider: 71.67, meteor: 14.35, exact_match: 22.46
2024-03-31 14:16:02,558 INFO Current Score: 2.6130816692440284
2024-03-31 14:16:02,558 INFO Best Score: 2.8392911746916174
2024-03-31 14:16:02,558 INFO Best Results:
2024-03-31 14:16:02,558 INFO {'CVDN': {'lengths': 31.14661126346036, 'nav_error': 14.229773118503639, 'oracle_sr': 45.723684210526315, 'sr': 14.583333333333334, 'spl': 11.73502233321622, 'oracle path_success_rate': 81.57894736842105, 'dist_to_end_reduction': 5.248582608017579}, 'SOON': {'action_steps': 10.795400943396226, 'steps': 12.58372641509434, 'lengths': 24.020958730586806, 'nav_error': 8.410835817975915, 'oracle_error': 5.392800190104361, 'sr': 32.25235849056604, 'oracle_sr': 42.806603773584904, 'spl': 25.949954484650938, 'det_sr': 2.2995283018867925, 'det_spl': 1.850233824347864}, 'R2R': {'action_steps': 5.701955782312925, 'steps': 6.279761904761905, 'lengths': 12.28822895225073, 'nav_error': 3.999783149581735, 'oracle_error': 2.280396670235626, 'sr': 63.73299319727891, 'oracle_sr': 71.81122448979592, 'spl': 55.92958549224573}, 'REVERIE': {'action_steps': 6.099489795918367, 'steps': 6.972222222222222, 'lengths': 13.48172458967747, 'nav_error': 5.750300107980155, 'oracle_error': 3.1716863780798112, 'sr': 39.399092970521544, 'oracle_sr': 44.642857142857146, 'spl': 34.09649192842235, 'rgs': 20.719954648526077, 'rgspl': 17.749969346098283}, 'ScanQA': {'bleu-1': 37.13686703406328, 'bleu-2': 24.115303196274017, 'bleu-3': 17.57927591915301, 'bleu-4': 12.35650998930172, 'rouge': 38.12057941489069, 'cider': 73.83746762906446, 'meteor': 15.014816916989663, 'exact_match': 22.713675213675213}}
The losses of the first few epochs are similar to your log, but the validation results are much better. It is quite weird. Do you think the simulator environment may be a cause?
I suspect it may be because we actually used vicuna-7b-v1.1 but mistakenly thought we were using vicuna-7b-delta-v0. I will confirm it soon.
For Vicuna-7b-delta-v0, the 'delta model' cannot be used directly. Users have to apply it on top of the original LLaMA weights to get the actual Vicuna weights. See instructions. Have you performed this transformation before using the model? Otherwise, the obtained Vicuna parameters may be abnormal.
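To make the delta step concrete: conceptually, each target (Vicuna) parameter is the matching base (LLaMA) parameter plus the corresponding delta parameter. The sketch below shows only that arithmetic, assuming parameters are stored as name-to-values maps; `apply_delta` is a hypothetical helper, and plain Python lists stand in for tensors. Real conversion tools also have to handle tensors, dtypes and sharded checkpoint files, which this ignores.

```python
# Conceptual sketch of delta-weight merging: target = base + delta, per parameter.
# apply_delta is a hypothetical helper; lists of floats stand in for tensors.
from typing import Dict, List

StateDict = Dict[str, List[float]]


def apply_delta(base: StateDict, delta: StateDict) -> StateDict:
    """Return a new state dict with base + delta for every parameter in delta."""
    missing = set(delta) - set(base)
    if missing:
        raise KeyError(f"parameters missing from base model: {sorted(missing)}")
    merged: StateDict = {}
    for name, delta_values in delta.items():
        base_values = base[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise addition of the base weights and the delta weights.
        merged[name] = [b + d for b, d in zip(base_values, delta_values)]
    return merged


if __name__ == "__main__":
    llama = {"layer.weight": [0.5, -1.0], "layer.bias": [0.25]}
    vicuna_delta = {"layer.weight": [0.25, 0.5], "layer.bias": [-0.25]}
    print(apply_delta(llama, vicuna_delta))
    # {'layer.weight': [0.75, -0.5], 'layer.bias': [0.0]}
```

Using the delta weights without this addition means running the model on the differences alone, which is why skipping the step can leave the parameters abnormal.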
So the model is Vicuna-v0?
The Vicuna model on my side should be ok, as I have already used it for many other models and tasks.
It will take me some time to verify. Apologies for the inconvenience.
That's fine. Thank you for getting back to me.
Do you have other ideas about the performance?
It seems that we actually used vicuna-7b-v1.1. You can replace the model with vicuna-7b-v1.1; please let me know if you still have problems. Sorry for the inconvenience.
It's actually good news. Let me have a try.
Hi, have you successfully reproduced the results?
The model is in training.
Also, I have encountered another issue: if I test the model on ScanQA with the vicuna-v0 tokenizer, everything runs but the numbers don't match, whereas with the vicuna-v1.1 tokenizer the code reports an error. I am sure about this because I have tried several times with different machines and environments (including the one you provided in requirement.txt). Testing on R2R and REVERIE works fine with both tokenizers.
The error is reported on https://github.com/zd11024/NaviLLM/blob/09f5cbeac9bf96d1a1053785d6a32df888ffeaeb/models/nav_model.py#L395
Error trace:
Traceback (most recent call last):
File "/xxx/NaviLLM/train.py", line 297, in
The generated_ids at the time of the error are: [[1, 29871, 29906, 2], [1, 2175, 2, -1]]
Debugging...
The vicuna v1.1 I use is https://huggingface.co/lmsys/vicuna-7b-v1.1. Though I don't think it is a problem, I need to confirm this.
The point is that the loaded tokenizer fails to decode -1, which is not in the vocabulary. Can you check whether this is also an issue on your side? By the way, please help check the config.json of the Vicuna model you use.
Now I have replaced token id -1 with your pad token id.
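A minimal sketch of the workaround being discussed, assuming the generated ids come back as nested Python lists: any id outside `[0, vocab_size)`, such as the trailing -1, is mapped to a valid pad id before decoding. The function name is hypothetical and this is not the repository's code; 32000 is the `vocab_size` from the config shared in this thread.

```python
# Hypothetical helper: replace out-of-vocabulary ids (e.g. a -1 pad) with a valid pad id
# so the tokenizer can decode the batch.
from typing import List


def sanitize_generated_ids(batch: List[List[int]], pad_token_id: int, vocab_size: int) -> List[List[int]]:
    if not 0 <= pad_token_id < vocab_size:
        raise ValueError("pad_token_id must be a valid vocabulary index")
    # Keep in-vocabulary ids untouched; map everything else to the pad id.
    return [[tok if 0 <= tok < vocab_size else pad_token_id for tok in seq] for seq in batch]


generated_ids = [[1, 29871, 29906, 2], [1, 2175, 2, -1]]
print(sanitize_generated_ids(generated_ids, pad_token_id=0, vocab_size=32000))
# [[1, 29871, 29906, 2], [1, 2175, 2, 0]]
```

The cleaned ids can then be passed to the tokenizer for decoding. This only avoids the decode failure; it does not change where the -1 comes from.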
Traceback (most recent call last):
File "train.py", line 297, in
The model is vicuna-7b-v1.1 (I've compared my parameters and the official parameters). However, I found that the tokenizer is slightly different from the current version. Here is my config.
{
"_name_or_path": "./pyllama_data/output/7B",
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 2048,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"pad_token_id": 0,
"rms_norm_eps": 1e-06,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.29.0.dev0",
"use_cache": true,
"vocab_size": 32000
}
The pad_token_id of vicuna-7b-v1.1 is -1, while that of vicuna-7b-delta-v1.1 is 0 (their parameters are exactly the same). As a result, -1 is produced as padding during generation, and the tokenizer cannot decode it, causing this error. To solve this problem, you can either modify the pad_token_id in the config or pass it explicitly when calling the generate function.
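For the first option, here is a sketch using only the standard library, assuming a Hugging Face-style `config.json` like the one above: it sets `pad_token_id` to 0 (the value used by vicuna-7b-delta-v1.1) when the stored value is missing or negative. The helper name `fix_pad_token_id` is hypothetical.

```python
# Sketch: make pad_token_id in a Hugging Face-style config.json a valid token id.
# fix_pad_token_id is a hypothetical helper, not part of transformers or NaviLLM.
import json
import tempfile
from pathlib import Path


def fix_pad_token_id(config_path: str, new_pad_id: int = 0) -> int:
    """Set pad_token_id to new_pad_id if missing or negative; return the value in effect."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    current = config.get("pad_token_id")
    if current is None or current < 0:
        config["pad_token_id"] = new_pad_id
        # Rewrite the file only when the value actually changed.
        path.write_text(json.dumps(config, indent=2) + "\n")
    return config["pad_token_id"]


if __name__ == "__main__":
    # Demo on a throwaway file instead of a real model directory.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "config.json"
        cfg.write_text(json.dumps({"pad_token_id": -1, "vocab_size": 32000}))
        print(fix_pad_token_id(str(cfg)))  # 0
        print(json.loads(cfg.read_text())["pad_token_id"])  # 0
```

For the second option, the pad id can instead be supplied per call, e.g. `pad_token_id=0` as a keyword argument to `generate`, which leaves the files untouched. Depending on the transformers version, generation defaults may also be read from a `generation_config.json` next to the model, which may need the same check.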
Hello, have you addressed the problem and reproduced the results?
I prepared vicuna-v1.1 from the delta model using FastChat. The overall val-unseen score is 2.93, lower than in the released training log: by 2 points on REVERIE and by ~1 point on ScanQA. But things are much better.
It's good to hear that. Thanks for your feedback and discussion!
Hi, thanks for open-sourcing this work. I have problems reproducing the results in the paper and am looking forward to your help.
I have run the multitask w/o pretraining experiment twice, but I fail to reproduce the results. My results are
For reference, I have also tested your released checkpoint on R2R and REVERIE, the results are