mertyg / vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
MIT License
222 stars 14 forks

train negCLIP result problem #27

Closed haoshuai714 closed 1 year ago

haoshuai714 commented 1 year ago

Hello! Thank you for your great work! I have a question about training NegCLIP: when I train it I get poor results, such as:

Eval Epoch: 50
image_to_text_mean_rank: 339.7555
image_to_text_median_rank: 79.0000
image_to_text_R@1: 0.0289
image_to_text_R@5: 0.1091
image_to_text_R@10: 0.1680
text_to_image_mean_rank: 319.8398
text_to_image_median_rank: 74.0000
text_to_image_R@1: 0.0313
text_to_image_R@5: 0.1122
text_to_image_R@10: 0.1710
val_loss: 5.2243
epoch: 50.0000
num_samples: 9678.0000

Could you give me some guidance? How can I reproduce the NegCLIP results reported in the paper?

vinid commented 1 year ago

Hello! Could you give us some additional details?

Which command are you running, and with which hyper-parameters?

vinid commented 1 year ago

Closing this for now!

sigrid414 commented 1 year ago

Hello, thank you for your excellent work. I used the following parameters and could not reproduce the results in the paper. Could you please tell me whether there is a problem with my parameter setting?

Params (logged 2023-06-29,15:44:20):
batch_size: 32
beta1: 0.9
beta2: 0.98
checkpoint_path: ./logs/2023_06_29-15_44_18-model_ViT-B-32-lr_1e-06-b_32-j_6-p_amp\checkpoints
copy_codebase: False
csv_caption_key: title
csv_hard_captions_key: neg_caption
csv_img_key: filepath
csv_separator:
dataset_resampled: False
dataset_type: auto
ddp_static_graph: False
debug: False
device: cuda:0
dist_backend: nccl
dist_url: env://
distributed: False
epochs: 5
eps: 1e-06
force_quick_gelu: False
gather_with_grad: False
grad_checkpointing: False
horovod: False
imagenet_v2: None
imagenet_val: None
local_loss: False
local_rank: 0
lock_image: False
lock_image_freeze_bn_stats: False
lock_image_unlocked_groups: 0
log_level: 20
log_local: False
log_path: ./logs/2023_06_29-15_44_18-model_ViT-B-32-lr_1e-06-b_32-j_6-p_amp\out.log
logs: ./logs/
lr: 1e-06
model: ViT-B-32
name: 2023_06_29-15_44_18-model_ViT-B-32-lr_1e-06-b_32-j_6-p_amp
no_set_device_rank: False
norm_gradient_clip: None
precision: amp
pretrained: openai
pretrained_image: False
rank: 0
report_to:
resume: None
save_frequency: 1
save_most_recent: False
seed: 0
skip_scheduler: False
tensorboard: False
tensorboard_path:
torchscript: False
trace: False
train_data: ../data/train_neg_clip.tsv
train_num_samples: None
use_bn_sync: False
val_data: ../data/valid_neg_clip.tsv
val_frequency: 1
val_num_samples: None
wandb: False
wandb_notes:
warmup: 50
wd: 0.2
workers: 6
world_size: 1
zeroshot_frequency: 2

Result:
Eval Epoch: 5
image_to_text_mean_rank: 23.1456
image_to_text_median_rank: 6.0000
image_to_text_R@1: 0.1986
image_to_text_R@5: 0.4996
image_to_text_R@10: 0.6286
text_to_image_mean_rank: 25.8653
text_to_image_median_rank: 5.0000
text_to_image_R@1: 0.2093
text_to_image_R@5: 0.5290
text_to_image_R@10: 0.6593
val_loss: 0.8160
epoch: 5.0000
num_samples: 9678.0000
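For reference, the numbers above (R@K, mean/median rank) are standard retrieval statistics: for each image, rank all candidate captions by similarity and record where the true caption lands. A minimal sketch of how they can be computed from an image-text similarity matrix (the function name `retrieval_metrics` and the toy matrix are illustrative, not from the repo's code):

```python
import numpy as np

def retrieval_metrics(sim):
    """Compute recall@K and rank statistics from a similarity matrix.

    sim[i, j] is the similarity between image i and text j; the ground-truth
    pair is assumed to lie on the diagonal (text i matches image i).
    """
    n = sim.shape[0]
    # Image->text: rank of the true caption among all captions, per image.
    # Rank 1 means the matching caption scored highest.
    order = np.argsort(-sim, axis=1)  # captions sorted by descending score
    ranks = np.array([np.where(order[i] == i)[0][0] + 1 for i in range(n)])
    return {
        "mean_rank": ranks.mean(),
        "median_rank": float(np.median(ranks)),
        "R@1": (ranks <= 1).mean(),
        "R@5": (ranks <= 5).mean(),
        "R@10": (ranks <= 10).mean(),
    }

# Tiny example: 3 images x 3 captions, diagonal is the true pairing.
sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.8, 0.1],
                [0.7, 0.2, 0.3]])  # image 2's true caption is only rank 2
m = retrieval_metrics(sim)
```

Text-to-image metrics are the same computation on `sim.T`.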

vinid commented 1 year ago

Could you share the results for each epoch? I'm curious about what's happening during training.

Try also evaluating each checkpoint you get on both ARO and the retrieval datasets.

Consider that a batch size of 32 might not be enough to train a good model.

You can also follow the conversation in this GitHub issue https://github.com/mertyg/vision-language-models-are-bows/issues/4
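The batch-size point matters because of how the contrastive objective works: each image is contrasted against the other captions in the batch, and NegCLIP additionally appends a hard-negative caption per example, so a small batch means far fewer negatives per update. A minimal numpy sketch of this kind of loss (function name and shapes are illustrative assumptions, not the repo's actual implementation):

```python
import numpy as np

def negclip_style_loss(img, txt, neg_txt, temperature=0.07):
    """Illustrative image->text contrastive loss with appended hard negatives.

    img:     (B, D) image embeddings
    txt:     (B, D) matching caption embeddings
    neg_txt: (B, D) hard-negative caption embeddings (e.g. word-order swaps)
    Each image is contrasted against 2B - 1 negatives: the other B - 1
    captions in the batch plus all B hard negatives.
    """
    def norm(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    img = norm(img)
    all_txt = norm(np.concatenate([txt, neg_txt]))       # (2B, D)
    logits = img @ all_txt.T / temperature               # (B, 2B)
    # Cross-entropy with target j = i (the true caption for image i).
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(img))
    return -log_probs[idx, idx].mean()
```

At batch size 32 each image is separated from only 63 candidate captions per step; larger batches make the contrastive task harder and typically yield stronger representations, which may partly explain the gap.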


sigrid414 commented 1 year ago

Thanks for your reply, the following is my training process log.

Start epoch 0
2023-06-29,15:44:40 | INFO | Train Epoch: 0 [ 64/109468 (0%)] Loss: 1.0672 (1.067) Data (t): 14.173 Batch (t): 18.240, 1.75441/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,15:45:36 | INFO | Train Epoch: 0 [ 6464/109468 (3%)] Loss: 0.97884 (1.023) Data (t): 0.158 Batch (t): 0.559, 36.0876/s LR: 0.000001 Logit Scale: 99.995 - V4
2023-06-29,15:46:28 | INFO | Train Epoch: 0 [ 12864/109468 (6%)] Loss: 0.87522 (0.9738) Data (t): 0.113 Batch (t): 0.515, 78.8352/s LR: 0.000001 Logit Scale: 99.990 - V4
2023-06-29,15:47:13 | INFO | Train Epoch: 0 [ 19264/109468 (9%)] Loss: 0.73911 (0.9151) Data (t): 0.052 Batch (t): 0.454, 78.3267/s LR: 0.000001 Logit Scale: 99.987 - V4
2023-06-29,15:48:00 | INFO | Train Epoch: 0 [ 25664/109468 (12%)] Loss: 0.60426 (0.8529) Data (t): 0.065 Batch (t): 0.472, 78.0624/s LR: 0.000001 Logit Scale: 99.983 - V4
2023-06-29,15:48:47 | INFO | Train Epoch: 0 [ 32064/109468 (15%)] Loss: 0.62116 (0.8143) Data (t): 0.058 Batch (t): 0.463, 53.4243/s LR: 0.000001 Logit Scale: 99.981 - V4
2023-06-29,15:49:36 | INFO | Train Epoch: 0 [ 38464/109468 (18%)] Loss: 0.88208 (0.8240) Data (t): 0.084 Batch (t): 0.490, 78.5411/s LR: 0.000001 Logit Scale: 99.981 - V4
2023-06-29,15:50:31 | INFO | Train Epoch: 0 [ 44864/109468 (20%)] Loss: 0.70934 (0.8097) Data (t): 0.149 Batch (t): 0.557, 78.6467/s LR: 0.000001 Logit Scale: 99.980 - V4
2023-06-29,15:51:21 | INFO | Train Epoch: 0 [ 51264/109468 (23%)] Loss: 0.73646 (0.8015) Data (t): 0.084 Batch (t): 0.492, 78.6356/s LR: 0.000001 Logit Scale: 99.979 - V4
2023-06-29,15:52:18 | INFO | Train Epoch: 0 [ 57664/109468 (26%)] Loss: 0.61434 (0.7828) Data (t): 0.171 Batch (t): 0.574, 78.8352/s LR: 0.000001 Logit Scale: 99.976 - V4
2023-06-29,15:53:09 | INFO | Train Epoch: 0 [ 64064/109468 (29%)] Loss: 0.61450 (0.7675) Data (t): 0.105 Batch (t): 0.507, 78.8117/s LR: 0.000001 Logit Scale: 99.974 - V4
2023-06-29,15:54:01 | INFO | Train Epoch: 0 [ 70464/109468 (32%)] Loss: 0.53688 (0.7483) Data (t): 0.116 Batch (t): 0.520, 78.5453/s LR: 0.000001 Logit Scale: 99.973 - V4
2023-06-29,15:54:48 | INFO | Train Epoch: 0 [ 76864/109468 (35%)] Loss: 0.77193 (0.7501) Data (t): 0.074 Batch (t): 0.478, 78.1228/s LR: 0.000001 Logit Scale: 99.973 - V4
2023-06-29,15:55:46 | INFO | Train Epoch: 0 [ 83264/109468 (38%)] Loss: 0.69313 (0.7460) Data (t): 0.166 Batch (t): 0.572, 78.2961/s LR: 0.000001 Logit Scale: 99.972 - V4
2023-06-29,15:56:35 | INFO | Train Epoch: 0 [ 89664/109468 (41%)] Loss: 0.51552 (0.7307) Data (t): 0.087 Batch (t): 0.492, 78.4813/s LR: 0.000001 Logit Scale: 99.973 - V4
2023-06-29,15:57:21 | INFO | Train Epoch: 0 [ 96064/109468 (44%)] Loss: 0.61177 (0.7232) Data (t): 0.058 Batch (t): 0.463, 78.5572/s LR: 0.000001 Logit Scale: 99.971 - V4
2023-06-29,15:58:07 | INFO | Train Epoch: 0 [102464/109468 (47%)] Loss: 0.50075 (0.7101) Data (t): 0.055 Batch (t): 0.461, 78.4564/s LR: 0.000001 Logit Scale: 99.970 - V4
2023-06-29,15:58:55 | INFO | Train Epoch: 0 [108864/109468 (50%)] Loss: 0.64704 (0.7066) Data (t): 0.069 Batch (t): 0.474, 78.4478/s LR: 0.000001 Logit Scale: 99.970 - V4
2023-06-29,15:59:48 | INFO | Train Epoch: 0 [115264/109468 (53%)] Loss: 0.73082 (0.7079) Data (t): 0.131 Batch (t): 0.535, 78.4787/s LR: 0.000001 Logit Scale: 99.969 - V4
2023-06-29,16:00:46 | INFO | Train Epoch: 0 [121664/109468 (56%)] Loss: 0.53299 (0.6992) Data (t): 0.168 Batch (t): 0.574, 79.0581/s LR: 0.000001 Logit Scale: 99.967 - V4
2023-06-29,16:01:39 | INFO | Train Epoch: 0 [128064/109468 (59%)] Loss: 0.59743 (0.6943) Data (t): 0.125 Batch (t): 0.533, 78.6426/s LR: 0.000001 Logit Scale: 99.967 - V4
2023-06-29,16:02:32 | INFO | Train Epoch: 0 [134464/109468 (61%)] Loss: 0.49533 (0.6853) Data (t): 0.126 Batch (t): 0.534, 78.6221/s LR: 0.000001 Logit Scale: 99.967 - V4
2023-06-29,16:03:24 | INFO | Train Epoch: 0 [140864/109468 (64%)] Loss: 0.65617 (0.6840) Data (t): 0.106 Batch (t): 0.514, 78.8282/s LR: 0.000001 Logit Scale: 99.967 - V4
2023-06-29,16:04:16 | INFO | Train Epoch: 0 [147264/109468 (67%)] Loss: 0.48263 (0.6756) Data (t): 0.116 Batch (t): 0.525, 78.4551/s LR: 0.000001 Logit Scale: 99.967 - V4
2023-06-29,16:05:07 | INFO | Train Epoch: 0 [153664/109468 (70%)] Loss: 0.70630 (0.6768) Data (t): 0.099 Batch (t): 0.509, 78.6129/s LR: 0.000001 Logit Scale: 99.966 - V4
2023-06-29,16:06:21 | INFO | Train Epoch: 0 [160064/109468 (73%)] Loss: 0.54616 (0.6718) Data (t): 0.324 Batch (t): 0.742, 78.5470/s LR: 0.000001 Logit Scale: 99.967 - V4
2023-06-29,16:07:26 | INFO | Train Epoch: 0 [166464/109468 (76%)] Loss: 0.63927 (0.6706) Data (t): 0.238 Batch (t): 0.650, 79.0320/s LR: 0.000001 Logit Scale: 99.967 - V4
2023-06-29,16:08:18 | INFO | Train Epoch: 0 [172864/109468 (79%)] Loss: 0.57313 (0.6671) Data (t): 0.109 Batch (t): 0.517, 38.7649/s LR: 0.000001 Logit Scale: 99.968 - V4
2023-06-29,16:09:20 | INFO | Train Epoch: 0 [179264/109468 (82%)] Loss: 0.63155 (0.6659) Data (t): 0.210 Batch (t): 0.626, 78.8434/s LR: 0.000001 Logit Scale: 99.966 - V4
2023-06-29,16:10:11 | INFO | Train Epoch: 0 [185664/109468 (85%)] Loss: 0.49375 (0.6602) Data (t): 0.105 Batch (t): 0.507, 78.6412/s LR: 0.000001 Logit Scale: 99.967 - V4
2023-06-29,16:11:16 | INFO | Train Epoch: 0 [192064/109468 (88%)] Loss: 0.55792 (0.6569) Data (t): 0.235 Batch (t): 0.645, 28.8398/s LR: 0.000001 Logit Scale: 99.966 - V4
2023-06-29,16:12:28 | INFO | Train Epoch: 0 [198464/109468 (91%)] Loss: 0.52179 (0.6526) Data (t): 0.317 Batch (t): 0.722, 45.9036/s LR: 0.000001 Logit Scale: 99.967 - V4
2023-06-29,16:13:23 | INFO | Train Epoch: 0 [204864/109468 (94%)] Loss: 0.43796 (0.6461) Data (t): 0.145 Batch (t): 0.552, 79.0181/s LR: 0.000001 Logit Scale: 99.966 - V4
2023-06-29,16:14:27 | INFO | Train Epoch: 0 [211264/109468 (97%)] Loss: 0.61085 (0.6451) Data (t): 0.234 Batch (t): 0.644, 78.5264/s LR: 0.000001 Logit Scale: 99.967 - V4
2023-06-29,16:15:36 | INFO | Train Epoch: 0 [217664/109468 (99%)] Loss: 0.62594 (0.6446) Data (t): 0.275 Batch (t): 0.683, 78.7430/s LR: 0.000001 Logit Scale: 99.968 - V4
2023-06-29,16:15:54 | INFO | Train Epoch: 0 [218880/109468 (100%)] Loss: 0.64600 (0.6446) Data (t): 0.546 Batch (t): 0.956, 79.1609/s LR: 0.000001 Logit Scale: 99.968 - V4
2023-06-29,16:16:22 | INFO | Eval Epoch: 1 [64 / 4839] Loss: 0.739662
2023-06-29,16:16:51 | INFO | Eval Epoch: 1 [6464 / 4839] Loss: 0.749305
2023-06-29,16:17:11 | INFO | Eval Epoch: 1 image_to_text_mean_rank: 22.8763 image_to_text_median_rank: 5.0000 image_to_text_R@1: 0.2029 image_to_text_R@5: 0.5087 image_to_text_R@10: 0.6417 text_to_image_mean_rank: 23.5244 text_to_image_median_rank: 5.0000 text_to_image_R@1: 0.2139 text_to_image_R@5: 0.5363 text_to_image_R@10: 0.6658 val_loss: 0.7470 epoch: 1.0000 num_samples: 9678.0000
2023-06-29,16:17:25 | INFO | Start epoch 1
2023-06-29,16:17:46 | INFO | Train Epoch: 1 [ 64/109468 (0%)] Loss: 0.54156 (0.5416) Data (t): 19.975 Batch (t): 21.429, 1.49333/s LR: 0.000001 Logit Scale: 99.968 - V4
2023-06-29,16:18:47 | INFO | Train Epoch: 1 [ 6464/109468 (3%)] Loss: 0.45601 (0.4988) Data (t): 0.202 Batch (t): 0.610, 78.7116/s LR: 0.000001 Logit Scale: 99.972 - V4
2023-06-29,16:20:08 | INFO | Train Epoch: 1 [ 12864/109468 (6%)] Loss: 0.54631 (0.5146) Data (t): 0.400 Batch (t): 0.810, 78.8408/s LR: 0.000001 Logit Scale: 99.974 - V4
2023-06-29,16:21:24 | INFO | Train Epoch: 1 [ 19264/109468 (9%)] Loss: 0.42069 (0.4911) Data (t): 0.348 Batch (t): 0.760, 78.4534/s LR: 0.000001 Logit Scale: 99.975 - V4
2023-06-29,16:22:26 | INFO | Train Epoch: 1 [ 25664/109468 (12%)] Loss: 0.58480 (0.5099) Data (t): 0.212 Batch (t): 0.618, 78.4392/s LR: 0.000001 Logit Scale: 99.976 - V4
2023-06-29,16:23:29 | INFO | Train Epoch: 1 [ 32064/109468 (15%)] Loss: 0.61510 (0.5274) Data (t): 0.227 Batch (t): 0.632, 78.6203/s LR: 0.000001 Logit Scale: 99.977 - V4
2023-06-29,16:24:54 | INFO | Train Epoch: 1 [ 38464/109468 (18%)] Loss: 0.54620 (0.5301) Data (t): 0.429 Batch (t): 0.845, 78.9780/s LR: 0.000001 Logit Scale: 99.978 - V4
2023-06-29,16:26:06 | INFO | Train Epoch: 1 [ 44864/109468 (20%)] Loss: 0.57988 (0.5363) Data (t): 0.316 Batch (t): 0.724, 78.4545/s LR: 0.000001 Logit Scale: 99.978 - V4
2023-06-29,16:27:29 | INFO | Train Epoch: 1 [ 51264/109468 (23%)] Loss: 0.48037 (0.5301) Data (t): 0.403 Batch (t): 0.831, 79.0272/s LR: 0.000001 Logit Scale: 99.979 - V4
2023-06-29,16:28:38 | INFO | Train Epoch: 1 [ 57664/109468 (26%)] Loss: 0.58801 (0.5359) Data (t): 0.265 Batch (t): 0.683, 78.7179/s LR: 0.000001 Logit Scale: 99.979 - V4
2023-06-29,16:29:42 | INFO | Train Epoch: 1 [ 64064/109468 (29%)] Loss: 0.48306 (0.5311) Data (t): 0.231 Batch (t): 0.641, 78.6550/s LR: 0.000001 Logit Scale: 99.980 - V4
2023-06-29,16:30:52 | INFO | Train Epoch: 1 [ 70464/109468 (32%)] Loss: 0.49007 (0.5277) Data (t): 0.293 Batch (t): 0.705, 79.0242/s LR: 0.000001 Logit Scale: 99.982 - V4
2023-06-29,16:32:16 | INFO | Train Epoch: 1 [ 76864/109468 (35%)] Loss: 0.48036 (0.5240) Data (t): 0.422 Batch (t): 0.841, 78.9034/s LR: 0.000001 Logit Scale: 99.983 - V4
2023-06-29,16:33:24 | INFO | Train Epoch: 1 [ 83264/109468 (38%)] Loss: 0.45241 (0.5189) Data (t): 0.265 Batch (t): 0.680, 78.6859/s LR: 0.000001 Logit Scale: 99.984 - V4
2023-06-29,16:34:45 | INFO | Train Epoch: 1 [ 89664/109468 (41%)] Loss: 0.44799 (0.5142) Data (t): 0.396 Batch (t): 0.809, 78.8275/s LR: 0.000001 Logit Scale: 99.985 - V4
2023-06-29,16:36:04 | INFO | Train Epoch: 1 [ 96064/109468 (44%)] Loss: 0.60696 (0.5200) Data (t): 0.367 Batch (t): 0.781, 78.4488/s LR: 0.000001 Logit Scale: 99.986 - V4
2023-06-29,16:37:21 | INFO | Train Epoch: 1 [102464/109468 (47%)] Loss: 0.25771 (0.5046) Data (t): 0.360 Batch (t): 0.777, 78.4407/s LR: 0.000001 Logit Scale: 99.986 - V4
2023-06-29,16:38:33 | INFO | Train Epoch: 1 [108864/109468 (50%)] Loss: 0.39816 (0.4986) Data (t): 0.311 Batch (t): 0.723, 78.8350/s LR: 0.000001 Logit Scale: 99.987 - V4
2023-06-29,16:39:50 | INFO | Train Epoch: 1 [115264/109468 (53%)] Loss: 0.55935 (0.5018) Data (t): 0.351 Batch (t): 0.763, 78.7448/s LR: 0.000001 Logit Scale: 99.989 - V4
2023-06-29,16:40:55 | INFO | Train Epoch: 1 [121664/109468 (56%)] Loss: 0.50525 (0.5020) Data (t): 0.242 Batch (t): 0.653, 78.6832/s LR: 0.000001 Logit Scale: 99.989 - V4
2023-06-29,16:42:09 | INFO | Train Epoch: 1 [128064/109468 (59%)] Loss: 0.42021 (0.4981) Data (t): 0.328 Batch (t): 0.738, 4.65507/s LR: 0.000001 Logit Scale: 99.990 - V4
2023-06-29,16:43:26 | INFO | Train Epoch: 1 [134464/109468 (61%)] Loss: 0.38891 (0.4932) Data (t): 0.351 Batch (t): 0.766, 78.4427/s LR: 0.000001 Logit Scale: 99.990 - V4
2023-06-29,16:44:38 | INFO | Train Epoch: 1 [140864/109468 (64%)] Loss: 0.56835 (0.4964) Data (t): 0.314 Batch (t): 0.726, 78.6173/s LR: 0.000001 Logit Scale: 99.990 - V4
2023-06-29,16:46:01 | INFO | Train Epoch: 1 [147264/109468 (67%)] Loss: 0.38919 (0.4920) Data (t): 0.420 Batch (t): 0.831, 79.1708/s LR: 0.000001 Logit Scale: 99.989 - V4
2023-06-29,16:47:00 | INFO | Train Epoch: 1 [153664/109468 (70%)] Loss: 0.40893 (0.4886) Data (t): 0.178 Batch (t): 0.591, 78.5438/s LR: 0.000001 Logit Scale: 99.990 - V4
2023-06-29,16:48:07 | INFO | Train Epoch: 1 [160064/109468 (73%)] Loss: 0.45505 (0.4873) Data (t): 0.256 Batch (t): 0.668, 78.5602/s LR: 0.000001 Logit Scale: 99.991 - V4
2023-06-29,16:49:11 | INFO | Train Epoch: 1 [166464/109468 (76%)] Loss: 0.43372 (0.4854) Data (t): 0.226 Batch (t): 0.641, 78.4562/s LR: 0.000001 Logit Scale: 99.992 - V4
2023-06-29,16:50:23 | INFO | Train Epoch: 1 [172864/109468 (79%)] Loss: 0.49672 (0.4858) Data (t): 0.301 Batch (t): 0.715, 78.6215/s LR: 0.000001 Logit Scale: 99.992 - V4
2023-06-29,16:51:35 | INFO | Train Epoch: 1 [179264/109468 (82%)] Loss: 0.35692 (0.4813) Data (t): 0.307 Batch (t): 0.722, 78.9189/s LR: 0.000001 Logit Scale: 99.992 - V4
2023-06-29,16:52:37 | INFO | Train Epoch: 1 [185664/109468 (85%)] Loss: 0.29576 (0.4751) Data (t): 0.206 Batch (t): 0.618, 78.7244/s LR: 0.000001 Logit Scale: 99.993 - V4
2023-06-29,16:53:38 | INFO | Train Epoch: 1 [192064/109468 (88%)] Loss: 0.53253 (0.4770) Data (t): 0.199 Batch (t): 0.612, 78.4686/s LR: 0.000001 Logit Scale: 99.993 - V4
2023-06-29,16:54:47 | INFO | Train Epoch: 1 [198464/109468 (91%)] Loss: 0.34217 (0.4728) Data (t): 0.279 Batch (t): 0.690, 78.8349/s LR: 0.000001 Logit Scale: 99.995 - V4
2023-06-29,16:56:14 | INFO | Train Epoch: 1 [204864/109468 (94%)] Loss: 0.30161 (0.4676) Data (t): 0.452 Batch (t): 0.870, 79.0673/s LR: 0.000001 Logit Scale: 99.995 - V4
2023-06-29,16:57:22 | INFO | Train Epoch: 1 [211264/109468 (97%)] Loss: 0.42300 (0.4663) Data (t): 0.271 Batch (t): 0.682, 68.6084/s LR: 0.000001 Logit Scale: 99.995 - V4
2023-06-29,16:58:18 | INFO | Train Epoch: 1 [217664/109468 (99%)] Loss: 0.43930 (0.4655) Data (t): 0.149 Batch (t): 0.560, 78.8303/s LR: 0.000001 Logit Scale: 99.995 - V4
2023-06-29,16:58:37 | INFO | Train Epoch: 1 [218880/109468 (100%)] Loss: 0.45109 (0.4651) Data (t): 0.558 Batch (t): 0.983, 78.8557/s LR: 0.000001 Logit Scale: 99.995 - V4
2023-06-29,16:58:58 | INFO | Eval Epoch: 2 [64 / 4839] Loss: 0.764778
2023-06-29,16:59:43 | INFO | Eval Epoch: 2 [6464 / 4839] Loss: 0.738438
2023-06-29,17:00:05 | INFO | Eval Epoch: 2 image_to_text_mean_rank: 22.0146 image_to_text_median_rank: 5.0000 image_to_text_R@1: 0.2013 image_to_text_R@5: 0.5094 image_to_text_R@10: 0.6405 text_to_image_mean_rank: 23.2962 text_to_image_median_rank: 5.0000 text_to_image_R@1: 0.2153 text_to_image_R@5: 0.5436 text_to_image_R@10: 0.6705 val_loss: 0.7395 epoch: 2.0000 num_samples: 9678.0000
2023-06-29,17:00:18 | INFO | Start epoch 2
2023-06-29,17:00:36 | INFO | Train Epoch: 2 [ 64/109468 (0%)] Loss: 0.29990 (0.2999) Data (t): 17.382 Batch (t): 18.004, 1.77736/s LR: 0.000001 Logit Scale: 99.995 - V4
2023-06-29,17:01:37 | INFO | Train Epoch: 2 [ 6464/109468 (3%)] Loss: 0.33997 (0.3199) Data (t): 0.205 Batch (t): 0.608, 78.5918/s LR: 0.000001 Logit Scale: 99.999 - V4
2023-06-29,17:02:24 | INFO | Train Epoch: 2 [ 12864/109468 (6%)] Loss: 0.28720 (0.3090) Data (t): 0.073 Batch (t): 0.474, 78.3975/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:03:22 | INFO | Train Epoch: 2 [ 19264/109468 (9%)] Loss: 0.47454 (0.3504) Data (t): 0.172 Batch (t): 0.580, 78.6303/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:04:12 | INFO | Train Epoch: 2 [ 25664/109468 (12%)] Loss: 0.44450 (0.3692) Data (t): 0.094 Batch (t): 0.503, 78.5500/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:05:07 | INFO | Train Epoch: 2 [ 32064/109468 (15%)] Loss: 0.34395 (0.3650) Data (t): 0.139 Batch (t): 0.550, 78.4320/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:05:57 | INFO | Train Epoch: 2 [ 38464/109468 (18%)] Loss: 0.25606 (0.3494) Data (t): 0.091 Batch (t): 0.501, 72.5830/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:06:46 | INFO | Train Epoch: 2 [ 44864/109468 (20%)] Loss: 0.45844 (0.3631) Data (t): 0.073 Batch (t): 0.481, 78.6329/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:07:38 | INFO | Train Epoch: 2 [ 51264/109468 (23%)] Loss: 0.31009 (0.3572) Data (t): 0.120 Batch (t): 0.529, 78.6200/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:08:29 | INFO | Train Epoch: 2 [ 57664/109468 (26%)] Loss: 0.26892 (0.3484) Data (t): 0.095 Batch (t): 0.506, 78.5492/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:09:21 | INFO | Train Epoch: 2 [ 64064/109468 (29%)] Loss: 0.34709 (0.3482) Data (t): 0.106 Batch (t): 0.516, 78.2863/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:10:17 | INFO | Train Epoch: 2 [ 70464/109468 (32%)] Loss: 0.40077 (0.3526) Data (t): 0.159 Batch (t): 0.567, 78.9068/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:11:18 | INFO | Train Epoch: 2 [ 76864/109468 (35%)] Loss: 0.29464 (0.3482) Data (t): 0.198 Batch (t): 0.611, 78.6498/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:12:11 | INFO | Train Epoch: 2 [ 83264/109468 (38%)] Loss: 0.44027 (0.3547) Data (t): 0.117 Batch (t): 0.526, 78.5978/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:13:11 | INFO | Train Epoch: 2 [ 89664/109468 (41%)] Loss: 0.59579 (0.3708) Data (t): 0.192 Batch (t): 0.600, 78.6418/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:14:03 | INFO | Train Epoch: 2 [ 96064/109468 (44%)] Loss: 0.28179 (0.3652) Data (t): 0.110 Batch (t): 0.522, 78.4187/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:15:17 | INFO | Train Epoch: 2 [102464/109468 (47%)] Loss: 0.25009 (0.3585) Data (t): 0.326 Batch (t): 0.741, 78.9187/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:16:07 | INFO | Train Epoch: 2 [108864/109468 (50%)] Loss: 0.35939 (0.3585) Data (t): 0.089 Batch (t): 0.501, 78.6374/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-06-29,17:16:59 | INFO | Train Epoch: 2 [115264/109468 (53%)] Loss: 0.23861 (0.3522) Data (t): 0.108 Batch (t): 0.514, 78.6174/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:18:01 | INFO | Train Epoch: 2 [121664/109468 (56%)] Loss: 0.24138 (0.3467) Data (t): 0.206 Batch (t): 0.619, 78.1407/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:18:51 | INFO | Train Epoch: 2 [128064/109468 (59%)] Loss: 0.22885 (0.3411) Data (t): 0.089 Batch (t): 0.500, 77.7633/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:20:00 | INFO | Train Epoch: 2 [134464/109468 (61%)] Loss: 0.40367 (0.3439) Data (t): 0.281 Batch (t): 0.695, 43.4733/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:20:55 | INFO | Train Epoch: 2 [140864/109468 (64%)] Loss: 0.49248 (0.3504) Data (t): 0.138 Batch (t): 0.548, 78.9198/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:22:03 | INFO | Train Epoch: 2 [147264/109468 (67%)] Loss: 0.31064 (0.3487) Data (t): 0.267 Batch (t): 0.678, 78.6444/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:23:02 | INFO | Train Epoch: 2 [153664/109468 (70%)] Loss: 0.27228 (0.3457) Data (t): 0.180 Batch (t): 0.593, 78.2201/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:23:57 | INFO | Train Epoch: 2 [160064/109468 (73%)] Loss: 0.22899 (0.3412) Data (t): 0.138 Batch (t): 0.551, 78.3507/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:24:54 | INFO | Train Epoch: 2 [166464/109468 (76%)] Loss: 0.31602 (0.3402) Data (t): 0.158 Batch (t): 0.571, 78.6186/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:25:57 | INFO | Train Epoch: 2 [172864/109468 (79%)] Loss: 0.47421 (0.3450) Data (t): 0.219 Batch (t): 0.631, 78.6255/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:26:59 | INFO | Train Epoch: 2 [179264/109468 (82%)] Loss: 0.56447 (0.3526) Data (t): 0.205 Batch (t): 0.620, 78.2385/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:28:06 | INFO | Train Epoch: 2 [185664/109468 (85%)] Loss: 0.39981 (0.3542) Data (t): 0.261 Batch (t): 0.673, 4.10015/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:29:00 | INFO | Train Epoch: 2 [192064/109468 (88%)] Loss: 0.45554 (0.3574) Data (t): 0.129 Batch (t): 0.541, 78.6876/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:30:02 | INFO | Train Epoch: 2 [198464/109468 (91%)] Loss: 0.45936 (0.3606) Data (t): 0.203 Batch (t): 0.619, 78.4444/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:31:37 | INFO | Train Epoch: 2 [204864/109468 (94%)] Loss: 0.32718 (0.3596) Data (t): 0.523 Batch (t): 0.944, 78.8344/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:32:39 | INFO | Train Epoch: 2 [211264/109468 (97%)] Loss: 0.32634 (0.3586) Data (t): 0.208 Batch (t): 0.620, 78.6350/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:33:45 | INFO | Train Epoch: 2 [217664/109468 (99%)] Loss: 0.25424 (0.3556) Data (t): 0.245 Batch (t): 0.659, 78.7344/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:33:56 | INFO | Train Epoch: 2 [218880/109468 (100%)] Loss: 0.35045 (0.3555) Data (t): 0.161 Batch (t): 0.573, 78.6411/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:34:19 | INFO | Eval Epoch: 3 [64 / 4839] Loss: 0.880643
2023-06-29,17:34:58 | INFO | Eval Epoch: 3 [6464 / 4839] Loss: 0.786876
2023-06-29,17:35:15 | INFO | Eval Epoch: 3 image_to_text_mean_rank: 21.5905 image_to_text_median_rank: 5.0000 image_to_text_R@1: 0.2020 image_to_text_R@5: 0.5073 image_to_text_R@10: 0.6418 text_to_image_mean_rank: 24.6662 text_to_image_median_rank: 5.0000 text_to_image_R@1: 0.2088 text_to_image_R@5: 0.5267 text_to_image_R@10: 0.6589 val_loss: 0.7889 epoch: 3.0000 num_samples: 9678.0000
2023-06-29,17:35:24 | INFO | Start epoch 3
2023-06-29,17:35:48 | INFO | Train Epoch: 3 [ 64/109468 (0%)] Loss: 0.32038 (0.3204) Data (t): 23.990 Batch (t): 24.542, 1.30390/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:36:45 | INFO | Train Epoch: 3 [ 6464/109468 (3%)] Loss: 0.30622 (0.3133) Data (t): 0.165 Batch (t): 0.570, 78.9079/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:37:41 | INFO | Train Epoch: 3 [ 12864/109468 (6%)] Loss: 0.26868 (0.2984) Data (t): 0.150 Batch (t): 0.553, 78.3442/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:38:52 | INFO | Train Epoch: 3 [ 19264/109468 (9%)] Loss: 0.38114 (0.3191) Data (t): 0.305 Batch (t): 0.712, 78.6370/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:39:54 | INFO | Train Epoch: 3 [ 25664/109468 (12%)] Loss: 0.31058 (0.3174) Data (t): 0.215 Batch (t): 0.626, 78.5601/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:40:42 | INFO | Train Epoch: 3 [ 32064/109468 (15%)] Loss: 0.33504 (0.3203) Data (t): 0.072 Batch (t): 0.476, 78.2593/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:41:53 | INFO | Train Epoch: 3 [ 38464/109468 (18%)] Loss: 0.20353 (0.3037) Data (t): 0.299 Batch (t): 0.709, 33.3064/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:42:43 | INFO | Train Epoch: 3 [ 44864/109468 (20%)] Loss: 0.42330 (0.3186) Data (t): 0.089 Batch (t): 0.497, 78.2330/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:43:35 | INFO | Train Epoch: 3 [ 51264/109468 (23%)] Loss: 0.17864 (0.3031) Data (t): 0.120 Batch (t): 0.529, 78.4415/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:45:20 | INFO | Train Epoch: 3 [ 57664/109468 (26%)] Loss: 0.28554 (0.3013) Data (t): 0.622 Batch (t): 1.045, 78.3447/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:46:10 | INFO | Train Epoch: 3 [ 64064/109468 (29%)] Loss: 0.23161 (0.2950) Data (t): 0.096 Batch (t): 0.497, 78.6789/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:47:10 | INFO | Train Epoch: 3 [ 70464/109468 (32%)] Loss: 0.21556 (0.2884) Data (t): 0.200 Batch (t): 0.607, 78.2901/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:48:08 | INFO | Train Epoch: 3 [ 76864/109468 (35%)] Loss: 0.20505 (0.2819) Data (t): 0.167 Batch (t): 0.575, 78.7167/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:49:11 | INFO | Train Epoch: 3 [ 83264/109468 (38%)] Loss: 0.31941 (0.2846) Data (t): 0.216 Batch (t): 0.627, 78.6412/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:50:06 | INFO | Train Epoch: 3 [ 89664/109468 (41%)] Loss: 0.33108 (0.2877) Data (t): 0.145 Batch (t): 0.554, 78.6408/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:50:58 | INFO | Train Epoch: 3 [ 96064/109468 (44%)] Loss: 0.21691 (0.2833) Data (t): 0.116 Batch (t): 0.524, 78.5277/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:52:09 | INFO | Train Epoch: 3 [102464/109468 (47%)] Loss: 0.44546 (0.2928) Data (t): 0.296 Batch (t): 0.711, 78.1359/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:53:11 | INFO | Train Epoch: 3 [108864/109468 (50%)] Loss: 0.25910 (0.2910) Data (t): 0.208 Batch (t): 0.619, 36.8176/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:54:08 | INFO | Train Epoch: 3 [115264/109468 (53%)] Loss: 0.36376 (0.2948) Data (t): 0.161 Batch (t): 0.571, 62.5931/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:55:17 | INFO | Train Epoch: 3 [121664/109468 (56%)] Loss: 0.36440 (0.2983) Data (t): 0.271 Batch (t): 0.686, 78.3808/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:56:26 | INFO | Train Epoch: 3 [128064/109468 (59%)] Loss: 0.37320 (0.3018) Data (t): 0.273 Batch (t): 0.686, 41.2542/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:57:30 | INFO | Train Epoch: 3 [134464/109468 (61%)] Loss: 0.29186 (0.3014) Data (t): 0.229 Batch (t): 0.643, 78.5991/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:58:33 | INFO | Train Epoch: 3 [140864/109468 (64%)] Loss: 0.26907 (0.3000) Data (t): 0.222 Batch (t): 0.632, 78.6266/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,17:59:36 | INFO | Train Epoch: 3 [147264/109468 (67%)] Loss: 0.25150 (0.2980) Data (t): 0.209 Batch (t): 0.626, 78.5987/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:00:41 | INFO | Train Epoch: 3 [153664/109468 (70%)] Loss: 0.29466 (0.2978) Data (t): 0.241 Batch (t): 0.656, 78.7876/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:01:47 | INFO | Train Epoch: 3 [160064/109468 (73%)] Loss: 0.33378 (0.2992) Data (t): 0.239 Batch (t): 0.655, 78.4722/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:02:42 | INFO | Train Epoch: 3 [166464/109468 (76%)] Loss: 0.14946 (0.2937) Data (t): 0.141 Batch (t): 0.551, 78.5429/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:03:47 | INFO | Train Epoch: 3 [172864/109468 (79%)] Loss: 0.27780 (0.2931) Data (t): 0.241 Batch (t): 0.654, 78.4478/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:04:49 | INFO | Train Epoch: 3 [179264/109468 (82%)] Loss: 0.20868 (0.2902) Data (t): 0.201 Batch (t): 0.614, 77.3554/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:05:51 | INFO | Train Epoch: 3 [185664/109468 (85%)] Loss: 0.34220 (0.2919) Data (t): 0.203 Batch (t): 0.619, 78.7110/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:07:17 | INFO | Train Epoch: 3 [192064/109468 (88%)] Loss: 0.31261 (0.2926) Data (t): 0.448 Batch (t): 0.864, 78.7566/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:08:19 | INFO | Train Epoch: 3 [198464/109468 (91%)] Loss: 0.30109 (0.2929) Data (t): 0.207 Batch (t): 0.618, 78.4564/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:09:15 | INFO | Train Epoch: 3 [204864/109468 (94%)] Loss: 0.33051 (0.2940) Data (t): 0.150 Batch (t): 0.561, 78.8412/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:10:19 | INFO | Train Epoch: 3 [211264/109468 (97%)] Loss: 0.21923 (0.2918) Data (t): 0.226 Batch (t): 0.638, 78.9041/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:11:20 | INFO | Train Epoch: 3 [217664/109468 (99%)] Loss: 0.24360 (0.2904) Data (t): 0.202 Batch (t): 0.612, 78.8122/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:11:38 | INFO | Train Epoch: 3 [218880/109468 (100%)] Loss: 0.47287 (0.2955) Data (t): 0.544 Batch (t): 0.950, 78.8308/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:12:08 | INFO | Eval Epoch: 4 [64 / 4839] Loss: 0.785412
2023-06-29,18:13:00 | INFO | Eval Epoch: 4 [6464 / 4839] Loss: 0.811952
2023-06-29,18:13:25 | INFO | Eval Epoch: 4 image_to_text_mean_rank: 23.0725 image_to_text_median_rank: 6.0000 image_to_text_R@1: 0.2002 image_to_text_R@5: 0.4995 image_to_text_R@10: 0.6308 text_to_image_mean_rank: 25.3585 text_to_image_median_rank: 5.0000 text_to_image_R@1: 0.2089 text_to_image_R@5: 0.5258 text_to_image_R@10: 0.6548 val_loss: 0.8173 epoch: 4.0000 num_samples: 9678.0000
2023-06-29,18:13:32 | INFO | Start epoch 4
2023-06-29,18:14:04 | INFO | Train Epoch: 4 [ 64/109468 (0%)] Loss: 0.22172 (0.2217) Data (t): 31.267 Batch (t): 31.862, 1.00432/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:15:15 | INFO | Train Epoch: 4 [ 6464/109468 (3%)] Loss: 0.22600 (0.2239) Data (t): 0.311 Batch (t): 0.717, 78.8164/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:16:14 | INFO | Train Epoch: 4 [ 12864/109468 (6%)] Loss: 0.23976 (0.2292) Data (t): 0.173 Batch (t): 0.583, 78.6401/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:17:15 | INFO | Train Epoch: 4 [ 19264/109468 (9%)] Loss: 0.34719 (0.2587) Data (t): 0.206 Batch (t): 0.617, 78.4830/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:18:04 | INFO | Train Epoch: 4 [ 25664/109468 (12%)] Loss: 0.21735 (0.2504) Data (t): 0.082 Batch (t): 0.491, 78.5022/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:19:02 | INFO | Train Epoch: 4 [ 32064/109468 (15%)] Loss: 0.22192 (0.2457) Data (t): 0.164 Batch (t): 0.574, 78.1021/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:19:53 | INFO | Train Epoch: 4 [ 38464/109468 (18%)] Loss: 0.27087 (0.2493) Data (t): 0.101 Batch (t): 0.509, 78.8710/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:20:47 | INFO | Train Epoch: 4 [ 44864/109468 (20%)] Loss: 0.25276 (0.2497) Data (t): 0.131 Batch (t): 0.543, 78.8271/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:21:46 | INFO | Train Epoch: 4 [ 51264/109468 (23%)] Loss: 0.28136 (0.2532) Data (t): 0.177 Batch (t): 0.591, 78.6371/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:22:38 | INFO | Train Epoch: 4 [ 57664/109468 (26%)] Loss: 0.37415 (0.2653) Data (t): 0.114 Batch (t): 0.520, 78.7362/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:23:26 | INFO | Train Epoch: 4 [ 64064/109468 (29%)] Loss: 0.33080 (0.2713) Data (t): 0.081 Batch (t): 0.481, 78.6413/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:24:22 | INFO | Train Epoch: 4 [ 70464/109468 (32%)] Loss: 0.23171 (0.2680) Data (t): 0.143 Batch (t): 0.553, 78.4443/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:25:32 | INFO | Train Epoch: 4 [ 76864/109468 (35%)] Loss: 0.30153 (0.2705) Data (t): 0.295 Batch (t): 0.709, 78.9185/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:26:25 | INFO | Train Epoch: 4 [ 83264/109468 (38%)] Loss: 0.19182 (0.2649) Data (t): 0.116 Batch (t): 0.521, 78.5254/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:27:20 | INFO | Train Epoch: 4 [ 89664/109468 (41%)] Loss: 0.24335 (0.2635) Data (t): 0.142 Batch (t): 0.551, 78.7180/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:28:12 | INFO | Train Epoch: 4 [ 96064/109468 (44%)] Loss: 0.29348 (0.2654) Data (t): 0.112 Batch (t): 0.522, 78.6404/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:29:10 | INFO | Train Epoch: 4 [102464/109468 (47%)] Loss: 0.27384 (0.2659) Data (t): 0.174 Batch (t): 0.584, 78.8284/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:30:20 | INFO | Train Epoch: 4 [108864/109468 (50%)] Loss: 0.34566 (0.2703) Data (t): 0.291 Batch (t): 0.696, 78.6412/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:31:06 | INFO | Train Epoch: 4 [115264/109468 (53%)] Loss: 0.21166 (0.2672) Data (t): 0.057 Batch (t): 0.464, 78.4545/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:32:06 | INFO | Train Epoch: 4 [121664/109468 (56%)] Loss: 0.18058 (0.2629) Data (t): 0.191 Batch (t): 0.597, 9.07596/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:33:00 | INFO | Train Epoch: 4 [128064/109468 (59%)] Loss: 0.27472 (0.2634) Data (t): 0.131 Batch (t): 0.538, 79.0279/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:34:02 | INFO | Train Epoch: 4 [134464/109468 (61%)] Loss: 0.20746 (0.2609) Data (t): 0.214 Batch (t): 0.622, 78.6448/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:35:06 | INFO | Train Epoch: 4 [140864/109468 (64%)] Loss: 0.33104 (0.2639) Data (t): 0.227 Batch (t): 0.638, 78.6357/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:36:01 | INFO | Train Epoch: 4 [147264/109468 (67%)] Loss: 0.33449 (0.2669) Data (t): 0.147 Batch (t): 0.553, 78.6413/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:36:56 | INFO | Train Epoch: 4 [153664/109468 (70%)] Loss: 0.21775 (0.2649) Data (t): 0.137 Batch (t): 0.548, 78.4321/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:37:52 | INFO | Train Epoch: 4 [160064/109468 (73%)] Loss: 0.28577 (0.2657) Data (t): 0.146 Batch (t): 0.556, 78.4469/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:39:01 | INFO | Train Epoch: 4 [166464/109468 (76%)] Loss: 0.24210 (0.2648) Data (t): 0.287 Batch (t): 0.699, 79.0211/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:40:00 | INFO | Train Epoch: 4 [172864/109468 (79%)] Loss: 0.34949 (0.2679) Data (t): 0.171 Batch (t): 0.583, 78.4243/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:40:54 | INFO | Train Epoch: 4 [179264/109468 (82%)] Loss: 0.40615 (0.2726) Data (t): 0.130 Batch (t): 0.541, 78.8343/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:41:51 | INFO | Train Epoch: 4 [185664/109468 (85%)] Loss: 0.46029 (0.2789) Data (t): 0.160 Batch (t): 0.568, 78.2580/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:42:46 | INFO | Train Epoch: 4 [192064/109468 (88%)] Loss: 0.30071 (0.2796) Data (t): 0.140 Batch (t): 0.555, 78.6354/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:43:58 | INFO | Train Epoch: 4 [198464/109468 (91%)] Loss: 0.29088 (0.2799) Data (t): 0.298 Batch (t): 0.715, 32.7522/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:45:01 | INFO | Train Epoch: 4 [204864/109468 (94%)] Loss: 0.23992 (0.2787) Data (t): 0.226 Batch (t): 0.635, 78.2630/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:46:14 | INFO | Train Epoch: 4 [211264/109468 (97%)] Loss: 0.32449 (0.2801) Data (t): 0.318 Batch (t): 0.731, 79.0285/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:47:12 | INFO | Train Epoch: 4 [217664/109468 (99%)] Loss: 0.36467 (0.2825) Data (t): 0.165 Batch (t): 0.581, 78.8326/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:47:22 | INFO | Train Epoch: 4 [218880/109468 (100%)] Loss: 0.28539 (0.2826) Data (t): 0.112 Batch (t): 0.521, 78.8303/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-06-29,18:47:54 | INFO | Eval Epoch: 5 [64 / 4839] Loss: 0.721772
2023-06-29,18:48:29 | INFO | Eval Epoch: 5 [6464 / 4839] Loss: 0.818251
2023-06-29,18:48:46 | INFO | Eval Epoch: 5 image_to_text_mean_rank: 23.1456 image_to_text_median_rank: 6.0000 image_to_text_R@1: 0.1986 image_to_text_R@5: 0.4996 image_to_text_R@10: 0.6286 text_to_image_mean_rank: 25.8653 text_to_image_median_rank: 5.0000 text_to_image_R@1: 0.2093 text_to_image_R@5: 0.5290 text_to_image_R@10: 0.6593 val_loss: 0.8160 epoch: 5.0000 num_samples: 9678.0000
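For readers of these logs: the `R@K` numbers above come from ranking every candidate caption for each image by similarity (and vice versa) and checking whether the correct match lands in the top K. A minimal sketch of that computation from a similarity matrix (illustrative toy code, not the repo's evaluation pipeline):

```python
import numpy as np

def recall_at_k(similarity, k):
    """similarity[i, j] = score of caption j for image i; the correct
    caption for image i is assumed to sit at index i (the diagonal)."""
    order = np.argsort(-similarity, axis=1)  # candidate indices, best-first
    # position of the correct caption in each image's ranking
    ranks = np.argmax(order == np.arange(len(similarity))[:, None], axis=1)
    return float(np.mean(ranks < k))

sim = np.array([[0.9, 0.1, 0.3],
                [0.2, 0.8, 0.1],
                [0.7, 0.2, 0.4]])   # image 2's own caption only ranks second
print(recall_at_k(sim, 1))  # 0.666... (2 of 3 images rank their caption first)
print(recall_at_k(sim, 2))  # 1.0
```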

vinid commented 1 year ago

You probably need a bigger batch size; it looks like the model is already overfitting.
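For context on why batch size matters so much here: a CLIP-style contrastive loss treats every other caption in the batch as a negative, so a small batch gives an easier objective with fewer negatives. A minimal numpy sketch of the symmetric InfoNCE loss (toy code, not the repo's implementation):

```python
import numpy as np

def clip_loss(image_emb, text_emb, logit_scale=1.0):
    """Symmetric InfoNCE: the positive for row i is column i; the other
    batch_size - 1 columns act as in-batch negatives."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = logit_scale * img @ txt.T            # (B, B) similarity matrix

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)      # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))       # diagonal = correct pairs

    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

rng = np.random.default_rng(0)
for batch_size in (32, 256):
    img = rng.normal(size=(batch_size, 512))
    txt = rng.normal(size=(batch_size, 512))
    # With random embeddings the loss sits near log(batch_size):
    # more in-batch negatives make the objective harder.
    print(batch_size, round(clip_loss(img, txt), 3))
```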

Did you test ARO and the other tasks on the checkpoints?

sigrid414 commented 1 year ago

Thanks again for your reply. I tried switching to a larger batch size. I had previously tried to reproduce ARO, and the results were similar to Table 4 in your paper.

vinid commented 1 year ago

Nice, is this solved then?

sigrid414 commented 1 year ago

After switching hardware, the batch size could only be set to 128, but the results still did not improve significantly. Below are my parameter settings and experimental results.

2023-07-14,07:17:28 | INFO | Params:
2023-07-14,07:17:28 | INFO | batch_size: 128
2023-07-14,07:17:28 | INFO | beta1: 0.9
2023-07-14,07:17:28 | INFO | beta2: 0.98
2023-07-14,07:17:28 | INFO | checkpoint_path: ./logs/2023_07_14-07_17_22-model_ViT-B-32-lr_1e-06-b_128-j_0-p_amp/checkpoints
2023-07-14,07:17:28 | INFO | copy_codebase: False
2023-07-14,07:17:28 | INFO | csv_caption_key: title
2023-07-14,07:17:28 | INFO | csv_hard_captions_key: neg_caption
2023-07-14,07:17:28 | INFO | csv_img_key: filepath
2023-07-14,07:17:28 | INFO | csv_separator:
2023-07-14,07:17:28 | INFO | dataset_resampled: False
2023-07-14,07:17:28 | INFO | dataset_type: auto
2023-07-14,07:17:28 | INFO | ddp_static_graph: False
2023-07-14,07:17:28 | INFO | debug: False
2023-07-14,07:17:28 | INFO | device: cuda:0
2023-07-14,07:17:28 | INFO | dist_backend: nccl
2023-07-14,07:17:28 | INFO | dist_url: env://
2023-07-14,07:17:28 | INFO | distributed: False
2023-07-14,07:17:28 | INFO | epochs: 5
2023-07-14,07:17:28 | INFO | eps: 1e-06
2023-07-14,07:17:28 | INFO | force_quick_gelu: False
2023-07-14,07:17:28 | INFO | gather_with_grad: False
2023-07-14,07:17:28 | INFO | grad_checkpointing: False
2023-07-14,07:17:28 | INFO | horovod: False
2023-07-14,07:17:28 | INFO | imagenet_v2: None
2023-07-14,07:17:28 | INFO | imagenet_val: None
2023-07-14,07:17:28 | INFO | local_loss: False
2023-07-14,07:17:28 | INFO | local_rank: 0
2023-07-14,07:17:28 | INFO | lock_image: False
2023-07-14,07:17:28 | INFO | lock_image_freeze_bn_stats: False
2023-07-14,07:17:28 | INFO | lock_image_unlocked_groups: 0
2023-07-14,07:17:28 | INFO | log_level: 20
2023-07-14,07:17:28 | INFO | log_local: False
2023-07-14,07:17:28 | INFO | log_path: ./logs/2023_07_14-07_17_22-model_ViT-B-32-lr_1e-06-b_128-j_0-p_amp/out.log
2023-07-14,07:17:28 | INFO | logs: ./logs/
2023-07-14,07:17:28 | INFO | lr: 1e-06
2023-07-14,07:17:28 | INFO | model: ViT-B-32
2023-07-14,07:17:28 | INFO | name: 2023_07_14-07_17_22-model_ViT-B-32-lr_1e-06-b_128-j_0-p_amp
2023-07-14,07:17:28 | INFO | no_set_device_rank: False
2023-07-14,07:17:28 | INFO | norm_gradient_clip: None
2023-07-14,07:17:28 | INFO | precision: amp
2023-07-14,07:17:28 | INFO | pretrained: openai
2023-07-14,07:17:28 | INFO | pretrained_image: False
2023-07-14,07:17:28 | INFO | rank: 0
2023-07-14,07:17:28 | INFO | report_to:
2023-07-14,07:17:28 | INFO | resume: None
2023-07-14,07:17:28 | INFO | save_frequency: 1
2023-07-14,07:17:28 | INFO | save_most_recent: False
2023-07-14,07:17:28 | INFO | seed: 0
2023-07-14,07:17:28 | INFO | skip_scheduler: False
2023-07-14,07:17:28 | INFO | tensorboard: False
2023-07-14,07:17:28 | INFO | tensorboard_path:
2023-07-14,07:17:28 | INFO | torchscript: False
2023-07-14,07:17:28 | INFO | trace: False
2023-07-14,07:17:28 | INFO | train_data: ./data/train_neg_clip.tsv
2023-07-14,07:17:28 | INFO | train_num_samples: None
2023-07-14,07:17:28 | INFO | use_bn_sync: False
2023-07-14,07:17:28 | INFO | val_data: ./data/valid_neg_clip.tsv
2023-07-14,07:17:28 | INFO | val_frequency: 1
2023-07-14,07:17:28 | INFO | val_num_samples: None
2023-07-14,07:17:28 | INFO | wandb: False
2023-07-14,07:17:28 | INFO | wandb_notes:
2023-07-14,07:17:28 | INFO | warmup: 50
2023-07-14,07:17:28 | INFO | wd: 0.2
2023-07-14,07:17:28 | INFO | workers: 0
2023-07-14,07:17:28 | INFO | world_size: 1
2023-07-14,07:17:28 | INFO | zeroshot_frequency: 2
2023-07-14,07:17:31 | INFO | Start epoch 0
2023-07-14,07:17:36 | INFO | Train Epoch: 0 [ 256/109468 (0%)] Loss: 1.4302 (1.430) Data (t): 3.181 Batch (t): 5.660, 22.6159/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,07:25:12 | INFO | Train Epoch: 0 [ 25856/109468 (12%)] Loss: 1.0671 (1.249) Data (t): 4.061 Batch (t): 4.560, 26.9247/s LR: 0.000001 Logit Scale: 99.996 - V4
2023-07-14,07:32:38 | INFO | Train Epoch: 0 [ 51456/109468 (24%)] Loss: 1.0778 (1.192) Data (t): 3.957 Batch (t): 4.456, 31.1337/s LR: 0.000001 Logit Scale: 99.992 - V4
2023-07-14,07:39:27 | INFO | Train Epoch: 0 [ 77056/109468 (35%)] Loss: 1.0220 (1.149) Data (t): 3.592 Batch (t): 4.091, 31.4411/s LR: 0.000001 Logit Scale: 99.990 - V4
2023-07-14,07:45:48 | INFO | Train Epoch: 0 [102656/109468 (47%)] Loss: 0.89162 (1.098) Data (t): 3.307 Batch (t): 3.808, 36.8938/s LR: 0.000001 Logit Scale: 99.989 - V4
2023-07-14,07:51:50 | INFO | Train Epoch: 0 [128256/109468 (59%)] Loss: 0.93995 (1.071) Data (t): 3.123 Batch (t): 3.623, 38.2984/s LR: 0.000001 Logit Scale: 99.988 - V4
2023-07-14,07:57:42 | INFO | Train Epoch: 0 [153856/109468 (70%)] Loss: 0.87553 (1.043) Data (t): 3.019 Batch (t): 3.519, 39.3830/s LR: 0.000001 Logit Scale: 99.988 - V4
2023-07-14,08:03:27 | INFO | Train Epoch: 0 [179456/109468 (82%)] Loss: 0.92891 (1.029) Data (t): 2.946 Batch (t): 3.448, 38.3920/s LR: 0.000001 Logit Scale: 99.988 - V4
2023-07-14,08:09:02 | INFO | Train Epoch: 0 [205056/109468 (94%)] Loss: 0.86192 (1.011) Data (t): 2.855 Batch (t): 3.353, 39.2727/s LR: 0.000001 Logit Scale: 99.987 - V4
2023-07-14,08:12:00 | INFO | Train Epoch: 0 [218880/109468 (100%)] Loss: 0.77184 (0.9867) Data (t): 2.800 Batch (t): 3.298, 38.8265/s LR: 0.000001 Logit Scale: 99.987 - V4
2023-07-14,08:12:04 | INFO | Eval Epoch: 1 [256 / 4839] Loss: 0.923471
2023-07-14,08:13:52 | INFO | Eval Epoch: 1 image_to_text_mean_rank: 21.2029 image_to_text_median_rank: 5.0000 image_to_text_R@1: 0.2114 image_to_text_R@5: 0.5208 image_to_text_R@10: 0.6483 text_to_image_mean_rank: 22.4917 text_to_image_median_rank: 5.0000 text_to_image_R@1: 0.2137 text_to_image_R@5: 0.5331 text_to_image_R@10: 0.6668 val_loss: 0.9573 epoch: 1.0000 num_samples: 9678.0000
2023-07-14,08:13:55 | INFO | Start epoch 1
2023-07-14,08:13:58 | INFO | Train Epoch: 1 [ 256/109468 (0%)] Loss: 0.77637 (0.7764) Data (t): 2.429 Batch (t): 2.987, 42.8473/s LR: 0.000001 Logit Scale: 99.987 - V4
2023-07-14,08:18:48 | INFO | Train Epoch: 1 [ 25856/109468 (12%)] Loss: 0.86375 (0.8201) Data (t): 2.411 Batch (t): 2.908, 45.1780/s LR: 0.000001 Logit Scale: 99.989 - V4
2023-07-14,08:23:37 | INFO | Train Epoch: 1 [ 51456/109468 (24%)] Loss: 0.85335 (0.8312) Data (t): 2.381 Batch (t): 2.882, 45.1215/s LR: 0.000001 Logit Scale: 99.992 - V4
2023-07-14,08:28:23 | INFO | Train Epoch: 1 [ 77056/109468 (35%)] Loss: 0.83323 (0.8317) Data (t): 2.360 Batch (t): 2.859, 44.9686/s LR: 0.000001 Logit Scale: 99.993 - V4
2023-07-14,08:33:10 | INFO | Train Epoch: 1 [102656/109468 (47%)] Loss: 0.76629 (0.8186) Data (t): 2.378 Batch (t): 2.877, 43.8843/s LR: 0.000001 Logit Scale: 99.994 - V4
2023-07-14,08:37:59 | INFO | Train Epoch: 1 [128256/109468 (59%)] Loss: 0.77283 (0.8110) Data (t): 2.393 Batch (t): 2.892, 44.6028/s LR: 0.000001 Logit Scale: 99.994 - V4
2023-07-14,08:42:49 | INFO | Train Epoch: 1 [153856/109468 (70%)] Loss: 0.65149 (0.7882) Data (t): 2.393 Batch (t): 2.892, 43.5383/s LR: 0.000001 Logit Scale: 99.994 - V4
2023-07-14,08:47:38 | INFO | Train Epoch: 1 [179456/109468 (82%)] Loss: 0.70026 (0.7772) Data (t): 2.400 Batch (t): 2.898, 44.7467/s LR: 0.000001 Logit Scale: 99.994 - V4
2023-07-14,08:52:26 | INFO | Train Epoch: 1 [205056/109468 (94%)] Loss: 0.80084 (0.7798) Data (t): 2.377 Batch (t): 2.875, 43.5677/s LR: 0.000001 Logit Scale: 99.995 - V4
2023-07-14,08:55:03 | INFO | Train Epoch: 1 [218880/109468 (100%)] Loss: 0.77056 (0.7789) Data (t): 2.401 Batch (t): 2.900, 45.3670/s LR: 0.000001 Logit Scale: 99.996 - V4
2023-07-14,08:55:05 | INFO | Eval Epoch: 2 [256 / 4839] Loss: 0.859714
2023-07-14,08:56:42 | INFO | Eval Epoch: 2 image_to_text_mean_rank: 21.0603 image_to_text_median_rank: 5.0000 image_to_text_R@1: 0.2110 image_to_text_R@5: 0.5288 image_to_text_R@10: 0.6531 text_to_image_mean_rank: 22.2901 text_to_image_median_rank: 4.0000 text_to_image_R@1: 0.2192 text_to_image_R@5: 0.5460 text_to_image_R@10: 0.6698 val_loss: 0.9239 epoch: 2.0000 num_samples: 9678.0000
2023-07-14,08:56:45 | INFO | Start epoch 2
2023-07-14,08:56:47 | INFO | Train Epoch: 2 [ 256/109468 (0%)] Loss: 0.58777 (0.5878) Data (t): 2.383 Batch (t): 2.893, 44.2508/s LR: 0.000001 Logit Scale: 99.996 - V4
2023-07-14,09:01:38 | INFO | Train Epoch: 2 [ 25856/109468 (12%)] Loss: 0.79836 (0.6931) Data (t): 2.411 Batch (t): 2.909, 44.4864/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-07-14,09:06:30 | INFO | Train Epoch: 2 [ 51456/109468 (24%)] Loss: 0.81392 (0.7333) Data (t): 2.416 Batch (t): 2.914, 44.3083/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-07-14,09:11:19 | INFO | Train Epoch: 2 [ 77056/109468 (35%)] Loss: 0.72864 (0.7322) Data (t): 2.398 Batch (t): 2.897, 44.0882/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-07-14,09:16:10 | INFO | Train Epoch: 2 [102656/109468 (47%)] Loss: 0.63540 (0.7128) Data (t): 2.406 Batch (t): 2.906, 43.4486/s LR: 0.000001 Logit Scale: 100.000 - V4
2023-07-14,09:20:59 | INFO | Train Epoch: 2 [128256/109468 (59%)] Loss: 0.56828 (0.6887) Data (t): 2.391 Batch (t): 2.890, 44.3427/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,09:25:49 | INFO | Train Epoch: 2 [153856/109468 (70%)] Loss: 0.70010 (0.6904) Data (t): 2.398 Batch (t): 2.896, 44.0299/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,09:30:37 | INFO | Train Epoch: 2 [179456/109468 (82%)] Loss: 0.72440 (0.6946) Data (t): 2.387 Batch (t): 2.886, 44.2478/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,09:35:27 | INFO | Train Epoch: 2 [205056/109468 (94%)] Loss: 0.69242 (0.6944) Data (t): 2.394 Batch (t): 2.892, 43.2381/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,09:38:02 | INFO | Train Epoch: 2 [218880/109468 (100%)] Loss: 0.71661 (0.6966) Data (t): 2.388 Batch (t): 2.887, 44.6194/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,09:38:05 | INFO | Eval Epoch: 3 [256 / 4839] Loss: 0.869993
2023-07-14,09:39:42 | INFO | Eval Epoch: 3 image_to_text_mean_rank: 20.0877 image_to_text_median_rank: 5.0000 image_to_text_R@1: 0.2137 image_to_text_R@5: 0.5331 image_to_text_R@10: 0.6637 text_to_image_mean_rank: 23.5539 text_to_image_median_rank: 5.0000 text_to_image_R@1: 0.2174 text_to_image_R@5: 0.5459 text_to_image_R@10: 0.6699 val_loss: 0.9222 epoch: 3.0000 num_samples: 9678.0000
2023-07-14,09:39:44 | INFO | Start epoch 3
2023-07-14,09:39:47 | INFO | Train Epoch: 3 [ 256/109468 (0%)] Loss: 0.61338 (0.6134) Data (t): 2.438 Batch (t): 2.935, 43.6089/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,09:44:33 | INFO | Train Epoch: 3 [ 25856/109468 (12%)] Loss: 0.61469 (0.6140) Data (t): 2.356 Batch (t): 2.855, 43.0422/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,09:49:20 | INFO | Train Epoch: 3 [ 51456/109468 (24%)] Loss: 0.77142 (0.6665) Data (t): 2.370 Batch (t): 2.869, 42.6858/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,09:54:06 | INFO | Train Epoch: 3 [ 77056/109468 (35%)] Loss: 0.55016 (0.6374) Data (t): 2.361 Batch (t): 2.860, 45.1310/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,09:58:52 | INFO | Train Epoch: 3 [102656/109468 (47%)] Loss: 0.54167 (0.6183) Data (t): 2.371 Batch (t): 2.869, 42.1579/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:03:43 | INFO | Train Epoch: 3 [128256/109468 (59%)] Loss: 0.65526 (0.6244) Data (t): 2.405 Batch (t): 2.903, 42.4634/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:08:32 | INFO | Train Epoch: 3 [153856/109468 (70%)] Loss: 0.62443 (0.6244) Data (t): 2.395 Batch (t): 2.894, 44.3579/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:13:19 | INFO | Train Epoch: 3 [179456/109468 (82%)] Loss: 0.56187 (0.6166) Data (t): 2.369 Batch (t): 2.868, 44.4981/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:18:08 | INFO | Train Epoch: 3 [205056/109468 (94%)] Loss: 0.51795 (0.6056) Data (t): 2.387 Batch (t): 2.888, 44.4553/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:20:45 | INFO | Train Epoch: 3 [218880/109468 (100%)] Loss: 0.58593 (0.6037) Data (t): 2.406 Batch (t): 2.904, 44.7540/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:20:47 | INFO | Eval Epoch: 4 [256 / 4839] Loss: 0.897077
2023-07-14,10:22:24 | INFO | Eval Epoch: 4 image_to_text_mean_rank: 20.5825 image_to_text_median_rank: 5.0000 image_to_text_R@1: 0.2125 image_to_text_R@5: 0.5280 image_to_text_R@10: 0.6588 text_to_image_mean_rank: 23.4955 text_to_image_median_rank: 5.0000 text_to_image_R@1: 0.2176 text_to_image_R@5: 0.5451 text_to_image_R@10: 0.6653 val_loss: 0.9375 epoch: 4.0000 num_samples: 9678.0000
2023-07-14,10:22:26 | INFO | Start epoch 4
2023-07-14,10:22:29 | INFO | Train Epoch: 4 [ 256/109468 (0%)] Loss: 0.63721 (0.6372) Data (t): 2.444 Batch (t): 2.951, 43.3703/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:27:17 | INFO | Train Epoch: 4 [ 25856/109468 (12%)] Loss: 0.59264 (0.6149) Data (t): 2.380 Batch (t): 2.878, 45.9206/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:32:05 | INFO | Train Epoch: 4 [ 51456/109468 (24%)] Loss: 0.53577 (0.5885) Data (t): 2.384 Batch (t): 2.883, 43.6510/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:36:55 | INFO | Train Epoch: 4 [ 77056/109468 (35%)] Loss: 0.62699 (0.5982) Data (t): 2.396 Batch (t): 2.895, 45.3030/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:41:40 | INFO | Train Epoch: 4 [102656/109468 (47%)] Loss: 0.57694 (0.5939) Data (t): 2.360 Batch (t): 2.859, 45.9198/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:46:28 | INFO | Train Epoch: 4 [128256/109468 (59%)] Loss: 0.52156 (0.5819) Data (t): 2.374 Batch (t): 2.873, 43.9230/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:51:13 | INFO | Train Epoch: 4 [153856/109468 (70%)] Loss: 0.54435 (0.5765) Data (t): 2.353 Batch (t): 2.855, 45.5270/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,10:55:59 | INFO | Train Epoch: 4 [179456/109468 (82%)] Loss: 0.59680 (0.5790) Data (t): 2.360 Batch (t): 2.858, 45.2921/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,11:00:46 | INFO | Train Epoch: 4 [205056/109468 (94%)] Loss: 0.72806 (0.5956) Data (t): 2.367 Batch (t): 2.866, 44.7487/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,11:03:22 | INFO | Train Epoch: 4 [218880/109468 (100%)] Loss: 0.62989 (0.5990) Data (t): 2.386 Batch (t): 2.886, 45.0568/s LR: 0.000000 Logit Scale: 100.000 - V4
2023-07-14,11:03:24 | INFO | Eval Epoch: 5 [256 / 4839] Loss: 0.891879
2023-07-14,11:05:00 | INFO | Eval Epoch: 5 image_to_text_mean_rank: 19.7327 image_to_text_median_rank: 5.0000 image_to_text_R@1: 0.2122 image_to_text_R@5: 0.5338 image_to_text_R@10: 0.6645 text_to_image_mean_rank: 22.1873 text_to_image_median_rank: 4.0000 text_to_image_R@1: 0.2178 text_to_image_R@5: 0.5494 text_to_image_R@10: 0.6712 val_loss: 0.9244 epoch: 5.0000 num_samples: 9678.0000
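One caveat worth noting when memory caps the batch size: with a contrastive loss, plain gradient accumulation is not a drop-in substitute for a genuinely larger batch, because each micro-batch only contrasts against its own members. A toy numpy illustration (a plain image-to-text InfoNCE with unit logit scale, not the repo's training code):

```python
import numpy as np

def info_nce(image_emb, text_emb):
    """Image-to-text InfoNCE over one batch: positives on the diagonal."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
img = rng.normal(size=(256, 64))
txt = rng.normal(size=(256, 64))

full = info_nce(img, txt)                        # 255 negatives per sample
accum = (info_nce(img[:128], txt[:128])          # two accumulated halves:
         + info_nce(img[128:], txt[128:])) / 2   # only 127 negatives each
print(round(full, 3), round(accum, 3))           # full-batch loss is larger
```

The averaged micro-batch loss is a different, easier objective than the full-batch one, which is why techniques that gather negatives across devices (e.g. the `--gather-with-grad` style options in open_clip) exist.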

vinid commented 1 year ago

Could I ask you to format these results in a more readable manner and report the performance on ARO and MSCOCO retrieval (using the evaluation pipeline, not the logging)?
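As a side note, the fused eval lines above can be turned into a readable table with a few lines of throwaway Python (a hypothetical helper, not part of the repo):

```python
import re

def parse_eval_line(line):
    """Pull 'name: value' metric pairs out of one open_clip-style eval log line."""
    return {k: float(v) for k, v in re.findall(r"(\w+@?\d*): (\d+\.\d+)", line)}

# Shortened example taken from the logs in this thread:
line = ("Eval Epoch: 5 image_to_text_R@1: 0.2122 image_to_text_R@5: 0.5338 "
        "text_to_image_R@1: 0.2178 val_loss: 0.9244")
metrics = parse_eval_line(line)
for name, value in metrics.items():
    print(f"{name:>20s}  {value:.4f}")
```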

3 [ 25856/109468 (12%)] Loss: 0.61469 (0.6140) Data (t): 2.356 Batch (t): 2.855, 43.0422/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,09:49:20 | INFO | Train Epoch: 3 [ 51456/109468 (24%)] Loss: 0.77142 (0.6665) Data (t): 2.370 Batch (t): 2.869, 42.6858/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,09:54:06 | INFO | Train Epoch: 3 [ 77056/109468 (35%)] Loss: 0.55016 (0.6374) Data (t): 2.361 Batch (t): 2.860, 45.1310/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,09:58:52 | INFO | Train Epoch: 3 [102656/109468 (47%)] Loss: 0.54167 (0.6183) Data (t): 2.371 Batch (t): 2.869, 42.1579/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:03:43 | INFO | Train Epoch: 3 [128256/109468 (59%)] Loss: 0.65526 (0.6244) Data (t): 2.405 Batch (t): 2.903, 42.4634/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:08:32 | INFO | Train Epoch: 3 [153856/109468 (70%)] Loss: 0.62443 (0.6244) Data (t): 2.395 Batch (t): 2.894, 44.3579/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:13:19 | INFO | Train Epoch: 3 [179456/109468 (82%)] Loss: 0.56187 (0.6166) Data (t): 2.369 Batch (t): 2.868, 44.4981/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:18:08 | INFO | Train Epoch: 3 [205056/109468 (94%)] Loss: 0.51795 (0.6056) Data (t): 2.387 Batch (t): 2.888, 44.4553/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:20:45 | INFO | Train Epoch: 3 [218880/109468 (100%)] Loss: 0.58593 (0.6037) Data (t): 2.406 Batch (t): 2.904, 44.7540/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:20:47 | INFO | Eval Epoch: 4 [256 / 4839] Loss: 0.897077 2023-07-14,10:22:24 | INFO | Eval Epoch: 4 image_to_text_mean_rank: 20.5825 image_to_text_median_rank: 5.0000 @.: 0.2125 @.: 0.5280 @.: 0.6588 text_to_image_mean_rank: 23.4955 text_to_image_median_rank: 5.0000 @.: 0.2176 @.: 0.5451 @.: 0.6653 val_loss: 0.9375 epoch: 4.0000 num_samples: 9678.0000 2023-07-14,10:22:26 | INFO | Start epoch 4 2023-07-14,10:22:29 | INFO | Train Epoch: 4 [ 256/109468 (0%)] Loss: 0.63721 
(0.6372) Data (t): 2.444 Batch (t): 2.951, 43.3703/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:27:17 | INFO | Train Epoch: 4 [ 25856/109468 (12%)] Loss: 0.59264 (0.6149) Data (t): 2.380 Batch (t): 2.878, 45.9206/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:32:05 | INFO | Train Epoch: 4 [ 51456/109468 (24%)] Loss: 0.53577 (0.5885) Data (t): 2.384 Batch (t): 2.883, 43.6510/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:36:55 | INFO | Train Epoch: 4 [ 77056/109468 (35%)] Loss: 0.62699 (0.5982) Data (t): 2.396 Batch (t): 2.895, 45.3030/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:41:40 | INFO | Train Epoch: 4 [102656/109468 (47%)] Loss: 0.57694 (0.5939) Data (t): 2.360 Batch (t): 2.859, 45.9198/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:46:28 | INFO | Train Epoch: 4 [128256/109468 (59%)] Loss: 0.52156 (0.5819) Data (t): 2.374 Batch (t): 2.873, 43.9230/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:51:13 | INFO | Train Epoch: 4 [153856/109468 (70%)] Loss: 0.54435 (0.5765) Data (t): 2.353 Batch (t): 2.855, 45.5270/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,10:55:59 | INFO | Train Epoch: 4 [179456/109468 (82%)] Loss: 0.59680 (0.5790) Data (t): 2.360 Batch (t): 2.858, 45.2921/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,11:00:46 | INFO | Train Epoch: 4 [205056/109468 (94%)] Loss: 0.72806 (0.5956) Data (t): 2.367 Batch (t): 2.866, 44.7487/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,11:03:22 | INFO | Train Epoch: 4 [218880/109468 (100%)] Loss: 0.62989 (0.5990) Data (t): 2.386 Batch (t): 2.886, 45.0568/s LR: 0.000000 Logit Scale: 100.000 - V4 2023-07-14,11:03:24 | INFO | Eval Epoch: 5 [256 / 4839] Loss: 0.891879 2023-07-14,11:05:00 | INFO | Eval Epoch: 5 image_to_text_mean_rank: 19.7327 image_to_text_median_rank: 5.0000 @.: 0.2122 @.: 0.5338 @.: 0.6645 text_to_image_mean_rank: 22.1873 text_to_image_median_rank: 4.0000 @.: 0.2178 @.: 0.5494 @.: 0.6712 val_loss: 0.9244 epoch: 5.0000 num_samples: 
9678.0000
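For readers comparing these numbers: the `mean_rank`, `median_rank`, and `R@k` values in the eval lines above are standard retrieval metrics. Below is a minimal sketch (not the repository's actual evaluation code) of how they are typically computed from an image-text similarity matrix, assuming the matched caption for image `i` sits at index `i`:

```python
def retrieval_metrics(sim, prefix):
    """Compute mean_rank, median_rank, and R@{1,5,10} from a similarity matrix.

    sim: list of rows, where sim[i][j] is the similarity of query i to
    target j; the correct target for query i is assumed to be index i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # rank = 1 + number of wrong targets scored strictly higher
        # than the correct (diagonal) one
        rank = 1 + sum(1 for j, s in enumerate(row) if j != i and s > row[i])
        ranks.append(rank)
    ranks.sort()
    n = len(ranks)
    median = ranks[n // 2] if n % 2 else (ranks[n // 2 - 1] + ranks[n // 2]) / 2
    metrics = {
        f"{prefix}_mean_rank": sum(ranks) / n,
        f"{prefix}_median_rank": median,
    }
    for k in (1, 5, 10):
        # fraction of queries whose correct target ranks in the top k
        metrics[f"{prefix}_R@{k}"] = sum(r <= k for r in ranks) / n
    return metrics

# Toy example: 3 images x 3 captions, diagonal is always the best match,
# so every rank is 1 and R@1 = 1.0.
sim = [[0.9, 0.1, 0.2],
       [0.3, 0.8, 0.1],
       [0.2, 0.4, 0.7]]
print(retrieval_metrics(sim, "image_to_text"))
```

Running this over both orientations of the image-text similarity matrix (rows as images, then rows as captions) yields the paired `image_to_text_*` and `text_to_image_*` numbers reported in the log.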
