mlvlab / MELTR

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
MIT License

Negative loss #5

Open EmreOzkose opened 1 year ago

EmreOzkose commented 1 year ago

Hi, thanks for sharing this great work.

I am trying to train UniVL and I get a negative loss. Is that okay? Have you ever observed this? I am using a small batch size (12). The default number of epochs was 1 in the code, but I set it to 40 as in the paper.

[Screenshot: training log showing the loss becoming negative]

EmreOzkose commented 1 year ago

I see it in the paper now: the CMLM and CMFM losses are reported as negative. [Screenshot: loss table from the paper showing negative CMLM and CMFM values]
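For anyone else wondering how a total training loss can legitimately go below zero: in MELTR the auxiliary task losses are combined by a learned module whose scalar output is not constrained to be non-negative. Below is a minimal illustrative sketch of that general idea, not the repository's actual MELTR module; `TinyLossCombiner` and the example loss values are made up for illustration.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the repository's actual MELTR module): a learned
# combiner maps the vector of auxiliary task losses to one scalar objective.
# Because the mapping is unconstrained, the combined value can be negative,
# and it can also go negative simply because some components (hypothetically
# "cmlm" and "cmfm" here) are themselves negative.
class TinyLossCombiner(nn.Module):
    def __init__(self, num_losses: int, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_losses, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # unconstrained scalar output
        )

    def forward(self, losses: torch.Tensor) -> torch.Tensor:
        return self.net(losses).squeeze(-1)

# Hypothetical per-task losses for a single step.
task_losses = torch.tensor([2.1, 0.8, -1.7, -0.9])  # e.g. joint, decoder, cmlm, cmfm
combiner = TinyLossCombiner(num_losses=4)
total = combiner(task_losses)
print(total.item())  # may well be negative; the sign alone is not an error signal
```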

EmreOzkose commented 1 year ago

I am training UniVL on YouCook2. When I only change batch_size=12, epochs=40, and num_thread_reader=8, I get:

2023-10-22 15:06:46,979:INFO: Epoch: 1/40, Step: 20/814, Lr: 0.00009939, Loss: 1.866647
2023-10-22 15:07:03,972:INFO: Epoch: 1/40, Step: 40/814, Lr: 0.00009877, Loss: 3.736308
2023-10-22 15:07:20,059:INFO: Epoch: 1/40, Step: 60/814, Lr: 0.00009816, Loss: 4.082280
2023-10-22 15:07:37,250:INFO: Epoch: 1/40, Step: 80/814, Lr: 0.00009754, Loss: 3.955996
2023-10-22 15:07:54,521:INFO: Epoch: 1/40, Step: 100/814, Lr: 0.00009693, Loss: 3.716820
2023-10-22 15:08:10,713:INFO: Epoch: 1/40, Step: 120/814, Lr: 0.00009632, Loss: 3.448043
2023-10-22 15:08:27,875:INFO: Epoch: 1/40, Step: 140/814, Lr: 0.00009570, Loss: 3.177194
2023-10-22 15:08:45,059:INFO: Epoch: 1/40, Step: 160/814, Lr: 0.00009509, Loss: 2.960228
2023-10-22 15:09:01,251:INFO: Epoch: 1/40, Step: 180/814, Lr: 0.00009447, Loss: 2.960163
2023-10-22 15:09:18,414:INFO: Epoch: 1/40, Step: 200/814, Lr: 0.00009386, Loss: 2.978737
2023-10-22 15:09:35,416:INFO: Epoch: 1/40, Step: 220/814, Lr: 0.00009325, Loss: 2.989994
2023-10-22 15:09:51,538:INFO: Epoch: 1/40, Step: 240/814, Lr: 0.00009263, Loss: 3.486469
2023-10-22 15:10:08,708:INFO: Epoch: 1/40, Step: 260/814, Lr: 0.00009202, Loss: 3.878906
2023-10-22 15:10:25,854:INFO: Epoch: 1/40, Step: 280/814, Lr: 0.00009140, Loss: 4.187306
2023-10-22 15:10:41,975:INFO: Epoch: 1/40, Step: 300/814, Lr: 0.00009079, Loss: 4.434147
2023-10-22 15:10:59,252:INFO: Epoch: 1/40, Step: 320/814, Lr: 0.00009018, Loss: 4.645205
2023-10-22 15:11:16,365:INFO: Epoch: 1/40, Step: 340/814, Lr: 0.00008956, Loss: 4.827587
2023-10-22 15:11:32,559:INFO: Epoch: 1/40, Step: 360/814, Lr: 0.00008895, Loss: 4.988645
2023-10-22 15:11:49,671:INFO: Epoch: 1/40, Step: 380/814, Lr: 0.00008833, Loss: 5.133381
....

and

Epoch1: R@1: 0.0003 - R@5: 0.0024 - R@10: 0.0047 - Median R: 1420.0
Epoch2: R@1: 0.0009 - R@5: 0.0033 - R@10: 0.0047 - Median R: 1250.0

Is this expected behavior? num_thread_reader affects the results a lot. For example, if I set num_thread_reader to 0, I get:

2023-10-22 17:06:43,930:INFO: Epoch: 1/40, Step: 20/814, Lr: 0.00009939, Loss: 1.779154
2023-10-22 17:07:01,318:INFO: Epoch: 1/40, Step: 40/814, Lr: 0.00009877, Loss: 2.946439
2023-10-22 17:07:17,514:INFO: Epoch: 1/40, Step: 60/814, Lr: 0.00009816, Loss: 3.340093
2023-10-22 17:07:34,819:INFO: Epoch: 1/40, Step: 80/814, Lr: 0.00009754, Loss: 3.044905
2023-10-22 17:07:52,372:INFO: Epoch: 1/40, Step: 100/814, Lr: 0.00009693, Loss: 1.260845
2023-10-22 17:08:08,849:INFO: Epoch: 1/40, Step: 120/814, Lr: 0.00009632, Loss: -0.255153
2023-10-22 17:08:26,377:INFO: Epoch: 1/40, Step: 140/814, Lr: 0.00009570, Loss: -1.452571
2023-10-22 17:08:44,079:INFO: Epoch: 1/40, Step: 160/814, Lr: 0.00009509, Loss: -2.369938
2023-10-22 17:09:00,729:INFO: Epoch: 1/40, Step: 180/814, Lr: 0.00009447, Loss: -3.111302
2023-10-22 17:09:18,303:INFO: Epoch: 1/40, Step: 200/814, Lr: 0.00009386, Loss: -3.725561
2023-10-22 17:09:35,800:INFO: Epoch: 1/40, Step: 220/814, Lr: 0.00009325, Loss: -4.231674
2023-10-22 17:09:52,260:INFO: Epoch: 1/40, Step: 240/814, Lr: 0.00009263, Loss: -4.650835
2023-10-22 17:10:09,855:INFO: Epoch: 1/40, Step: 260/814, Lr: 0.00009202, Loss: -4.995438
2023-10-22 17:10:27,374:INFO: Epoch: 1/40, Step: 280/814, Lr: 0.00009140, Loss: -5.271745
2023-10-22 17:10:43,840:INFO: Epoch: 1/40, Step: 300/814, Lr: 0.00009079, Loss: -5.477592
2023-10-22 17:11:01,326:INFO: Epoch: 1/40, Step: 320/814, Lr: 0.00009018, Loss: -5.562934
2023-10-22 17:11:18,837:INFO: Epoch: 1/40, Step: 340/814, Lr: 0.00008956, Loss: -5.491091
2023-10-22 17:11:35,302:INFO: Epoch: 1/40, Step: 360/814, Lr: 0.00008895, Loss: -4.861334
2023-10-22 17:11:52,936:INFO: Epoch: 1/40, Step: 380/814, Lr: 0.00008833, Loss: -4.227199
2023-10-22 17:12:10,386:INFO: Epoch: 1/40, Step: 400/814, Lr: 0.00008772, Loss: -3.648329
2023-10-22 17:12:26,914:INFO: Epoch: 1/40, Step: 420/814, Lr: 0.00008710, Loss: -3.130327
ikodoh commented 1 year ago

First, we also observed that the loss takes negative values, and it does not affect the performance. I'm surprised, though, that num_thread_reader significantly affects training. Could you confirm that you only changed num_thread_reader and kept all other parameters identical? If so, I recommend running the model with the default settings.
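Since the sign of the combined loss is not meaningful on its own, one practical option is to log each task loss separately and judge training health from those curves plus the retrieval metrics. A small illustrative sketch (not the repository's code; the loss names and values are hypothetical):

```python
import torch

# Log each component loss separately instead of relying on the sign of the
# combined objective, which may be negative by construction.
def log_losses(step: int, losses: dict[str, torch.Tensor]) -> None:
    parts = ", ".join(f"{name}: {value.item():.4f}" for name, value in losses.items())
    total = sum(value.item() for value in losses.values())
    print(f"Step {step} | {parts} | sum: {total:.4f}")

# Hypothetical per-task values for one step.
log_losses(100, {"joint": torch.tensor(1.2),
                 "decoder": torch.tensor(0.7),
                 "cmlm": torch.tensor(-1.9),
                 "cmfm": torch.tensor(-1.1)})
```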

EmreOzkose commented 1 year ago

I searched a little and found this topic. I think the num_thread_reader behavior is expected. Unfortunately, I don't have hardware on which I can set the batch size to 128 :).
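For reference, the usual explanation is that changing the number of DataLoader workers changes how random state is split across worker processes, so the stream of sampled masks/augmentations (and hence the loss curve) differs between num_workers=0 and num_workers>0. A hedged sketch of the standard PyTorch worker-seeding pattern, with a dummy dataset standing in for the actual YouCook2 loader:

```python
import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id: int) -> None:
    # Derive per-worker seeds from the base seed so numpy/random streams
    # inside workers are reproducible across runs.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(42)

# Dummy dataset standing in for the YouCook2 dataset used in the thread.
dataset = TensorDataset(torch.randn(64, 10))
loader = DataLoader(dataset, batch_size=12, num_workers=8, shuffle=True,
                    worker_init_fn=seed_worker, generator=g)

for (batch,) in loader:
    pass  # training step would go here
```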