LuoyaoChen opened this issue 1 month ago
Hello, I faced the same problem. Could the authors take a look at it? Thank you so much!
Hello, I ran the COCO captioning fine-tuning with the batch size changed to 32, using the script
bash run_scripts/blip2/train/train_caption_coco.sh
(not LAVIS/run_scripts/blip2/train/train_caption_coco_from_scratch.sh).
And I got these results:
{"val": {"Bleu_1": 0.8238268241644443, "Bleu_2": 0.6831525165191635, "Bleu_3": 0.5459414870155539, "Bleu_4": 0.42831303869532394, "METEOR": 0.30180281319130425, "ROUGE_L": 0.6091627079501748, "CIDEr": 1.374218736104006, "SPICE": 0.23455270984413482}} {"val": {"Bleu_1": 0.8298774354937364, "Bleu_2": 0.6908474983471536, "Bleu_3": 0.5532925604069465, "Bleu_4": 0.43515568592502296, "METEOR": 0.30497520102164544, "ROUGE_L": 0.613052542301079, "CIDEr": 1.3963855142315682, "SPICE": 0.23759949992763962}} {"val": {"Bleu_1": 0.8302620239059283, "Bleu_2": 0.6921764063375653, "Bleu_3": 0.5552159289224236, "Bleu_4": 0.43842650265562216, "METEOR": 0.3060731453972931, "ROUGE_L": 0.6135836343823102, "CIDEr": 1.407976274775359, "SPICE": 0.23905837952464407}} {"val": {"Bleu_1": 0.8274924247946817, "Bleu_2": 0.6931325074722023, "Bleu_3": 0.5597664645378789, "Bleu_4": 0.4443899752112241, "METEOR": 0.3066574635742406, "ROUGE_L": 0.6172149598445694, "CIDEr": 1.4154786962527268, "SPICE": 0.23882330241316768}} {"val": {"Bleu_1": 0.8328429952300106, "Bleu_2": 0.6960647900571959, "Bleu_3": 0.5604930980057795, "Bleu_4": 0.4433169242234679, "METEOR": 0.3082065717220316, "ROUGE_L": 0.6171616222938877, "CIDEr": 1.4218121069160456, "SPICE": 0.2402601718946845}} {"test": {"Bleu_1": 0.8293025112126673, "Bleu_2": 0.6928262979474319, "Bleu_3": 0.5591914780467612, "Bleu_4": 0.4419784673449433, "METEOR": 0.30831059059342697, "ROUGE_L": 0.6176090226726898, "CIDEr": 1.4323682191311553, "SPICE": 0.24242663638610848}}
I don't know whether this counts as a successful replication. 🧸 Looking forward to your reply!
Wow, this is a really good result. Did you face the max_length problem? Or could you please show the shape of attention_mask
as posted above? Thank you!
No, I didn't face the max_length problem. My shapes are atts_opt.shape = [1, 32], opt_tokens.attention_mask.shape = [1, 4], and attention_mask.shape = [1, 36].
Sounds good. Thank you so much for your reply!
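The shapes discussed above can be sketched with a minimal PyTorch snippet. This is not LAVIS code: the tensors are dummies and the names atts_opt / opt_tokens just mirror those in blip2_opt.py.

```python
import torch

# 32 query-token attention entries from the Q-Former, plus 4 entries
# for the tokenized prompt (dummy all-ones masks for illustration).
atts_opt = torch.ones(1, 32, dtype=torch.long)              # [batch, num_query_tokens]
prompt_attention_mask = torch.ones(1, 4, dtype=torch.long)  # [batch, prompt_len]

# The model concatenates both masks before calling generate().
attention_mask = torch.cat([atts_opt, prompt_attention_mask], dim=1)
print(attention_mask.shape)  # torch.Size([1, 36])
```

So the full input the language model sees is 36 positions long, which is what matters when comparing against max_length.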
Hi, @lxr-1204 !
Thank you so much for your reply! It is encouraging to know that your approach works. There are two differences I can imagine that might have caused my low performance.
Thank you!
Hello, @LuoyaoChen
By the way, I don't seem to see the LAVIS/run_scripts/blip2/train/train_caption_coco_from_scratch.sh script you mentioned.
@lxr-1204
{"val": {"Bleu_1": 0.8211486028870585, "Bleu_2": 0.6786320500852762, "Bleu_3": 0.5412082052650652, "Bleu_4": 0.42445955867868645, "METEOR": 0.3000809459398856, "ROUGE_L": 0.6023497159599724, "CIDEr": 1.3504911582924761, "SPICE": 0.2336149951256639}}
{"val": {"Bleu_1": 0.8239893849887242, "Bleu_2": 0.6832399775841378, "Bleu_3": 0.5456446406373248, "Bleu_4": 0.428344329322689, "METEOR": 0.30151605730722364, "ROUGE_L": 0.6080119244575778, "CIDEr": 1.3715624253497698, "SPICE": 0.23541176362446695}}
{"val": {"Bleu_1": 0.8255328595145602, "Bleu_2": 0.6868702611201942, "Bleu_3": 0.5519031726420236, "Bleu_4": 0.4354463971880122, "METEOR": 0.3033446173121947, "ROUGE_L": 0.6107641429094333, "CIDEr": 1.3877831392810822, "SPICE": 0.235643357884401}}
{"val": {"Bleu_1": 0.8279085623707302, "Bleu_2": 0.689315763607898, "Bleu_3": 0.5542196085565972, "Bleu_4": 0.4393303273526725, "METEOR": 0.30652090677813526, "ROUGE_L": 0.6141870544175783, "CIDEr": 1.4020532681341553, "SPICE": 0.23846326296514572}}
{"val": {"Bleu_1": 0.8318979270412391, "Bleu_2": 0.6933581693794059, "Bleu_3": 0.5580103441892669, "Bleu_4": 0.442200620617155, "METEOR": 0.3089680987367561, "ROUGE_L": 0.6158630217893815, "CIDEr": 1.418023784164105, "SPICE": 0.24063981240115964}}
{"test": {"Bleu_1": 0.8305294220120865, "Bleu_2": 0.6912970605644893, "Bleu_3": 0.5565461847006546, "Bleu_4": 0.4406486648331637, "METEOR": 0.3095835115295356, "ROUGE_L": 0.6167463265705526, "CIDEr": 1.4311418829460887, "SPICE": 0.24323871537233485}}
I was replicating the pretraining stages too, so I renamed the .sh files to load pre-trained checkpoints. But the other contents are the same.
Thank you again for sharing!
Hi,
First of all, thanks for the great work!
Issue I encountered:
I am trying to replicate Table 3 of the BLIP-2 paper: ![Screenshot 2024-05-29 at 23 36 23](https://github.com/salesforce/LAVIS/assets/80808548/26ba42f9-b8c3-4b9f-9d00-b7ba838ea02f)
I.e., I ran the COCO captioning fine-tuning using the script:
bash LAVIS/run_scripts/blip2/train/train_caption_coco_from_scratch.sh
I fine-tuned for 5 epochs, using batch size 256 and freeze_vit = False. However, the fine-tuning loss stays at around 1.76, and I obtained BLEU_4 = 0.158, where the paper reports BLEU@4 = 43.5.
How I got this:
Before getting this low performance, I encountered and debugged this error in opt_model.generate():
Input length of input_ids is 0, but `max_length` is set to -6. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
After inspection, the warning came from this line: https://github.com/salesforce/LAVIS/blob/59273f651b9bffb193d1b12a235e909e9f826dda/lavis/models/blip2_models/blip2_opt.py#L226 where the input prompt sequence length is 4 and num_query = 32; concatenated, the total input length is 36. The yaml, however, sets max_len = 30; consequently 30 - 36 = -6. I changed max_len to 40, and the error was gone.
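The length arithmetic above can be sketched in a few lines (plain Python, not LAVIS code; the variable names are illustrative):

```python
# Total input length seen by generate(): Q-Former query tokens + prompt tokens.
num_query_tokens = 32  # query embeddings prepended to the prompt
prompt_len = 4         # tokenized prompt length
max_len = 30           # caption max_len from the yaml config

input_len = num_query_tokens + prompt_len  # 36
remaining = max_len - input_len            # 30 - 36
print(remaining)  # -6, hence the "max_length is set to -6" warning
```

Since the Hugging Face generate() API treats max_length as the total length (input plus generated tokens), an alternative to raising max_len is to bound only the newly generated tokens via max_new_tokens, as the warning itself suggests.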
My question is: I suspect this bug was the fundamental issue that led to BLEU_4 = 0.15. If not, could you please point out what I should change in order to replicate Table 3?
Thank you! I appreciate your reply and help.