Open giacomocamposampiero opened 1 year ago
Hi @giacomocamposampiero, thanks for your interest. Glad to hear you are making progress.
Your configuration looks good to me. I see two possible aspects that you might want to consider: (1) increasing the training data size and/or the number of epochs, and (2) trying a language model better suited to generating longer text.
By the way, just curious: what data are you using? Is it a collection of images, each described by a paragraph?
These are just my guesses. Please feel welcome to discuss.
Thanks.
Thanks @dxli94 for the quick answer and your helpful suggestions! I will try increasing the training data size/number of epochs and, if that doesn't work, explore language models better suited to longer text generation.
About the data: yes, I'm using a collection of images, each described by a paragraph. The images, however, are quite simple (compositions of abstract geometric shapes) and the captions very structured and repetitive, so I was hoping my current data would be enough to fine-tune the model.
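For reference, my annotations look roughly like the sketch below. This is a simplified stand-in, not my actual generation code: the filenames, the shape/colour vocabulary, and the make_record helper are all invented for illustration. The image/caption/image_id layout follows the COCO-style JSON that, as far as I understand, the LAVIS custom-dataset tutorial expects.

```python
import json
import random

# Invented vocabulary for illustration only
SHAPES = ["circle", "square", "triangle"]
COLORS = ["red", "blue", "green"]

def make_record(image_id: int) -> dict:
    """Build one COCO-style caption annotation for a synthetic image
    of abstract geometric shapes (deterministic per image_id)."""
    random.seed(image_id)
    parts = [
        f"a {random.choice(COLORS)} {random.choice(SHAPES)}"
        for _ in range(3)
    ]
    return {
        "image": f"images/{image_id:06d}.png",  # hypothetical path
        "caption": "a picture of " + ", ".join(parts),
        "image_id": image_id,
    }

annotations = [make_record(i) for i in range(5)]
print(json.dumps(annotations[0], indent=2))
```

In my real data each caption is a full structured paragraph rather than a single sentence, but the annotation layout is the same.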
Hi, thanks for the amazing work you did with the library!

I am currently trying to fine-tune BLIP on a custom dataset. I followed your tutorial on custom dataset generation and set up all the necessary files for fine-tuning, and everything works as expected. The only problem I've encountered is the maximum length of the generated captions. In my training configuration file this length is set to 256, but the model never generates captions longer than ~50 words (roughly 90 tokens on average).

I have already increased the BERT embedding size to 256, hard-coding it in this line: https://github.com/salesforce/LAVIS/blob/6c6c981b8ea5a64ee9e706cf003559f7d8be085e/lavis/models/blip_models/blip_caption.py#L51

and changed the default max_length to 256 here: https://github.com/salesforce/LAVIS/blob/6c6c981b8ea5a64ee9e706cf003559f7d8be085e/lavis/models/blip_models/blip_caption.py#L214 and here: https://github.com/salesforce/LAVIS/blob/6c6c981b8ea5a64ee9e706cf003559f7d8be085e/lavis/models/blip_models/blip_caption.py#L141

My training config file looks like this:
```yaml
model:
  arch: blip_caption
  model_type: base_coco
  load_finetuned: False

datasets:
  custom_caption: # name of the dataset builder
    vis_processor:
      train:
        name: "blip_image_train"
      eval:
        name: "blip_image_eval"
    text_processor:
      train:
        name: "blip_caption"
        prompt: "a picture of "
      eval:
        name: "blip_caption"

run:
  task: captioning
  # optimizer
  lr_sched: "linear_warmup_cosine_lr"
  init_lr: 1e-5
  min_lr: 0
  weight_decay: 0.05
  max_epoch: 20
  batch_size_train: 2
  batch_size_eval: 8
  num_workers: 1

  max_len: 256
  min_len: 5
  num_beams: 3

  seed: 42
  output_dir: "output/BLIP/Caption_custom"

  amp: False
  resume_ckpt_path: null

  evaluate: False
  train_splits: ["train"]
  valid_splits: ["val"]
  test_splits: ["test"]

  device: "cuda"
  world_size: 1
  dist_url: "env://"
  distributed: True
```
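One thing I still want to rule out (this is my assumption, not something I have confirmed in the LAVIS source): the "blip_caption" text processor may itself truncate ground-truth captions to a fixed word budget, a max_words-style parameter that commonly defaults to 50. If so, that would cap the caption length the model ever sees during training, regardless of max_len. A minimal sketch of that kind of whitespace truncation:

```python
import re

def pre_caption(caption: str, max_words: int = 50) -> str:
    """Sketch of a BLIP-style caption cleaner: lowercase, strip
    punctuation, and keep at most max_words whitespace-separated words.
    The exact behaviour in LAVIS may differ; this only illustrates why
    a 50-word cap on training targets would explain the symptom."""
    caption = re.sub(r"[^\w\s]", " ", caption.lower())
    words = caption.split()
    if len(words) > max_words:
        words = words[:max_words]
    return " ".join(words)

long_caption = " ".join(f"word{i}" for i in range(80))
print(len(pre_caption(long_caption).split()))                 # → 50
print(len(pre_caption(long_caption, max_words=256).split()))  # → 80
```

If the processor does expose such a parameter, raising it would be the natural fix; otherwise it would need to be changed in the processor source as well.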
I am training the model with 5000 samples. Do you have any suggestions on what could be wrong or missing in my fine-tuning configuration? Should I use different parameters for the optimiser? Is generating captions of this length even achievable with BLIP?

Thanks!
Hi! May I ask whether you are also doing image captioning? I also want to use BLIP-2 to generate image captions for my dataset. Have you implemented it? What is the quality of the captions it generates? Did you need to make any adjustments?
Hello, in the end it did not work for me, because I was not able to generate captions longer than 50 words.
If you want to generate longer sentences, you can try the LLaVA model; I have tried using the text generated by its demo.