salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BSD 3-Clause "New" or "Revised" License

I want to fine-tune the BLIP model on an existing image-text pedestrian dataset. Should I use the pre-trained checkpoint weights or the fine-tuned checkpoint weights? #205

Open shams2023 opened 8 months ago

shams2023 commented 8 months ago

I want to fine-tune the BLIP model on an existing image-text pedestrian dataset. Should I start from the pre-trained checkpoint weights or from the fine-tuned checkpoint weights? I'd like the generated text to be as detailed as possible, with a length of 40-60. Looking forward to your answer! Thank you!
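Whichever checkpoint is chosen, resuming fine-tuning boils down to loading a state dict into the model before training. Below is a minimal, generic PyTorch sketch of that pattern; the tiny `nn.Linear` stands in for the real captioning model (which the repo builds via `models.blip.blip_decoder`), and the checkpoint path and `"model"` key are assumptions modeled on how the BLIP README checkpoints are typically packaged — check them against the actual file.

```python
import tempfile, os
import torch
import torch.nn as nn

# Stand-in for a BLIP-style captioning model; in the real repo this would be
# built with models.blip.blip_decoder(...) instead.
model = nn.Linear(4, 4)

# Simulate a downloaded checkpoint: BLIP checkpoints are plain torch files,
# often with the weights stored under a "model" key.
ckpt_path = os.path.join(tempfile.gettempdir(), "demo_blip_ckpt.pth")
torch.save({"model": model.state_dict()}, ckpt_path)

# Load the checkpoint and copy weights into the model. strict=False tolerates
# head/layer mismatches between pre-trained and fine-tuned variants and
# reports what did not line up.
ckpt = torch.load(ckpt_path, map_location="cpu")
state = ckpt.get("model", ckpt)
missing, unexpected = model.load_state_dict(state, strict=False)
print(len(missing), len(unexpected))  # 0 0 here, since shapes match exactly

# Caption length at inference time is usually controlled through the
# generation arguments, e.g. (names as used in this repo's generate method,
# worth double-checking): model.generate(image, num_beams=3,
#                                        min_length=40, max_length=60)
```

With `strict=False`, inspecting `missing` and `unexpected` after loading is a quick sanity check that the chosen checkpoint actually matches the model configuration before spending time on fine-tuning.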