dongrixinyu opened this issue 1 year ago
Hi, @dongrixinyu
Thank you for your info.
As you know, large language-image pretrained models are a promising direction. I assume mature multimodal models might be open to the public in about 1~2 years.
So, does Salesforce want to continue combining videos, rather than just images, with language in the future? How do you think your multimodal model compares to those of other companies?
BLIP-2 represents the state of the art in multimodal capabilities, as evidenced by the evaluation reported in the paper.
Videos are of interest, and we explored them before in ALPRO, which is also included in LAVIS.
Language-image pretraining and finetuning are promising. I am new to this field and want to learn more about the datasets.
I want to finetune the model in this repo. Is there a convenient way to access these datasets?