huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add support for MiniCPM-V-2 and MiniCPM-Llama3-V-2_5 #31836

Open HwwwwwwwH opened 1 month ago

HwwwwwwwH commented 1 month ago

Model description

MiniCPM-V is a series of OpenBMB's vision-language models. We want to add support for MiniCPM-V-2 and later models.

Open source status

Provide useful links for the implementation

https://huggingface.co/openbmb/MiniCPM-V-2
https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5

amyeroberts commented 1 month ago

Hi @HwwwwwwwH, thanks for opening this model request!

The linked models can already be used with the transformers library, as their code is defined on the Hub. One just needs to pass trust_remote_code=True in the from_pretrained call (if you trust the code):

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2', trust_remote_code=True, torch_dtype=torch.bfloat16)
HwwwwwwwH commented 1 month ago

> Hi @HwwwwwwwH, thanks for opening this model request!
>
> The linked to models can already be used in the transformers library as their code is defined on the hub. One just needs to pass in trust_remote_code=True in the from pretrained call (if you trust the code):
>
> model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2', trust_remote_code=True, torch_dtype=torch.bfloat16)

Thanks for the reply! But we want to add our code to the transformers Python package, so I followed this doc (https://huggingface.co/docs/transformers/add_new_model) to create this issue.

amyeroberts commented 1 month ago

Hi @HwwwwwwwH,

There's no need to add it to the transformers library given the remote code; it should be functionally the same for users. Are there behaviours or features of the models in the code repo which aren't working or available via the remote model?

HwwwwwwwH commented 1 month ago

Hi, there are some reasons why we want to add this to the official transformers code.

HwwwwwwwH commented 1 month ago

@amyeroberts Sorry to bother you... So can I start to push this task forward?

amyeroberts commented 1 month ago

Hi @HwwwwwwwH,

Thanks for outlining the difficulties with the model on the hub!

> There's an issue about naming the version of minicpmv: https://github.com/OpenBMB/MiniCPM-V/issues/41

This is definitely an issue, and one we need to address, thanks for flagging!

> Some other libraries, e.g. vLLM, will invoke some classes of LLM (VLM) models and their configs. The dynamic loading is not very convenient.

Could you explain this a bit more, perhaps with an example? It's not immediately obvious to me why this is Hub-specific.

> For the image_processor, we do image slicing which can generate sub-images of different shapes that cannot be batched by the BatchFeature class; we added a solution for this.

This is fine - you can specify any custom code on the hub, including processing code. Just like the model, it should be possible to load a custom processor or image processor from the hub using the auto classes.
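As an aside, the batching difficulty mentioned above can be reproduced with plain tensors: sub-images of different spatial sizes cannot be stacked into one batch directly, so a custom image processor typically pads them to a common size first. A minimal sketch of one pad-to-max strategy (an illustration only, not MiniCPM-V's actual code):

```python
import torch
import torch.nn.functional as F

def pad_and_batch(crops):
    # crops: list of (C, H, W) tensors with varying H and W.
    # torch.stack would raise a RuntimeError on mismatched shapes,
    # so pad every crop to the largest H and W in the list first.
    max_h = max(c.shape[1] for c in crops)
    max_w = max(c.shape[2] for c in crops)
    padded = []
    for c in crops:
        pad_h = max_h - c.shape[1]
        pad_w = max_w - c.shape[2]
        # F.pad's tuple is (left, right, top, bottom) for the last two dims.
        padded.append(F.pad(c, (0, pad_w, 0, pad_h)))
    return torch.stack(padded)  # (N, C, max_h, max_w)

crops = [torch.ones(3, 14, 14), torch.ones(3, 28, 14), torch.ones(3, 14, 28)]
batch = pad_and_batch(crops)
print(batch.shape)  # torch.Size([3, 3, 28, 28])
```

A real processor would also need to track the original sizes (e.g. via an attention mask over patches) so the model can ignore the zero padding.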

> @amyeroberts Sry to bother... So can I start to push forward this task?

Anyone is of course welcome to open a PR to add a model. The bar for adding models to the repo is high, so this process can be long. As the model is already available on the hub, it's also likely to have lower priority with regards to PR reviews. If the architecture has to be adapted to be in line with transformers standards, then you'll also have to handle versioning of the weights.

That being said - I can see the number of downloads for openbmb/MiniCPM-Llama3-V-2_5 is pretty high, which is a good argument for inclusion in the library. I just want to flag that going from hub -> transformers is tricky and it may take some time.

Mihaiii commented 1 month ago

+1

Yes, it would be awesome if it could be included in the transformers library. I tried multiple visual models for my custom task and MiniCPM-Llama3-V-2_5 is by far the best open source VQA model so far.

amyeroberts commented 1 month ago

cc @Rocketknight1 regarding the loading of models from the hub with "." in the name, as I think you were working on this previously
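For background on that last point: remote-code loading derives a Python module path from the repo name, and "." is the package separator in module paths, so names like MiniCPM-Llama3-V-2_5 (or any repo name containing a dot) need special handling. A hypothetical sanitizer, purely to illustrate the constraint; this is not transformers' actual code:

```python
import keyword
import re

def repo_name_to_module(name: str) -> str:
    """Turn a Hub repo name into a valid Python module name (illustrative)."""
    # Replace every character that is invalid in an identifier
    # ('.', '-', '/', etc.) with an underscore.
    module = re.sub(r"\W", "_", name)
    # Identifiers cannot start with a digit or shadow a keyword.
    if module and module[0].isdigit():
        module = "_" + module
    if keyword.iskeyword(module):
        module += "_"
    return module

print(repo_name_to_module("MiniCPM-V-2"))           # MiniCPM_V_2
print(repo_name_to_module("MiniCPM-Llama3-V-2_5"))  # MiniCPM_Llama3_V_2_5
```

The subtlety is that such a mapping is lossy (both "a.b" and "a-b" map to "a_b"), which is presumably why dots in repo names need dedicated handling rather than naive substitution.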