illuin-tech / colpali

The code used to train and run inference with the ColPali architecture.
https://huggingface.co/vidore
MIT License

Add `AutoModel` and `AutoProcessor` to library #91

Closed. michaelfeil closed this issue 4 days ago.

michaelfeil commented 1 week ago

Pretty self-explanatory. Currently blocking adoption into e.g. https://github.com/michaelfeil/infinity

ManuelFay commented 1 week ago

Would you be okay with the `trust_remote_code=True` flag?

We can make it into an `AutoModel`, but that kinda only makes sense if you want to use it directly with transformers (which is probably best for inference purposes).

michaelfeil commented 1 week ago

I meant `from colpali_engine import AutoModel; AutoModel.from_pretrained("vidore/colqwen2-v0.1")`, etc.

Not sure if remote code would be a good idea; I'm looking more for a consistent abstraction that can be patched.
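
For illustration, a minimal sketch of what such a colpali_engine-level `AutoModel` could look like, assuming dispatch on the architecture name stored in the checkpoint's config.json (the `AutoColModel` class and its registry below are hypothetical, not part of the library):

    # Hypothetical sketch of a colpali_engine-level AutoModel (not in the library today).
    # It assumes the checkpoint's config.json lists a ColPali-family architecture name.
    from transformers import AutoConfig

    from colpali_engine.models import ColPali, ColQwen2

    _MODEL_REGISTRY = {
        "ColPali": ColPali,
        "ColQwen2": ColQwen2,
    }


    class AutoColModel:
        """Dispatch to the matching colpali_engine model class based on the checkpoint config."""

        @classmethod
        def from_pretrained(cls, name_or_path, **kwargs):
            config = AutoConfig.from_pretrained(name_or_path)
            architecture = (config.architectures or [""])[0]
            for key, model_cls in _MODEL_REGISTRY.items():
                if key in architecture:
                    return model_cls.from_pretrained(name_or_path, **kwargs)
            raise ValueError(f"No ColPali-family class registered for {architecture!r}")


    # Usage (hypothetical): model = AutoColModel.from_pretrained("vidore/colqwen2-v0.1")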

ManuelFay commented 1 week ago

Yeah, monkey-patching HF is probably not the way to go if we want to do this cleanly, I think!

Try this (and make sure `peft` is installed):

import torch
from transformers import AutoModel, AutoProcessor

# Load the HF-native ColQwen2 checkpoint; trust_remote_code pulls in the custom
# modeling/processing code shipped with the checkpoint.
model = AutoModel.from_pretrained(
    "manu/colqwen2-v0.1-hf",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
    trust_remote_code=True,
)

processor = AutoProcessor.from_pretrained("manu/colqwen2-v0.1-hf", trust_remote_code=True)
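
For downstream retrieval, a usage sketch assuming the remote-code processor mirrors the colpali_engine `ColQwen2Processor` API (`process_images`, `process_queries`, `score_multi_vector`); check the model card for the exact method names:

    # Usage sketch, continuing from the model/processor loaded above.
    # Assumes the processor exposes process_images / process_queries / score_multi_vector.
    import torch
    from PIL import Image

    images = [Image.new("RGB", (448, 448), color="white")]  # replace with real page images
    queries = ["What is the revenue for Q3?"]

    with torch.no_grad():
        image_batch = processor.process_images(images).to(model.device)
        query_batch = processor.process_queries(queries).to(model.device)
        image_embeddings = model(**image_batch)  # multi-vector page embeddings
        query_embeddings = model(**query_batch)  # multi-vector query embeddings

    # Late-interaction (MaxSim) scores between every query and every page
    scores = processor.score_multi_vector(query_embeddings, image_embeddings)
    print(scores)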

I'll merge it into the vidore org at some point if you feel this is a nice-to-have.