Natyren opened 6 months ago
cc @ydshieh
I would like to assist with this implementation. If there are any guidelines on how to do it effectively, I'd be glad to join the implementation process.
Hi @Natyren, there's a guide in the documentation here: https://huggingface.co/docs/transformers/add_new_model
Hi @Natyren, sorry for the late reply. I am thinking of talking to the model authors to see if they are interested in porting this model into transformers. I will come back to you here with updates.
Hi @ydshieh, I'm an intern at MSRA, and my mentor @Dod-o wants to convert Kosmos-2.5 to the HF format. Is there anything I can do for you?
Hi @tic-top, thank you for this message! This is great news. It's probably better for me to add you to one of our Slack channels. Let me check.
(just sent an email :-) now)
I would love this! What’s needed to move it forward?
@EwoutH We are collaborating with @tic-top to port this into transformers
🤗
Model description
Hello everyone,
Kosmos-2.5 is a multimodal literate model that can be used for tasks such as OCR and text-rich image comprehension. It consists of a ViT encoder, a resampler, and a shared decoder module. To the best of my knowledge, its architecture is similar to Kosmos-2, but it differs enough that using it in Transformers requires a standalone implementation.
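Once the port lands, usage could look roughly like the sketch below. This is only an assumption modeled on the existing Kosmos-2 API: the checkpoint id `microsoft/kosmos-2.5`, the use of `AutoModelForVision2Seq`, and the `<ocr>` / `<md>` task prompts (taken from the paper) are not confirmed details of the final transformers implementation.

```python
# Hypothetical usage sketch, assuming the Kosmos-2.5 port follows the Kosmos-2 API.
# The checkpoint id and task prompts below are assumptions, not the final API.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

repo = "microsoft/kosmos-2.5"  # assumed Hub repo id

processor = AutoProcessor.from_pretrained(repo)
model = AutoModelForVision2Seq.from_pretrained(repo, torch_dtype=torch.bfloat16)

# Any text-rich image (receipt, document page, screenshot) as input.
image = Image.open("document.png")

# Per the paper, Kosmos-2.5 is prompted with a task token:
# "<ocr>" for text recognition with bounding boxes, "<md>" for markdown generation.
prompt = "<ocr>"
inputs = processor(text=prompt, images=image, return_tensors="pt")

generated_ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```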
Open source status
Provide useful links for the implementation
Paper: https://arxiv.org/pdf/2309.11419
Code: https://github.com/microsoft/unilm/tree/master/kosmos-2.5
Authors: @Dod-o @wolfshow