Natyren opened 6 months ago
cc @ydshieh
I would like to assist with this implementation. If there are any guidelines on how to do it effectively, I'd be glad to join the implementation process.
Hi @Natyren, there's a guide in the documentation here: https://huggingface.co/docs/transformers/add_new_model
Hi @Natyren, sorry for the late reply. I am thinking of talking to the model authors to see if they are interested in porting this model into transformers. I will come back to you here with updates.
Hi @ydshieh, I'm an intern at MSRA, and my mentor @Dod-o wants to convert Kosmos-2.5 to the HF format. Is there anything I can do for you?
Hi @tic-top, thank you for this message! This is great news. It's probably better for me to add you to one of our Slack channels. Let me check.
(just sent an email :-) now)
I would love this! What’s needed to move it forward?
@EwoutH We are collaborating with @tic-top to port this into transformers
🤗
Model description
Hello everyone,
Kosmos-2.5 is a multimodal literate model that can be used for tasks such as OCR and text-rich image comprehension. It consists of a ViT encoder, a resampler, and a shared decoder module. To the best of my knowledge, its architecture is similar to Kosmos-2, but it differs enough that using it in Transformers requires a standalone implementation.
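Once the port lands, usage could look roughly like the sketch below. This is only an assumption modeled on the existing Kosmos-2 API: the checkpoint id `microsoft/kosmos-2.5`, the use of `AutoModelForVision2Seq`, and the `<ocr>` / `<md>` task prompts (taken from the paper) are not confirmed details of the final transformers implementation.

```python
# Hypothetical usage sketch, assuming the Kosmos-2.5 port follows the Kosmos-2 API.
# The checkpoint id and task prompts below are assumptions, not the final API.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

repo = "microsoft/kosmos-2.5"  # assumed Hub repo id

processor = AutoProcessor.from_pretrained(repo)
model = AutoModelForVision2Seq.from_pretrained(repo, torch_dtype=torch.bfloat16)

# Any text-rich image (receipt, document page, screenshot) as input.
image = Image.open("document.png")

# Per the paper, Kosmos-2.5 is prompted with a task token:
# "<ocr>" for text recognition with bounding boxes, "<md>" for markdown generation.
prompt = "<ocr>"
inputs = processor(text=prompt, images=image, return_tensors="pt")

generated_ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```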
Open source status
Provide useful links for the implementation
Paper: https://arxiv.org/pdf/2309.11419
Code: https://github.com/microsoft/unilm/tree/master/kosmos-2.5
Authors: @Dod-o @wolfshow