Open Dev-Khant opened 6 months ago
@NielsRogge Can I work on this to add it to the library?
Sure! Feel free to open a PR and let us know when it's ready for review or you need help integrating into the library.
In general, we prioritise reviewing based on PRs opened rather than comments on issues, as we find this prevents issues from becoming stale. You're free to work on something if there are no active linked PRs open.
Thanks @amyeroberts, I'll start working on this as I don't see any open PR for it.
Waiting for https://github.com/huggingface/transformers/pull/29667 to get merged, because the model uses InternLM internally.
Model description
A new large language and vision model (LLVM) that uses auxiliary visual information and natural language for prediction.
It uses 2 modules: MoAI-Compressor and MoAI-Mixer. Here, the Compressor condenses the verbalized outputs of the external CV models into auxiliary visual information, and the Mixer blends three types of intelligence (visual features, auxiliary features from external CV models, and language features) into a cohesive whole.
MoAI-7B surpasses both open-source and closed-source LLVMs in vision language tasks.
Model repo: https://github.com/ByungKwanLee/MoAI
Open source status
Provide useful links for the implementation
No response