huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0
132.95k stars 26.53k forks

[New model] ImageBind: One Embedding Space To Bind Them All #23240

Open xenova opened 1 year ago

xenova commented 1 year ago

Model description

As stated in their blog post,

"[ImageBind is] the first AI model capable of binding information from six modalities. The model learns a single embedding, or shared representation space, not just for text, image/video, and audio, but also for sensors that record depth (3D), thermal (infrared radiation), and inertial measurement units (IMU), which calculate motion and position."
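To make the "shared representation space" idea concrete, here is a toy sketch of cross-modal retrieval. It is not ImageBind's code or API: the three-dimensional vectors are hand-written placeholders, and `embeddings` and `nearest` are hypothetical names used only for illustration. A real model would produce the vectors from modality-specific encoders that all map into the same space.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written embeddings keyed by (modality, item).
# Assumption: every modality has already been projected into one shared space.
embeddings = {
    ("text", "a dog barking"): [0.9, 0.1, 0.0],
    ("audio", "bark.wav"): [0.8, 0.2, 0.1],
    ("image", "car.jpg"): [0.0, 0.1, 0.9],
}


def nearest(query_key, candidates):
    """Return the key of the other item whose embedding is closest to the query's."""
    query = candidates[query_key]
    others = [key for key in candidates if key != query_key]
    return max(others, key=lambda key: cosine_similarity(query, candidates[key]))


# A text query retrieves the closest item, whatever its modality.
best = nearest(("text", "a dog barking"), embeddings)
print(best)
```

Because all items live in one space, the same similarity function compares text to audio or to images without any pairwise, modality-specific logic.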

Open source status

Provide useful links for the implementation

GitHub repo: https://github.com/facebookresearch/ImageBind
Paper: https://facebookresearch.github.io/ImageBind/paper
Blog: https://ai.facebook.com/blog/imagebind-six-modalities-binding-ai/
Demo: https://imagebind.metademolab.com/
Video: https://dl.fbaipublicfiles.com/imagebind/imagebind_video.mp4
Weights: https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth (currently only 1 that I can see)

shehanmunasinghe commented 1 year ago

Hi @xenova, I would like to work on implementing this model.

xenova commented 1 year ago

> Hi @xenova, I would like to work on implementing this model.

Sweet!

dg845 commented 1 year ago

Hi, since it looks like the PR for this model (#23284) has been closed, I would be interested in working on a new PR to implement the ImageBind model :)

dg845 commented 1 year ago

I have opened a new PR to implement the ImageBind model: #26310.