invictus717 / MetaTransformer

Meta-Transformer for Unified Multimodal Learning
https://arxiv.org/abs/2307.10802
Apache License 2.0
1.52k stars, 113 forks

How to export to an ONNX model for faster inference #30

Closed (eisneim closed 1 year ago)

eisneim commented 1 year ago

This is a very useful project. It would be great if it could be used in production with ONNX support.

invictus717 commented 1 year ago

Thank you very much for your constructive suggestion! Could you please specify which part should get ONNX support first? Should it be the inference part of our multimodal encoder?

eisneim commented 1 year ago

Thank you! 👍 The shared encoder for text, image, audio, and video is the most useful part. Its embeddings can be used in LLM apps that do audio and video retrieval. Search tasks would include image-to-audio search, image-to-video-clip search, text-to-audio search, etc.
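To make the retrieval use case concrete, here is a minimal sketch of cross-modal search over a shared embedding space. It assumes that embeddings from different modalities are directly comparable with cosine similarity, which the thread implies but does not confirm. The names `cosine_similarity`, `search`, and `audio_index`, the file names, and the 3-dimensional vectors are all hypothetical stand-ins. In practice the vectors would come from the shared encoder (for example, run through an exported ONNX model).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_embedding, index, top_k=1):
    """Return the top_k (item_id, score) pairs, best match first."""
    scored = [
        (item_id, cosine_similarity(query_embedding, embedding))
        for item_id, embedding in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: real embeddings would be produced by the shared
# encoder and would have far more dimensions.
audio_index = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "rain.wav": [0.0, 0.2, 0.9],
    "piano.wav": [0.1, 0.9, 0.1],
}
image_query = [0.8, 0.2, 0.1]  # stand-in for the embedding of a query image

# Image-to-audio search: rank audio clips by similarity to the image.
results = search(image_query, audio_index, top_k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

The same `search` function would cover text-to-audio or image-to-video search by swapping the query and index embeddings, since only the vectors differ between modalities. For a large index, an approximate nearest-neighbor library would likely replace this linear scan.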