Implementation of "the first large-scale multimodal mixture of experts models," from the paper "Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts".
Nice work! Could you please explain how the concatenated features (after several MoE Transformer blocks) are used to train the model with multi-modal contrastive learning? How is the contrastive loss applied to achieve modality alignment and retrieval? And where are the auxiliary losses?
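For context, here is a minimal sketch of the kind of training step we have in mind, purely illustrative and not taken from this repo. It assumes hypothetical `image_tokens` / `text_tokens` outputs from the shared MoE encoder and that each modality is average-pooled and projected separately (as described in the paper), which may differ if the repo truly concatenates both modalities into one sequence:

```python
# Hypothetical sketch of CLIP-style contrastive training on pooled
# per-modality features; names and shapes are assumptions, not this repo's API.
import torch
import torch.nn.functional as F

def pool_and_project(tokens, proj):
    # tokens: (batch, seq_len, dim) -> average-pool over tokens,
    # apply a per-modality projection, then L2-normalize
    pooled = tokens.mean(dim=1)
    return F.normalize(proj(pooled), dim=-1)

def clip_contrastive_loss(img_emb, txt_emb, logit_scale):
    # Symmetric InfoNCE: matching image/text pairs lie on the diagonal
    logits = logit_scale.exp() * img_emb @ txt_emb.t()
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```

Our understanding from the paper is that the total loss then adds the MoE auxiliary terms (a load-balancing loss plus LIMoE's per-modality local and global entropy losses on the router distributions), weighted and summed with the contrastive loss, e.g. `loss = contrastive + aux_weight * aux_losses`. Is that how it is handled here?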