huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add pipeline for cross-modal / uni-modal ranking #17274

Closed (ggoggam closed 2 years ago)

ggoggam commented 2 years ago

Feature request

Given queries and keys, the proposed pipeline returns a ranked list of keys that are most similar to each respective query. This pipeline should support uni-modal and cross-modal retrieval, i.e.

Prominent use cases would be:

Motivation

I was looking for a way to use CLIP for cross-modal retrieval, but the current pipeline for CLIP does not seem to support it. I believe there is a demand for this pipeline.
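
To make the request concrete, here is a minimal sketch of the ranking step, assuming query and key embeddings have already been computed (for CLIP they could come from `CLIPModel.get_text_features` and `CLIPModel.get_image_features`). The names `rank_keys` and `cosine` are hypothetical helpers for illustration, not part of transformers, and the vectors are toy values rather than real model outputs.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_keys(query_embs, key_embs, top_k=None):
    """Hypothetical helper: for each query, return (key_index, score)
    pairs sorted by descending cosine similarity."""
    rankings = []
    for q in query_embs:
        scored = [(i, cosine(q, k)) for i, k in enumerate(key_embs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        rankings.append(scored[:top_k])  # top_k=None keeps every key
    return rankings

# Toy embeddings (assumed values): queries and keys may come from
# different modalities as long as they share one embedding space.
query_embs = [[1.0, 0.0], [0.0, 1.0]]
key_embs = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]

rankings = rank_keys(query_embs, key_embs, top_k=2)
for query_index, ranked in enumerate(rankings):
    print(query_index, [key_index for key_index, _ in ranked])
```

Because the function only sees embeddings, the same code would cover text-to-image, image-to-text, and uni-modal ranking.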

Your contribution

sijunhe commented 2 years ago

I think there are some image-text retrieval capabilities already, such as ViltForImageAndTextRetrieval. But these only work for a toy set of queries and keys because of their interaction-based nature. True retrieval (cross-modal or not) would probably need datasets and faiss. I think that may be too complicated for a pipeline?
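
A rough sketch of why the interaction-based approach does not scale: it needs one forward pass per (query, key) pair, while an embedding-based approach encodes each item once and compares vectors cheaply. `joint_score` and `encode` below are hypothetical stand-ins for the two kinds of model, not transformers APIs, and the scoring logic is a toy.

```python
# Count how many "model forward passes" each design needs.
calls = {"joint": 0, "encode": 0}

def joint_score(query, key):
    """Stand-in for an interaction-based scorer: one pass per pair."""
    calls["joint"] += 1
    return float(len(set(query.split()) & set(key.split())))

def encode(text):
    """Stand-in for an embedding model: one pass per item."""
    calls["encode"] += 1
    vocab = ["cat", "dog", "car"]  # toy vocabulary (assumption)
    words = text.split()
    return [float(words.count(w)) for w in vocab]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

queries = ["a cat", "a car"]
keys = ["cat on a mat", "dog in a car", "red car", "sleepy dog"]

# Interaction-based: every (query, key) pair goes through the model.
joint_scores = [[joint_score(q, k) for k in keys] for q in queries]

# Embedding-based: keys are encoded once and could be cached or indexed.
key_vecs = [encode(k) for k in keys]
query_vecs = [encode(q) for q in queries]
embed_scores = [[dot(qv, kv) for kv in key_vecs] for qv in query_vecs]

print(calls)  # 2 * 4 = 8 pair passes vs 2 + 4 = 6 encoder passes
```

With realistic collection sizes the gap between pairs and items grows quickly, which is where a precomputed index such as faiss would come in.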

ggoggam commented 2 years ago

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.