tensorflow / similarity

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.
Apache License 2.0

How to do cross-modal retrieval? #223

Open carlthome opened 2 years ago

carlthome commented 2 years ago

I'm curious about how to do cross-modal retrieval with the YouTube-8M dataset. I have videos with image and audio data, and would like to learn two encoders that embed both audio and RGB data into the same space, such that nearest neighbor lookups could be performed with audio embeddings to find related images, and vice versa.

Is there an easy way to extend the loss functions required by SimilarityModel to support two input heads?

Dataset signature: (features, labels) = ({'rgb': ..., 'audio': ...}, {'video_id': ...})

owenvallis commented 2 years ago

This would be similar to the CLIP model. We are looking to add an example notebook for this at some point.
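For reference, here is a rough sketch of the CLIP-style symmetric contrastive loss that such a setup typically uses, written in plain NumPy so it runs without any model code. It is not part of the TensorFlow Similarity API; the function names (`symmetric_contrastive_loss`, `retrieve`), the temperature value, and the assumption that row i of each batch comes from the same video are all illustrative choices, and the encoder outputs are stood in for by random arrays.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale rows to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(rgb_emb, audio_emb, temperature=0.07):
    """CLIP-style loss for a batch of paired embeddings.

    Assumes rgb_emb[i] and audio_emb[i] come from the same video, so the
    matching pairs sit on the diagonal of the similarity matrix.
    """
    rgb = l2_normalize(rgb_emb)
    audio = l2_normalize(audio_emb)
    logits = rgb @ audio.T / temperature  # shape: (batch, batch)
    idx = np.arange(logits.shape[0])
    # Cross-entropy in both directions: rgb -> audio (rows), audio -> rgb (columns).
    loss_rgb_to_audio = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_audio_to_rgb = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_rgb_to_audio + loss_audio_to_rgb)


def retrieve(query_emb, index_emb, k=1):
    """Indices of the k nearest index embeddings (cosine) for each query."""
    sims = l2_normalize(query_emb) @ l2_normalize(index_emb).T
    return np.argsort(-sims, axis=1)[:, :k]


# Stand-ins for the outputs of two hypothetical encoders on one batch.
rng = np.random.default_rng(0)
rgb_batch = rng.normal(size=(8, 16))
audio_batch = rgb_batch + 0.01 * rng.normal(size=(8, 16))  # well-aligned pairs

loss_aligned = symmetric_contrastive_loss(rgb_batch, audio_batch)
loss_shuffled = symmetric_contrastive_loss(rgb_batch, audio_batch[::-1])
nearest_rgb = retrieve(audio_batch, rgb_batch, k=1)[:, 0]
```

In a real training loop the same computation would be expressed with differentiable ops inside a custom training step, with the two encoders producing `rgb_emb` and `audio_emb`. After training, `retrieve` corresponds to the audio-to-image (or image-to-audio) nearest neighbor lookup described in the question.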

Layhan commented 1 year ago

Hi, I was wondering if you did 'Import CLIP'?