Has anyone considered applying this to AudioCLIP? We could search by 3 modalities

rom1504 / clip-retrieval

Easily compute clip embeddings and build a clip retrieval system with them

https://rom1504.github.io/clip-retrieval/

MIT License

Has anyone considered applying this to AudioCLIP? We could search by 3 modalities #140

Open · voodoohop opened 2 years ago

rom1504 commented 2 years ago

We have an ongoing project on the LAION Discord (https://discord.gg/eq3cAMZtCC) to build a good AudioCLIP model and to collect a larger text/audio dataset.

Once these two building blocks are available, building a semantic search system will indeed be a lot of fun!

voodoohop commented 2 years ago

I have just been evaluating Wav2CLIP in combination with image generation. It embeds audio into the same embedding space as CLIP ViT-B/32 and seems to be working really well for me.
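To make the shared-embedding-space idea concrete, here is a minimal sketch of how an audio query could be matched against image embeddings once both live in the same 512-dimensional space (the output size of CLIP ViT-B/32). Everything in it is an assumption for illustration: the vectors are random placeholders rather than real Wav2CLIP or CLIP outputs, and `l2_normalize` and `top_k` are hypothetical helpers, not part of clip-retrieval or Wav2CLIP.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def top_k(query: np.ndarray, index: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (row, cosine similarity) for the k index rows closest to the query."""
    scores = l2_normalize(index) @ l2_normalize(query[None, :])[0]
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Placeholder data: random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
DIM = 512  # embedding dimension of CLIP ViT-B/32

# In a real setup these would come from the CLIP image encoder.
image_embeddings = rng.normal(size=(1000, DIM))

# Pretend the audio clip matches image 42: its embedding plus a little noise.
# In a real setup this would come from an audio encoder trained into CLIP space.
audio_query = image_embeddings[42] + 0.1 * rng.normal(size=DIM)

results = top_k(audio_query, image_embeddings, k=3)
print(results)  # the first entry should be row 42
```

Because all modalities share one space, the same `top_k` call would work with a text or image embedding as the query, which is what would make searching across 3 modalities possible. For a large collection, the brute-force dot product would presumably be replaced by an approximate nearest-neighbour index.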