microsoft / CLAP

Learning audio concepts from natural language supervision
MIT License

audio embeddings non deterministic #37

Open ksasso1028 opened 4 weeks ago

ksasso1028 commented 4 weeks ago

I have tried disabling all dropout layers, but my audio embeddings are still nondeterministic for the same inputs. The text embeddings do not exhibit this behavior. The differences can be large, and they change the CLAP score as a result.

run 1
tensor([[ 1.2297, 0.1761, 1.1028, ..., 1.6727, 1.7659, -0.4553]]) (TEXT)
tensor([[ 1.0847, -0.2657, 0.8899, ..., 1.5744, 1.5822, -1.5239]]) (AUDIO)
raw similarity score = 20.1413

run 2
tensor([[ 1.2297, 0.1761, 1.1028, ..., 1.6727, 1.7659, -0.4553]]) (TEXT)
tensor([[ 0.9259, -0.3579, 1.0072, ..., 1.1707, 1.3076, -1.4479]]) (AUDIO)
raw similarity score = 21.4179
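
Since the text embeddings are identical across runs and dropout is already disabled, one thing worth checking is whether randomness enters before the model, during audio preprocessing. The sketch below is not CLAP code: `random_crop` and `toy_embed` are hypothetical stand-ins that assume the pipeline picks a random window from the input audio, which may or may not be what happens here. It only illustrates how an unseeded crop makes repeated runs disagree and how a fixed seed makes them match.

```python
import random


def random_crop(samples, crop_len, rng):
    """Return a crop_len window starting at a randomly chosen offset."""
    if len(samples) <= crop_len:
        return list(samples)
    start = rng.randrange(len(samples) - crop_len + 1)
    return samples[start:start + crop_len]


def toy_embed(samples, crop_len=4, seed=None):
    """Hypothetical stand-in for an audio encoder: crop, then summarize."""
    # seed=None seeds from system entropy, so the crop differs between calls.
    rng = random.Random(seed)
    window = random_crop(samples, crop_len, rng)
    mean = sum(window) / len(window)
    return [mean, max(window) - min(window)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two embeddings."""
    return max(abs(x - y) for x, y in zip(a, b))


audio = [float(i) for i in range(16)]

# Same input, same seed: the two embeddings should match exactly.
seeded_diff = max_abs_diff(toy_embed(audio, seed=0), toy_embed(audio, seed=0))

# Same input, no seed: collect the distinct embeddings seen over many runs.
unseeded_runs = {tuple(toy_embed(audio)) for _ in range(50)}

print("seeded max abs diff:", seeded_diff)
print("distinct unseeded embeddings:", len(unseeded_runs))
```

If the real pipeline behaves like this, fixing the seed of whichever random number generator drives the crop, or feeding clips that already match the expected duration, would be the first things to try before looking at the model itself.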

soham97 commented 2 weeks ago

Hi @ksasso1028, please refer to this issue: https://github.com/microsoft/CLAP/issues/24