microsoft / CLAP

Learning audio concepts from natural language supervision
MIT License

audio embeddings non deterministic #37

Closed ksasso1028 closed 2 months ago

ksasso1028 commented 4 months ago

I have tried disabling all dropout layers, but my audio embeddings are still non-deterministic for identical inputs. The text embeddings do not exhibit this behavior. The differences can be large enough to change the CLAP score.

run 1
tensor([[ 1.2297, 0.1761, 1.1028, ..., 1.6727, 1.7659, -0.4553]]) (TEXT)
tensor([[ 1.0847, -0.2657, 0.8899, ..., 1.5744, 1.5822, -1.5239]]) (AUDIO)

raw similarity score = 20.1413

run 2
tensor([[ 1.2297, 0.1761, 1.1028, ..., 1.6727, 1.7659, -0.4553]]) (TEXT)
tensor([[ 0.9259, -0.3579, 1.0072, ..., 1.1707, 1.3076, -1.4479]]) (AUDIO)

raw similarity score = 21.4179
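
For anyone trying to reproduce this, here is a minimal sketch of how the run-to-run drift could be measured and how a seeding check might be set up. It uses only the six components visible in the printouts above (the elided middle of each tensor is not reconstructed). `max_abs_diff`, `is_repeatable`, and `toy_embed` are hypothetical helpers written for illustration; they are not part of the CLAP package, and `toy_embed` is only a stand-in for a pipeline that contains some random step. Whether CLAP's audio path contains such a step is not established in this thread.

```python
import random


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def is_repeatable(embed_fn, inputs, runs=3, tol=1e-6, seed=None):
    """Call embed_fn(inputs) `runs` times and compare every output to the first.

    If `seed` is given, Python's RNG is re-seeded before each call, which shows
    whether the variation comes from that RNG. Returns (ok, worst_difference).
    """
    if runs < 2:
        raise ValueError("need at least two runs to compare")
    outputs = []
    for _ in range(runs):
        if seed is not None:
            random.seed(seed)
        outputs.append(list(embed_fn(inputs)))
    worst = max(max_abs_diff(outputs[0], out) for out in outputs[1:])
    return worst <= tol, worst


# Visible components only, copied from the two runs above.
text_run1 = [1.2297, 0.1761, 1.1028, 1.6727, 1.7659, -0.4553]
text_run2 = [1.2297, 0.1761, 1.1028, 1.6727, 1.7659, -0.4553]
audio_run1 = [1.0847, -0.2657, 0.8899, 1.5744, 1.5822, -1.5239]
audio_run2 = [0.9259, -0.3579, 1.0072, 1.1707, 1.3076, -1.4479]

print("text drift :", max_abs_diff(text_run1, text_run2))    # identical, so 0.0
print("audio drift:", max_abs_diff(audio_run1, audio_run2))  # roughly 0.40


def toy_embed(samples, window=4):
    """Hypothetical stand-in: returns a randomly positioned window of the input."""
    start = random.randrange(len(samples) - window + 1)
    return samples[start:start + window]


# With a fixed seed the toy pipeline repeats exactly.
print(is_repeatable(toy_embed, list(range(10)), seed=0))
```

If a real model behaves like the toy function here (stable with a fixed seed, unstable without), the variation is coming from a random number generator and not from the network weights. With PyTorch, `torch.manual_seed` would need to be called alongside `random.seed`, since the two generators are independent.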

soham97 commented 4 months ago

Hi @ksasso1028, please refer to this issue: https://github.com/microsoft/CLAP/issues/24