Audio embeddings are non-deterministic

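I have tried disabling all dropout layers, but my audio embeddings are still non-deterministic: the same input produces a different audio embedding on each run. The text embeddings do not show this behavior. The differences can be large enough to change the CLAP similarity score. In the two runs below, the text embedding is identical, while the audio embedding changes and the raw similarity score moves from 20.1413 to 21.4179.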
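Disabling dropout only removes randomness inside the network. A quick way to narrow down where the variation comes from is to embed the same input several times and compare the outputs element-wise. Below is a minimal, framework-free sketch of such a check. The names `is_deterministic`, `embed_random_crop`, `embed_fixed_crop` and `WINDOW` are hypothetical, and the toy encoders only stand in for a real audio encoder. Random cropping of inputs longer than a fixed window is used purely as an example of a preprocessing step that can vary between runs; it is an assumption, not something established in this report.

```python
import random
from typing import Callable, List, Sequence

WINDOW = 4  # hypothetical fixed input length expected by the toy encoders


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two embeddings."""
    return max(abs(x - y) for x, y in zip(a, b))


def is_deterministic(embed: Callable[[List[float]], List[float]],
                     audio: List[float], runs: int = 5, tol: float = 1e-6) -> bool:
    """Embed the same input `runs` times; True if every output matches the first within `tol`."""
    reference = embed(audio)
    return all(max_abs_diff(reference, embed(audio)) <= tol for _ in range(runs - 1))


def embed_random_crop(audio: List[float]) -> List[float]:
    """Toy encoder (hypothetical): averages a randomly placed WINDOW-length crop, so it varies per call."""
    start = random.randrange(len(audio) - WINDOW + 1)
    window = audio[start:start + WINDOW]
    return [sum(window) / WINDOW]


def embed_fixed_crop(audio: List[float]) -> List[float]:
    """Toy encoder (hypothetical): averages a centered WINDOW-length crop, so every call agrees."""
    start = (len(audio) - WINDOW) // 2
    window = audio[start:start + WINDOW]
    return [sum(window) / WINDOW]


if __name__ == "__main__":
    audio = [float(i) for i in range(16)]  # toy input longer than WINDOW
    print("fixed crop deterministic: ", is_deterministic(embed_fixed_crop, audio))
    print("random crop deterministic:", is_deterministic(embed_random_crop, audio, runs=20))
```

With a real PyTorch model, the same comparison would be run after `model.eval()` and inside `torch.no_grad()`. If the outputs still differ at that point, the variation is coming from somewhere other than dropout, and the input preprocessing is one place worth checking.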

Run 1
TEXT:  tensor([[ 1.2297, 0.1761, 1.1028, ..., 1.6727, 1.7659, -0.4553]])
AUDIO: tensor([[ 1.0847, -0.2657, 0.8899, ..., 1.5744, 1.5822, -1.5239]])
Raw similarity score: 20.1413

Run 2
TEXT:  tensor([[ 1.2297, 0.1761, 1.1028, ..., 1.6727, 1.7659, -0.4553]])
AUDIO: tensor([[ 0.9259, -0.3579, 1.0072, ..., 1.1707, 1.3076, -1.4479]])
Raw similarity score: 21.4179
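The report does not say exactly how the raw similarity score is computed. Assuming it follows the usual CLIP/CLAP-style recipe (L2-normalize both embeddings, take their dot product, and multiply by a learned scale), a change in the audio embedding alone is enough to move the score even when the text embedding is identical. This is a minimal, pure-Python sketch of that assumed computation. The function names, the scale of 100.0 and the toy vectors are hypothetical and are not taken from the CLAP code.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def scaled_cosine_similarity(text_emb: Sequence[float], audio_emb: Sequence[float],
                             logit_scale: float) -> float:
    """Cosine similarity of the two embeddings, multiplied by a scale factor (assumed recipe)."""
    t = l2_normalize(text_emb)
    a = l2_normalize(audio_emb)
    return logit_scale * sum(x * y for x, y in zip(t, a))


if __name__ == "__main__":
    scale = 100.0  # hypothetical value; a real model would use its own learned scale
    text = [1.0, 0.5, -0.2]
    audio_run1 = [0.9, 0.4, -0.3]
    audio_run2 = [0.8, 0.6, -0.1]  # same text, slightly different toy audio embedding
    print(scaled_cosine_similarity(text, audio_run1, scale))
    print(scaled_cosine_similarity(text, audio_run2, scale))
```

Because both vectors are normalized first, only the direction of each embedding matters. Under this assumption, any run-to-run change in the direction of the audio embedding shows up directly in the score, scaled by the logit scale.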

Repository: microsoft / CLAP (issue #37)