Closed. ksasso1028 closed this 2 months ago.
I have tried disabling all dropout layers, but for some reason my audio embeddings are non-deterministic for the same inputs. The text embeddings do not exhibit this behavior. The differences can be large, and they change the CLAP score as a result.
run 1
tensor([[ 1.2297, 0.1761, 1.1028, ..., 1.6727, 1.7659, -0.4553]]) (TEXT)
tensor([[ 1.0847, -0.2657, 0.8899, ..., 1.5744, 1.5822, -1.5239]]) (AUDIO)
raw similarity score = 20.1413
run 2
tensor([[ 1.2297, 0.1761, 1.1028, ..., 1.6727, 1.7659, -0.4553]]) (TEXT)
tensor([[ 0.9259, -0.3579, 1.0072, ..., 1.1707, 1.3076, -1.4479]]) (AUDIO)
raw similarity score = 21.4179
Hi @ksasso1028, please refer to this issue: https://github.com/microsoft/CLAP/issues/24
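For readers who land here without following the link: one possible cause of this kind of run-to-run variation in audio (but not text) embeddings is a random crop in the audio preprocessing, where a clip longer than the model's input window is cut at a randomly chosen offset. This is an assumption, not a confirmed diagnosis of this issue. The sketch below is not CLAP code and uses only the Python standard library. `random_crop` and `center_crop` are hypothetical helpers, and the list `audio` stands in for a waveform. It shows that the crop is repeatable if the random generator is seeded or the crop offset is fixed.

```python
import random


def random_crop(samples, crop_len, rng):
    """Return crop_len consecutive samples starting at a random offset.

    Hypothetical stand-in for a preprocessing step that truncates long clips.
    """
    if len(samples) <= crop_len:
        return list(samples)
    start = rng.randrange(len(samples) - crop_len + 1)
    return samples[start:start + crop_len]


def center_crop(samples, crop_len):
    """Deterministic alternative: always take the middle crop_len samples."""
    if len(samples) <= crop_len:
        return list(samples)
    start = (len(samples) - crop_len) // 2
    return samples[start:start + crop_len]


# Stand-in waveform that is longer than the crop window.
audio = [i * 0.001 for i in range(10_000)]
crop_len = 7_000

# Same seed -> same offset -> identical model input on every run.
seeded_a = random_crop(audio, crop_len, random.Random(0))
seeded_b = random_crop(audio, crop_len, random.Random(0))

# A fixed offset needs no seed at all.
fixed_a = center_crop(audio, crop_len)
fixed_b = center_crop(audio, crop_len)

print("seeded crops match:", seeded_a == seeded_b)
print("center crops match:", fixed_a == fixed_b)
```

If the preprocessing draws from Python's global `random` module, calling `random.seed(...)` immediately before each embedding call would have the same effect as the seeded generator above. Whether that applies here depends on how the library actually loads audio.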