philipperemy / deep-speaker

Deep Speaker: an End-to-End Neural Speaker Embedding System.
MIT License
905 stars 241 forks

average embeddings #74

Closed taalua closed 3 years ago

taalua commented 3 years ago

Hi, thanks for the code. How do I average embeddings? I followed the code below from previous issues, but the result doesn't sum to 1.0 after np.mean.

```python
def embedding_average(path, model):
    audio_list = os.listdir(path)
    sum = []
    for idx, audio in enumerate(audio_list):
        tmp = get_per_speaker_embedding(path + '/' + audio, model)
        sum.append(tmp)
    return np.mean(sum, axis=0)
```

philipperemy commented 3 years ago

@taalua it should not sum to 1. Embeddings are defined on a hypersphere where their L2 norm is 1.


For every embedding (if I refer to the example in the README), the L2 norm computed by this function should be equal to 1:

```python
np.sqrt(np.sum(predict_001 ** 2))  # will be equal to 1.
```
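As a quick sanity check (using a random vector as a stand-in for `predict_001`, since the README's model output isn't available here; 512 matches deep-speaker's embedding dimension):

```python
import numpy as np

# Synthetic stand-in for a model embedding such as predict_001.
rng = np.random.default_rng(0)
v = rng.standard_normal(512)

# L2-normalize, mirroring the normalization done in conv_models.py.
v = v / np.sqrt(np.sum(v ** 2))

print(np.sqrt(np.sum(v ** 2)))  # 1.0 (up to floating-point error)
```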

When we train, the normalization is done here: https://github.com/philipperemy/deep-speaker/blob/cd354a761eb0b111bc29a0d2f7f0806f94e07fd5/conv_models.py#L67

If you want to average N embeddings, there are many ways to do it. But you can take the mean embedding and then normalize it by its L2 norm (`np.sqrt(np.sum(v ** 2))`).

```python
mean_embedding = np.mean([predict_001, predict_002], axis=0)  # average across 2 embeddings.
mean_embedding = mean_embedding / np.sqrt(np.sum(mean_embedding ** 2))  # normalize by L2 norm.
print(np.sqrt(np.sum(mean_embedding ** 2)))  # check that the L2 norm is one.
```
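One thing worth noting: the plain mean of distinct unit vectors always has norm strictly less than 1, which is why the re-normalization step is needed. A small self-contained sketch (the helper name `average_embeddings` and the synthetic vectors are illustrative, not part of this repo):

```python
import numpy as np

def average_embeddings(embeddings):
    """Average a list of L2-normalized embeddings and re-normalize
    the result back onto the unit hypersphere."""
    mean_embedding = np.mean(embeddings, axis=0)
    return mean_embedding / np.sqrt(np.sum(mean_embedding ** 2))

# Synthetic unit vectors standing in for real speaker embeddings.
rng = np.random.default_rng(0)
embs = [e / np.sqrt(np.sum(e ** 2)) for e in rng.standard_normal((5, 512))]

print(np.sqrt(np.sum(np.mean(embs, axis=0) ** 2)))      # < 1: the raw mean leaves the sphere
print(np.sqrt(np.sum(average_embeddings(embs) ** 2)))   # 1.0 after re-normalization
```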

There might be a smarter way to do that.

taalua commented 3 years ago

Thanks for the quick reply!