philipperemy / deep-speaker

Deep Speaker: an End-to-End Neural Speaker Embedding System.
MIT License

Seed for deep-speaker #64

Closed: bhrt-sharma closed this issue 4 years ago

philipperemy commented 4 years ago

@bhrt-sharma can you post the script that you use? It will help me recreate your issue. Thanks!

bhrt-sharma commented 4 years ago

Thanks a lot for the speedy reply. I am pasting two links: my edited conv_models.py (https://ideone.com/fTqI5h) and the Flask code I use to authenticate two voices (https://ideone.com/pgn0wl). If I test two exactly identical audio files for the first time, I get a cosine similarity of 1.00, but as soon as I send the same audio to the Flask app again, the score drops to about 0.68.
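For context, the scores discussed in this thread are cosine similarities between speaker embeddings. The sketch below is a minimal pure-Python illustration, not the repository's `batch_cosine_similarity`; the function name `cosine_similarity` and the four-element vectors are hypothetical stand-ins for real embeddings. It shows that two identical embeddings score 1.0, so a score below 1.0 for the same file suggests the two embeddings were computed from different inputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings, for illustration only.
emb_first = [0.2, 0.5, 0.1, 0.7]
emb_same = list(emb_first)          # identical input -> identical embedding
emb_other = [0.7, 0.1, 0.5, 0.2]    # a different embedding

same_score = cosine_similarity(emb_first, emb_same)
other_score = cosine_similarity(emb_first, emb_other)

print('identical embeddings:', same_score)
print('different embeddings:', other_score)
```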

philipperemy commented 4 years ago

I don't know about your code in particular, but when I run the example in the README twice, I get the same result both times:

SAME SPEAKER [0.8112024]
DIFF SPEAKER [0.02534033]

SAME SPEAKER [0.8112024]
DIFF SPEAKER [0.02534033]

import numpy as np
import random
from audio import read_mfcc
from batcher import sample_from_mfcc
from constants import SAMPLE_RATE, NUM_FRAMES
from conv_models import DeepSpeakerModel
from test import batch_cosine_similarity

# Seed both random number generators so the results are reproducible.
np.random.seed(123)
random.seed(123)

# Build the model and load the pretrained checkpoint.
model = DeepSpeakerModel()
model.m.load_weights('ResCNN_triplet_training_checkpoint_265.h5', by_name=True)

# Two recordings of the same speaker.
mfcc_001 = sample_from_mfcc(read_mfcc('samples/PhilippeRemy/PhilippeRemy_001.wav', SAMPLE_RATE), NUM_FRAMES)
mfcc_002 = sample_from_mfcc(read_mfcc('samples/PhilippeRemy/PhilippeRemy_002.wav', SAMPLE_RATE), NUM_FRAMES)

# Compute one embedding per recording (batch dimension added first).
predict_001 = model.m.predict(np.expand_dims(mfcc_001, axis=0))
predict_002 = model.m.predict(np.expand_dims(mfcc_002, axis=0))

# A recording of a different speaker.
mfcc_003 = sample_from_mfcc(read_mfcc('samples/1255-90413-0001.flac', SAMPLE_RATE), NUM_FRAMES)
predict_003 = model.m.predict(np.expand_dims(mfcc_003, axis=0))

print('SAME SPEAKER', batch_cosine_similarity(predict_001, predict_002))
print('DIFF SPEAKER', batch_cosine_similarity(predict_001, predict_003))

philipperemy commented 4 years ago

@bhrt-sharma let me know! In your example, I would set the seed just before each sample_from_mfcc call. You only set the seed once at startup, and because it's a long-running server, each request draws different random numbers, so the results differ from one request to the next.
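The suggestion above can be sketched without the repository's code. This is a minimal illustration under stated assumptions: `sample_window` is a hypothetical stand-in for a sampler that picks a random window of frames (it is not the real `sample_from_mfcc`), and `handle_request` is a hypothetical request handler. It compares seeding once at startup with reseeding right before every sampling call.

```python
import random


def sample_window(frames, max_length):
    """Hypothetical stand-in for a random-crop sampler: returns a random window."""
    start = random.choice(range(0, len(frames) - max_length + 1))
    return frames[start:start + max_length]


def handle_request(frames, max_length, seed=None):
    """Hypothetical request handler; reseeds just before sampling if a seed is given."""
    if seed is not None:
        random.seed(seed)
    return sample_window(frames, max_length)


frames = list(range(1000))  # placeholder for the frames of one audio file

# Seed once at startup: each request consumes more of the random stream,
# so the same audio is typically cropped differently on each request.
random.seed(123)
seeded_once = [handle_request(frames, 160) for _ in range(5)]

# Reseed inside every request: the same audio is always cropped identically.
seeded_each = [handle_request(frames, 160, seed=123) for _ in range(5)]

print('distinct windows, seed once:', len({tuple(w) for w in seeded_once}))
print('distinct windows, seed per request:', len({tuple(w) for w in seeded_each}))
```

Reseeding per request trades randomness for repeatability, which is usually what a verification endpoint wants: the same input should yield the same score.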

philipperemy commented 4 years ago

I've updated the README: https://github.com/philipperemy/deep-speaker/commit/cc1a90baca155da824603f59bb3d54cef86f2152