ppwwyyxx / speaker-recognition

A Speaker Recognition System
Apache License 2.0
675 stars 276 forks

There is little difference in the scores of the speakers' test results #91

Open Vickey-ZWQ opened 4 years ago

Vickey-ZWQ commented 4 years ago

Hi: I have successfully run your example without the GUI. However, there is almost no difference between the scores predicted for each speaker and those for the others. Even speakers outside the training set score almost the same as speakers inside it. So I'm confused about the training data set and the score results. Below are my questions, operation steps, dataset description, and expectations. Any suggestions would be greatly appreciated!

My questions:

1. Do you see any mistakes in my steps below?
2. What are the requirements for the data set?
3. How is the test score calculated, and within what score range can we conclude it is the same person?
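Regarding question 3: this project is GMM-based, so the reported numbers are log-likelihood scores, not calibrated probabilities. Here is a minimal synthetic sketch of that scheme, with made-up speaker names and feature values, fitting one GaussianMixture per enrolled speaker and picking the model with the highest average per-frame log-likelihood:

```python
# Sketch of GMM log-likelihood scoring (synthetic data, illustrative names).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Fake MFCC-like feature frames for two well-separated "speakers".
feats_a = rng.normal(loc=0.0, scale=1.0, size=(500, 13))
feats_b = rng.normal(loc=5.0, scale=1.0, size=(500, 13))

models = {
    "speaker_a": GaussianMixture(n_components=4, random_state=0).fit(feats_a),
    "speaker_b": GaussianMixture(n_components=4, random_state=0).fit(feats_b),
}

# An unseen utterance drawn from speaker A's distribution.
test = rng.normal(loc=0.0, scale=1.0, size=(200, 13))

# score() returns the average log-likelihood per frame; scores only have
# meaning relative to each other, not as absolute probabilities.
scores = {name: m.score(test) for name, m in models.items()}
best = max(scores, key=scores.get)
print(best)  # speaker_a
```

Because these are unnormalized log-likelihoods, similar absolute values across models do not by themselves mean the system failed; what matters is the gap between the best and second-best model. If even that gap is near zero, the features or the training are the more likely problem.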

Operation steps:

1. enroll:

python speaker-recognition.py -t enroll -i "/XXX/XXX/Music/storytelling/shuoshu/*" -m /XXX/XXX/Music/storytelling/shuoshu/shuoshu.model.out
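As far as I can tell (treat this as an assumption, not a confirmed fact about the tool), the enroll step expects each directory matched by the `-i` glob to be one speaker, labeled by the directory name. A small sketch of that layout with hypothetical speaker names:

```python
# Sketch of the assumed input layout: one subdirectory per speaker,
# speaker label taken from the directory name. Names are hypothetical.
from pathlib import Path
import tempfile

root = Path(tempfile.mkdtemp())
for name in ["alice", "bob"]:
    d = root / name
    d.mkdir()
    (d / "clip1.wav").touch()  # stand-in for a real recording

# Map each speaker directory to its WAV files, as enrollment would see them.
speakers = {d.name: sorted(p.name for p in d.glob("*.wav"))
            for d in sorted(root.iterdir()) if d.is_dir()}
print(speakers)  # {'alice': ['clip1.wav'], 'bob': ['clip1.wav']}
```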

2. enroll result:

enroll-result

3. predict:

python speaker-recognition.py -t predict -i "/XXX/XXX/Music/storytelling/predic/*.wav" -m /XXX/XXX/Music/storytelling/shuoshu/shuoshu.model.out

4. predict result:

Screenshot from 2020-05-16 14-25-01

(Note: as you can see, there are two extra lines of array output, because I modified part of the score-calculation code.) Screenshot from 2020-05-16 14-15-28

Dataset instruction:

The data sets were collected with the following parameters:

1. Sampled at 16 kHz.
2. 25 ms frames.
3. 10 ms frame step.

As you can see above, the data set I used contains wave files from four speakers under the directory path "/XXX/XXX/Music/storytelling/shuoshu/*". Each speaker has about 50 minutes of speech, divided into 50 WAV files of 1 minute each. Of these 50 files, 40 are used for enrollment and 10 for prediction.

In addition, I have tried training the model on a larger data set of 230 speakers, divided into 230 folders with 100 WAV files each; each WAV file is 2 seconds long. Before training, 10 WAV files from each speaker were set aside for testing afterwards. But the result is the same as above: each speaker's score is almost identical.
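The framing parameters stated above can be sanity-checked with a few lines of arithmetic; for one 1-minute file at 16 kHz with 25 ms frames and a 10 ms step:

```python
# Check the framing arithmetic: 16 kHz audio, 25 ms frames, 10 ms hop.
sample_rate = 16000
frame_len = int(0.025 * sample_rate)   # 400 samples per frame
frame_step = int(0.010 * sample_rate)  # 160 samples per hop
n_samples = 60 * sample_rate           # one 1-minute file

# Number of full frames that fit (no padding of the final partial frame).
n_frames = 1 + (n_samples - frame_len) // frame_step
print(frame_len, frame_step, n_frames)  # 400 160 5998
```

So each 1-minute file yields roughly 6,000 feature frames, which should be ample per-speaker data for a GMM; data quantity alone is unlikely to explain the flat scores.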

My expectations:

I know that you have stopped maintaining this project, but I still hope you can help by answering the questions I wrote above. Finally, I wish you a happy life.

vigorous2008 commented 3 years ago

I have similar questions. Any ideas?

Ellinia511 commented 3 years ago


Hello, how can I contact you? What kind of data set did you use?