ooobsidian closed this issue 4 years ago
Yes, it is possible. In fact, the uploaded files for enrollment and test are not in the training dataset. The uploaded wav files are all clean data, so the performance is quite good. If you want good performance in more challenging conditions (noisier or shorter utterances, ...), you have to increase the amount of training data and the model size. A more advanced loss function or pooling method (e.g. attentive pooling) can also be used.
Thank you for your reply. I don't know how many speakers ResNet-18 can distinguish. Should I switch to a larger model? My training data has 855 speakers; what do you suggest?
I think ResNet-34 is good for your condition. You can also make the model wider (increase the number of channels). The best way is to run experiments with all of them, if possible. In configure.py, NUM_WIN_SIZE (the number of input frames) is set to 100. Increase this to 200 or 300. Since the training set in this tutorial is very small, I chose all the settings for a small dataset.
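As a rough illustration of what NUM_WIN_SIZE controls, the sketch below cuts a random fixed-length window of frames from a feature matrix. The function name `truncate_frames` and the random-window strategy are my own sketch, not the repo's exact `TruncatedInputfromMFB` code; only the name NUM_WIN_SIZE comes from configure.py.

```python
import numpy as np

NUM_WIN_SIZE = 200  # raised from the default 100, as suggested above

def truncate_frames(feature, num_frames=NUM_WIN_SIZE):
    """Pick a random contiguous window of num_frames from an
    (n_frames, dim) feature matrix. Hypothetical helper for illustration."""
    n_frames = feature.shape[0]
    start = np.random.randint(0, n_frames - num_frames + 1)
    return feature[start:start + num_frames]

feat = np.random.randn(500, 40)          # a 500-frame, 40-dim utterance
print(truncate_frames(feat).shape)       # (200, 40)
```

A longer window gives the network more temporal context per sample, which generally helps once the training set is large enough.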
Thank you very much for your help!!
Hi @jymsuper, I use .npy as the feature file format, and I changed line 12 in SR_Dataset.py following #3, but I ran into trouble when running train.py.
```
Traceback (most recent call last):
  File "train.py", line 328, in <module>
    main()
  File "train.py", line 135, in main
    epoch, n_classes)
  File "train.py", line 175, in train
    for batch_idx, (data) in enumerate(train_loader):
  File "/root/miniconda3/envs/3.6.7/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/3.6.7/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 385, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/root/miniconda3/envs/3.6.7/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/envs/3.6.7/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/source/speaker_recognition_pytorch/SR_Dataset.py", line 221, in __getitem__
    feature, label = self.loader(feat_path)
  File "/data/source/speaker_recognition_pytorch/SR_Dataset.py", line 16, in read_MFB
    feature = feat_and_label['feat']  # size : (n_frames, dim=40)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
```
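For context, this exact IndexError is what NumPy raises when a plain ndarray is indexed with a string key, which happens if `read_MFB` expects a pickled dict with a `'feat'` key but the file actually holds a bare .npy array. A minimal reproduction (the file path is made up for illustration):

```python
import numpy as np

# np.save stores a bare array, not a {'feat': ..., 'label': ...} dict
feat = np.random.randn(300, 40).astype(np.float32)  # (n_frames, dim=40)
np.save("/tmp/example_feat.npy", feat)

arr = np.load("/tmp/example_feat.npy")  # a plain ndarray
try:
    arr['feat']  # string indexing only works on structured arrays
except IndexError as e:
    print("IndexError:", e)  # same message as in the traceback above
```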
It occurred in read_MFB in SR_Dataset.py. I also sent you an email; please check it, thanks.
@jymsuper I have fixed the problems above by changing the feature method during serialization. But now, in train.py:
```python
transform = transforms.Compose([
    TruncatedInputfromMFB(),  # numpy array: (1, n_frames, n_dims)
    ToTensorInput()           # torch tensor: (1, n_dims, n_frames)
])
```
An error occurs in the method ToTensorInput():

```
  File "/Users/obsidian/source/voiceprint_pytorch/SR_Dataset.py", line 127, in __call__
    (0, 2, 1))).float()  # output type => torch.FloatTensor, fast
ValueError: axes don't match array
```
Could you help me solve this problem? I have been debugging it for a long time ☹️
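For what it's worth, "axes don't match array" from a transpose usually means the array is 2-D while the permutation `(0, 2, 1)` needs three axes. A sketch of both the failure and the shape that works (shapes assumed from the comments in the transform above):

```python
import numpy as np

feat = np.random.randn(100, 40)  # (n_frames, n_dims): 2-D, e.g. loaded straight from .npy
try:
    np.transpose(feat, (0, 2, 1))  # the 3-axis permutation used in ToTensorInput
except ValueError as e:
    print("ValueError:", e)        # axes don't match array

feat3d = feat[np.newaxis, ...]     # (1, n_frames, n_dims), the shape the pipeline expects
out = np.transpose(feat3d, (0, 2, 1))
print(out.shape)                   # (1, 40, 100)
```

So if TruncatedInputfromMFB is skipped or replaced, the feature reaching ToTensorInput may still be 2-D, which would produce exactly this error.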
You have to change the function read_MFB according to your situation. From line 12 to line 16, we load the feature (it is assumed the feature was saved using pickle) and the label. The feature size should be (n_frames, dim), as written in the comment. The label should be the speaker identity as a string.
You can remove lines 20 to 24, because they assume that the front and back of the utterance are silence.
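A minimal sketch of a loader matching that description (the key names `'feat'` and `'label'` follow the comments quoted in this thread; this is an illustration, not the repo's exact code):

```python
import pickle

def read_MFB(feat_path):
    """Load one pickled utterance: a dict with 'feat' (an (n_frames, dim)
    array) and 'label' (the speaker identity as a string)."""
    with open(feat_path, 'rb') as f:
        feat_and_label = pickle.load(f)
    feature = feat_and_label['feat']   # (n_frames, dim)
    label = feat_and_label['label']    # e.g. "spk001"
    return feature, label
```

If your features are plain .npy arrays instead, you would either re-serialize them into this dict form with pickle, or adapt the loader to `np.load` and derive the label from the file path.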
@jymsuper I want to know whether it can do verification (not identification) on an open set, that is to say, with test speakers not in the training dataset. If possible, I would also like to know the performance.