jymsuper / SpeakerRecognition_tutorial

Simple d-vector based Speaker Recognition (verification and identification) using Pytorch
MIT License

run train.py error #3

Closed ooobsidian closed 4 years ago

ooobsidian commented 4 years ago

Before training, I changed line 206 of SR_Dataset.py to train_DB = read_DB_structure(c.TRAIN_WAV_DIR) and deleted line 20 of DB_wav_reader.py, following issue #2. When I run train.py, I get the following error:

Traceback (most recent call last):
  File "train.py", line 328, in <module>
    main()
  File "train.py", line 92, in main
    train_dataset, valid_dataset, n_classes = load_dataset(val_ratio)
  File "train.py", line 22, in load_dataset
    train_DB, valid_DB = split_train_dev(c.TRAIN_WAV_DIR, val_ratio)
  File "train.py", line 65, in split_train_dev
    (train_len / total_len) * 100))
ZeroDivisionError: division by zero

I don't know how to fix it. Can you give me some ways to prepare the dataset? I use another dataset.

Thank you. @jymsuper

jymsuper commented 4 years ago

You have to modify line 21 in train.py so that it reads train_DB, valid_DB = split_train_dev(c.TRAIN_FEAT_DIR, val_ratio)

c.TRAIN_FEAT_DIR in configure.py should be the path of your dataset. You can see that the original setting is TRAIN_FEAT_DIR = 'feat_logfbank_nfilt40/train'

You can check the structure of TRAIN_FEAT_DIR, since I uploaded all the features to this GitHub repository: feat_logfbank_nfilt40/train/speaker_folders/feature_files.p. Your dataset also has to follow this structure.
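
The ZeroDivisionError in the traceback above is raised where train_len is divided by total_len, so total_len was 0, which is consistent with no feature files being found under the directory that was passed in. A minimal sketch for checking a feature directory before training is shown here. The function name count_features_per_speaker is hypothetical and not part of this repository, and it assumes exactly one level of speaker folders containing '.p' files, as in the structure described above.

```python
import glob
import os


def count_features_per_speaker(feat_dir, ext=".p"):
    """Return {speaker_folder: number_of_feature_files} for feat_dir.

    Assumes the layout <feat_dir>/<speaker_folder>/<feature_file><ext>.
    """
    counts = {}
    for path in glob.glob(os.path.join(feat_dir, "*", "*" + ext)):
        speaker = os.path.basename(os.path.dirname(path))
        counts[speaker] = counts.get(speaker, 0) + 1
    return counts


if __name__ == "__main__":
    # Path taken from the original TRAIN_FEAT_DIR setting; adjust to your data.
    counts = count_features_per_speaker("feat_logfbank_nfilt40/train")
    total = sum(counts.values())
    print(f"{len(counts)} speakers, {total} feature files")
    if total == 0:
        print("No feature files found; check the path and the file extension.")
```

If this reports 0 feature files, the path or the extension is probably the thing to fix before running train.py.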

I assumed that all the features are extracted in '.p' format. If you want to change the extension, please change line 31 in DB_wav_reader.py. pattern='*/.p' should be changed according to your feature format.

If you haven't extracted features yet, please do that using the python_speech_features library (it is explained in README.md). I didn't upload the code for feature extraction. Of course, you can use other libraries.
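
Since the extraction code is not in the repository, the following is only a rough sketch of what it could look like, not the code that produced the uploaded features. It assumes mono 16 kHz wav files arranged as wav_root/speaker/name.wav, uses 40 log filterbank coefficients (guessed from the directory name feat_logfbank_nfilt40), and stores a dict with the keys 'feat' and 'label'. That pickle layout is an assumption and must match whatever SR_Dataset.py actually reads. The helper names feature_path and extract_and_save are hypothetical.

```python
import os
import pickle


def feature_path(wav_path, wav_root, feat_root):
    """Map <wav_root>/<speaker>/<name>.wav to <feat_root>/<speaker>/<name>.p."""
    rel = os.path.relpath(wav_path, wav_root)
    return os.path.join(feat_root, os.path.splitext(rel)[0] + ".p")


def extract_and_save(wav_path, wav_root, feat_root):
    """Extract log filterbank features from one wav file and pickle them."""
    # Third-party imports are kept local so feature_path works without them.
    from python_speech_features import logfbank
    from scipy.io import wavfile

    rate, signal = wavfile.read(wav_path)
    feat = logfbank(signal, samplerate=rate, nfilt=40)  # shape: (frames, 40)

    speaker = os.path.basename(os.path.dirname(wav_path))
    out_path = feature_path(wav_path, wav_root, feat_root)
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, "wb") as f:
        # ASSUMPTION: this dict layout is illustrative; use the keys that
        # the dataset reader in SR_Dataset.py expects.
        pickle.dump({"feat": feat, "label": speaker}, f)
    return out_path
```

Calling extract_and_save once per wav file would produce a speaker_folders/feature_files.p tree under feat_root.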

I hope my answer will be enough for you.

ooobsidian commented 4 years ago

Thank you very much! I noticed you didn't upload the code for feature extraction. By the way, can a different feature file format be used? @jymsuper

jymsuper commented 4 years ago

Oh, I missed one thing. You also need to change line 12 in SR_Dataset.py. It assumes that the feature file format is pickle, so you need to change the code according to your format.
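
As an illustration of that kind of change, a reader can pick the loading routine from the file extension. This is a sketch only: load_feature is a hypothetical helper, not the function in SR_Dataset.py, and the '.npy' branch assumes the features were saved with numpy.save. The file pattern in DB_wav_reader.py would need to match the same extension.

```python
import os
import pickle


def load_feature(path):
    """Load one feature file, choosing the reader from the file extension."""
    ext = os.path.splitext(path)[1]
    if ext == ".p":
        with open(path, "rb") as f:
            return pickle.load(f)
    if ext == ".npy":
        import numpy as np  # third-party; only needed for .npy files
        return np.load(path)
    raise ValueError(f"Unsupported feature format: {ext!r}")
```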

ooobsidian commented 4 years ago

I got it.