Closed by psp0001060 11 months ago
Hi, I need some more info. This usually happens when the sampling rates in the conf file don't match the actual data. Did you change the sample rates in your experiment conf file to 12000 (for LR) and 48000 (for HR)?
By the way, the speakers usually removed are p280 and p315, since there were technical issues with their recordings, as mentioned here: "(However, two speakers, p280 and p315 had technical issues of the audio recordings using MKH 800)."
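A quick way to rule out the sampling-rate mismatch described above is to compare each file's header rate with the value in the conf file. The sketch below is not part of the repository: the function name `find_sample_rate_mismatches` is hypothetical, and it assumes the data are uncompressed PCM WAV files, which Python's standard `wave` module can read.

```python
import wave
from pathlib import Path


def find_sample_rate_mismatches(wav_dir, expected_sr):
    """Return (path, actual_sr) for every .wav under wav_dir whose
    header sample rate differs from expected_sr."""
    mismatches = []
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        # Assumption: files are plain PCM WAV; wave.open raises wave.Error otherwise.
        with wave.open(str(path), "rb") as wav_file:
            actual_sr = wav_file.getframerate()
        if actual_sr != expected_sr:
            mismatches.append((str(path), actual_sr))
    return mismatches
```

Calling it once on the LR directory with 12000 and once on the HR directory with 48000 should return two empty lists if the data match `lr_sr` and `hr_sr`.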
Thank you for your reply. Since no 12-48 conf file is provided in the GitHub repository, I created one based on the 4-16 conf file and set “lr_sr: 12000, hr_sr: 48000, nfft: 512, hop_length: 256”. I have uploaded aero_12-48_512_256.yaml as an attachment (GitHub does not allow uploading files with the .yaml extension, so I added a .txt extension).
aero_12-48_512_256.yaml.txt
Also, thank you for reminding me that the speakers usually removed are p280 and p315. Since create_meta_files.py defines
TOTAL_N_SPEAKERS=108
TRAIN_N_SPEAKERS=100
TEST_N_SPEAKERS=8
there are 108 speakers in total, so I kept the data for p280. Is it correct to delete p280 and p315 and keep s5?
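One way to sanity-check the speaker selection against these constants is sketched below. The three constants come from create_meta_files.py as quoted above. The `EXCLUDED_SPEAKERS` set, the `split_speakers` helper, and the sorted-order split are assumptions made for illustration and may not match how the repository actually divides speakers.

```python
TOTAL_N_SPEAKERS = 108
TRAIN_N_SPEAKERS = 100
TEST_N_SPEAKERS = 8

# Assumption: the two speakers reported to have recording issues are dropped.
EXCLUDED_SPEAKERS = {"p280", "p315"}


def split_speakers(all_speakers):
    """Drop excluded speakers, verify the total, and split into train/test.

    Hypothetical helper: the split is simply the first 100 and last 8
    speakers in sorted order, which may differ from the repository's logic.
    """
    kept = sorted(s for s in all_speakers if s not in EXCLUDED_SPEAKERS)
    if len(kept) != TOTAL_N_SPEAKERS:
        raise ValueError(
            f"expected {TOTAL_N_SPEAKERS} speakers after exclusion, got {len(kept)}"
        )
    return kept[:TRAIN_N_SPEAKERS], kept[TRAIN_N_SPEAKERS:]
```

If a different pair of speakers is removed by hand (for example p315 and s5), the directory still holds 108 speakers, so a count check alone would not reveal the difference. The explicit exclusion set makes the choice visible.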
After testing, I found that the following four files produce a different number of segments in the LR and HR sets, so I deleted them and training now runs normally:
p244_153_mic1.wav
p250_393_mic1.wav
p254_320_mic1.wav
p263_258_mic1.wav
By the way, I deleted p280 and p315 and kept s5.
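Files like these can be located programmatically by comparing per-file segment counts at the two sampling rates. The sketch below is an illustration under stated assumptions, not the repository's code: the padded sliding-window rule in `num_segments`, the helper names, and the sample counts in `example_lengths` are all hypothetical. It shows how a single extra HR sample that rounds away after resampling can yield one additional HR segment, which would be consistent with `hr_set` being 4 longer than `lr_set` for four files.

```python
import math


def num_segments(n_samples, segment_len, stride):
    """Assumed rule: sliding windows, with the last partial window padded."""
    if n_samples <= segment_len:
        return 1
    return math.ceil((n_samples - segment_len) / stride) + 1


def find_segment_mismatches(lengths, lr_sr, hr_sr, segment_sec, stride_sec):
    """lengths maps file name -> (lr_n_samples, hr_n_samples).

    Returns {file name: (lr_count, hr_count)} for files whose counts differ.
    """
    mismatches = {}
    for name, (lr_len, hr_len) in lengths.items():
        lr_count = num_segments(
            lr_len, int(segment_sec * lr_sr), int(stride_sec * lr_sr)
        )
        hr_count = num_segments(
            hr_len, int(segment_sec * hr_sr), int(stride_sec * hr_sr)
        )
        if lr_count != hr_count:
            mismatches[name] = (lr_count, hr_count)
    return mismatches


# Hypothetical sample counts, not measured from the real VCTK files.
example_lengths = {
    "clip_a.wav": (24000, 96000),  # exactly 2 s at both rates
    "clip_b.wav": (24000, 96001),  # one extra HR sample lost when resampling
}
print(find_segment_mismatches(example_lengths, 12000, 48000, 2, 2))
# prints {'clip_b.wav': (1, 2)}
```

Deleting the affected files, as done here, is one fix. Trimming each HR file to an exact multiple of the HR/LR ratio before resampling would be another option, though that is untested against this repository.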
Glad it worked out!
Hi authors, when I run train.py, it fails with the following assertion error:
assert len(self.hr_set) == len(self.lr_set)
I then printed the length of both sets: len(self.lr_set): 64349, len(self.hr_set): 64353. I read the code that computes the number of segments; it splits each audio file into segments based on its sampling rate.
I am using the same dataset as the authors, VCTK (excluding p315 and s5). Can you tell me how this situation should be handled?