Try to reproduce but some issues occur

yizhidamiaomiao commented 2 years ago

I ran the command "./train.sh 0 se model_se".

The issue is:

"""""""""""""""""""""""""""""""""
Preprocessing: 0%| | 0/11572 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "src/cdiffuse/preprocess.py", line 140, in <module>
    main(parser.parse_args())
  File "src/cdiffuse/preprocess.py", line 120, in main
    list(tqdm(executor.map(spec_transform, filenames, repeat(args.dir), repeat(args.outdir)), desc='Preprocessing', total=len(filenames)))
  File "/home/tiger/.local/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/usr/lib/python3.7/concurrent/futures/process.py", line 476, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
"""""""""""""""""""""""""""""""""

How to solve this?

The "se_pre" mode does run, but with the dataset from your link I MUST change sample_rate to 48000 in params.py; otherwise the code raises an error. Is this setting correct for reproducing your results?

Also, I have trained for 12 hours on 4 GPUs and reached step 156600 in "se_pre" mode. How long (how many epochs) does your model need to train?

neillu23 commented 2 years ago

Hi @yizhidamiaomiao, thanks for sharing your experience! I've replaced torchaudio.load_wav() with the torchaudio.load() function in the new commit. This may fix some errors, as torchaudio.load_wav has been removed in newer versions of torchaudio. For the second question, can you share a link to the data, and is the sample rate of the data you are using 48000? Also, the "se_pre" step is no longer needed, as the randomly initialized CDiffuSE performs as well as the one initialized from pre-trained parameters. The model with step 507600 (no pre-training) in my experiments slightly exceeded our paper's results. Please try the new code and let me know if you have any further questions!
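If the BrokenProcessPool error persists after updating, one way to narrow it down is to call the preprocessing function in the main process on a few files, so that an ordinary Python exception is shown directly instead of being hidden behind the pool. The following is only a minimal sketch: `first_failure` is a hypothetical helper, not part of the repository, and it assumes the per-file function can be called with a single argument.

```python
def first_failure(fn, items, limit=None):
    """Call fn(item) in the current process and report the first failure.

    Returns (item, exception) for the first item that raises, or None if
    every checked item succeeds. `limit` caps how many items are checked.
    """
    for index, item in enumerate(items):
        if limit is not None and index >= limit:
            break
        try:
            fn(item)
        except Exception as exc:  # report the failure instead of stopping the scan
            return item, exc
    return None
```

With the names from the traceback, a call might look like `first_failure(lambda f: spec_transform(f, args.dir, args.outdir), filenames, limit=10)`. If the worker was killed from outside (for example by running out of memory), no Python exception is raised and this helper will not catch it, but the crash then happens in the foreground where it is easier to observe.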

yizhidamiaomiao commented 2 years ago

Hi, thank you for your response!

Following your instructions, I no longer train "se_pre". I trained your model directly with the command "./train.sh 0 se model_se" and evaluated at step 600075 with the command "./inference.sh 0 600075 se model_se".

However, the results seem different from those in your 'Sample Files' folder. Here is the link to the speech generated by the trained model 'weights-600075.pt': https://drive.google.com/drive/folders/1aK0zzC1wDToWIAoEq2dNSsdwSo9n9rWd?usp=sharing. It may not be competitive with the SOTA model. Could you please help us find out what we should do to reproduce your results in 'Sample Files'?

neillu23 commented 2 years ago

Hi @yizhidamiaomiao, thanks for sharing the audio file! The command you are using seems to be from a previous commit. I've updated the command style and torchaudio functions in this commit: https://github.com/neillu23/CDiffuSE/commit/7e13e6e44294eb0d020dc63108aeb24bb39b29e0 The new commands would be "./train.sh 0 model_se" and "./inference.sh 0 model_se 600075". Here are the results I got from my trained model 'weights-54000.pt': https://drive.google.com/drive/folders/1EIh-ZwokHcRacv20Umk9MMkld9ETdBSQ?usp=sharing. The environment I used was torchaudio 0.9.0 / pytorch 1.9.0. If this doesn't work for you, please let me know; thanks again!

yizhidamiaomiao commented 2 years ago

Thanks for your response!

We downloaded your newest code, trained with the command "./train.sh 0 model_se", and ran inference with the command "./inference.sh 0 model_se 108000". The trained model is 'weights-108000.pt'. The newest results are at https://drive.google.com/drive/folders/1aK0zzC1wDToWIAoEq2dNSsdwSo9n9rWd?usp=sharing in the files named "*_enhanced_ver 7e13e6e.wav". There still seems to be some noise in the enhanced speech.

Shall we wait for step 507600?

The environment I used is torchaudio '0.10.0+cu113'/ pytorch 1.10.0.

Looking forward to any further guidance, and thanks for your patience!

neillu23 commented 2 years ago

Thank you for reporting the following results!

I think a possible reason is a difference in our training data. You mentioned that your data has a 48000 Hz sampling rate, but the data I used has a 16000 Hz sample rate. Could you share your training data and model with me so I can check whether they work in my environment?

Thank you again, and sorry for the inconvenience!
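One quick way to confirm a sample-rate mismatch is to count the rates found in the training folder. The sketch below uses only Python's standard `wave` module; it assumes the files are uncompressed PCM .wav files, and `sample_rate_counts` is a hypothetical name rather than part of the CDiffuSE code.

```python
import wave
from collections import Counter
from pathlib import Path


def sample_rate_counts(directory):
    """Count how many .wav files under `directory` use each sample rate."""
    counts = Counter()
    for path in sorted(Path(directory).rglob("*.wav")):
        # wave.open reads only the header here; no audio data is decoded
        with wave.open(str(path), "rb") as wav_file:
            counts[wav_file.getframerate()] += 1
    return counts
```

A result such as `Counter({48000: 11572})` would mean every file is still at 48 kHz and needs resampling before it matches a 16 kHz configuration.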

yizhidamiaomiao commented 2 years ago

I used the data directly from your link "https://datashare.ed.ac.uk/handle/10283/2791", given in the sentence "The default dataset is VOICEBANK-DEMAND dataset. You can download them from VOICEBANK-DEMAND)" in the README.md file. The audio downloaded from that website is actually 48k audio, so I had to add a torchaudio.resample(48k, 16k) to the "transform" function in your preprocess file in order to train.

neillu23 commented 2 years ago

The data I'm using is already at a 16k sample rate, which may be different from the one in the link. Could you try adding a torchaudio.resample(48k, 16k) for both "signal" and "noisysignal" in the __getitem__ function here in NumpyDataset? If this works, I will change the description in the README. Sorry again about this issue.
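The idea of resampling both signals in `__getitem__` could be sketched as a thin wrapper around the dataset. This is a minimal illustration under stated assumptions, not the repository's NumpyDataset: the class name `ResampledPairs`, the dict layout of each item, and the key names "signal" and "noisy_signal" are all hypothetical, and the resampler is passed in as a callable so the sketch itself has no dependencies.

```python
class ResampledPairs:
    """Wrap a dataset and resample both signals of every item.

    Assumption: each item of `base` is a dict holding the clean and noisy
    waveforms under the names listed in `keys` (hypothetical layout).
    """

    def __init__(self, base, resample, keys=("signal", "noisy_signal")):
        self.base = base
        self.resample = resample  # callable: waveform -> resampled waveform
        self.keys = keys

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        item = dict(self.base[index])  # shallow copy; the wrapped item is untouched
        for key in self.keys:
            # the same resampler is applied to clean and noisy audio so they stay aligned
            item[key] = self.resample(item[key])
        return item
```

With torchaudio installed and tensor waveforms, the callable could be `torchaudio.transforms.Resample(orig_freq=48000, new_freq=16000)`. Using one resampler for both entries keeps the clean and noisy signals the same length.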
