quickvc / QuickVC-VoiceConversion

QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
MIT License

use_sr isn't used in the actual code #15

Closed skol101 closed 1 year ago

skol101 commented 1 year ago

It looks like, unlike FreeVC (https://github.com/OlaWod/FreeVC/blob/81c169cdbfc97ff07ee2f501e9b88d543fc46126/data_utils.py#L72C9-L72C9), your code doesn't explicitly use this parameter. In fact, it doesn't seem to use SR at all?

https://github.com/quickvc/QuickVC-VoiceConversion/blob/277118de9c81d1689e16be8a43408eda4223553d/data_utils_new_new.py#L70

quickvc commented 1 year ago

I'm sorry for the inconvenience. This is a bug in the uploaded version of the code. To use SR, just change that line to:

    i = random.randint(68,92)
    c_filename = filename.replace(".wav", f"_{i}.npy")

I just changed the readme, thank you for your reminder.
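
The fix described above can be sketched as a small standalone helper. This is only an illustration, not the repository's actual code: the function name `pick_sr_feature_path` and the constants `SR_MIN` and `SR_MAX` are hypothetical, and the sketch assumes that one `.npy` feature file per index already exists next to each `.wav` file.

```python
import random

# Inclusive index bounds, taken from the randint(68, 92) call quoted in this thread.
SR_MIN, SR_MAX = 68, 92


def pick_sr_feature_path(wav_path, rng=random):
    """Return the path of one randomly chosen SR-augmented feature file.

    Hypothetical helper: it only builds the file name and does not check
    that the file exists on disk.
    """
    i = rng.randint(SR_MIN, SR_MAX)  # randint includes both endpoints
    # Same substitution as in the quoted fix: "x.wav" -> "x_<i>.npy"
    return wav_path.replace(".wav", f"_{i}.npy")


if __name__ == "__main__":
    # Each call may return a different augmented variant of the same utterance.
    print(pick_sr_feature_path("p225/p225_001.wav"))
```

Because a new index is drawn on every call, each epoch is likely to see a different augmented variant of the same utterance, which is the point of the augmentation.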

skol101 commented 1 year ago

https://github.com/OlaWod/FreeVC/blob/main/preprocess_sr.py creates .pt files (not .npy).

quickvc commented 1 year ago

This is because of the difference between Hubert-soft and wavlm. The output of Hubert-soft is npy, but the output of wavlm is pt.

skol101 commented 1 year ago

I see, so that means preprocess_sr must also be updated to use Hubert-soft and re-run on the dataset.

quickvc commented 1 year ago

Yes

skol101 commented 1 year ago

Training time goes up significantly with SR augmentation, which is probably expected.

quickvc commented 1 year ago

Actually, the worst part of SR augmentation is the data storage space. QuickVC is already four-month-old research. I have since found a better content feature than Hubert-soft: the new content feature doesn't need augmentation and can achieve cross-lingual voice conversion. If you are interested, you are welcome to take a look at consistencyvc.
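
To give a rough sense of the storage cost mentioned above, here is a back-of-the-envelope sketch. The only number taken from this thread is the index range 68 to 92, which gives 25 feature files per utterance. The frame rate (50 frames per second), feature dimension (256), and float32 storage are assumptions about Hubert-soft features that are not stated in this thread, and the function name is hypothetical.

```python
def sr_feature_storage_bytes(
    audio_seconds,
    n_variants=92 - 68 + 1,   # 25 augmented copies, from randint(68, 92)
    frames_per_second=50,     # assumption: one feature frame per 20 ms
    feature_dim=256,          # assumption: Hubert-soft unit dimension
    bytes_per_value=4,        # assumption: features stored as float32
):
    """Estimate the raw size of all SR-augmented features for a dataset.

    Ignores the small per-file header that the .npy format adds.
    """
    frames = int(audio_seconds * frames_per_second)
    return n_variants * frames * feature_dim * bytes_per_value


if __name__ == "__main__":
    one_hour = sr_feature_storage_bytes(3600)
    print(f"{one_hour / 1e9:.2f} GB per hour of audio")
```

Under these assumptions, the augmented features take 25 times the space of a single feature set, so the cost grows linearly with both the dataset length and the number of augmentation indices.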