Open ootsuka-repos opened 2 weeks ago
I am currently training a model for Japanese language processing. To improve audio quality, I have raised the sampling rate to 48,000 Hz. When I then compared the YAML configurations for the 40s and 75s models, I found that they differ in the downsamples, n_fft, and hop_length parameters.
I have two questions. Is increasing the number of tokens the most straightforward way to improve the quality of the generated audio? And to raise the sampling rate to 48,000 Hz, are there any parameters, aside from the sampling rate itself, that I need to adjust?
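To make the question concrete, here is a minimal sketch of how these parameters usually relate in a frame-based codec. It assumes the token rate equals the sampling rate divided by hop_length, and that hop_length equals the product of the downsamples list. The helper names, the 24,000 Hz baseline, and the example values are illustrative assumptions, not taken from the actual configs.

```python
from math import prod


def tokens_per_second(sample_rate: int, hop_length: int) -> float:
    """Frame (token) rate of a codec that emits one token per hop."""
    return sample_rate / hop_length


def hop_for_rate(sample_rate: int, target_tokens_per_second: int) -> int:
    """hop_length that keeps a target token rate at a given sampling rate."""
    hop, remainder = divmod(sample_rate, target_tokens_per_second)
    if remainder:
        raise ValueError("sample_rate is not divisible by the target token rate")
    return hop


def downsamples_match_hop(downsamples: list[int], hop_length: int) -> bool:
    """Check the assumed constraint: product of encoder strides == hop_length."""
    return prod(downsamples) == hop_length


if __name__ == "__main__":
    # Assumed baseline: 24,000 Hz with hop_length 320 (hypothetical values).
    print(tokens_per_second(24_000, 320))
    # hop_length needed to keep the same token rates at 48,000 Hz.
    print(hop_for_rate(48_000, 75), hop_for_rate(48_000, 40))
    # Hypothetical stride list checked against its hop_length.
    print(downsamples_match_hop([8, 5, 4, 2], 320))
```

Under these assumptions, doubling the sampling rate while leaving hop_length unchanged doubles the number of tokens per second, so hop_length and the downsamples list would need to change together to keep the token rate fixed.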
The following is the code I am using. train.py
Thank you for your response to my question. I’m also excited to hear that a subsequent version will be released! When is the release of the next version expected?
EncodecFeatures