wenet-e2e / wetts

Production First and Production Ready End-to-End Text-to-Speech Toolkit
Apache License 2.0

[vocos] Update config.json #187

Closed Shengqiang-Li closed 5 months ago

Shengqiang-Li commented 5 months ago

Vocos is highly sensitive to the frequency band range of its input features. With 100-dimensional full-band log-mel-spectrograms as input, Vocos shows a significant improvement.
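To make "100-dimensional full-band" concrete, here is a minimal sketch of how the mel band edges could be laid out when the filterbank spans 0 Hz up to the Nyquist frequency. It is not the wetts or Vocos feature extractor: the 24 kHz sample rate, the HTK-style mel formula, the function names, and the 80-bin / 8 kHz band-limited comparison are all assumptions chosen for illustration.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style Hz -> mel conversion (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 edge frequencies in Hz, evenly spaced on the mel scale.

    Triangular filter i spans edges[i] .. edges[i + 2], so n_mels filters
    need n_mels + 2 edges.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


SAMPLE_RATE = 24000  # assumption: not stated in this thread

# Full-band: 100 mel bins covering 0 Hz .. Nyquist (sample_rate / 2).
full_band = mel_band_edges(100, 0.0, SAMPLE_RATE / 2)

# Hypothetical band-limited setup for comparison: 80 bins capped at 8 kHz.
band_limited = mel_band_edges(80, 0.0, 8000.0)

print("full-band top edge (Hz):", round(full_band[-1]))
print("band-limited top edge (Hz):", round(band_limited[-1]))
```

In the band-limited layout nothing above the upper cutoff reaches the vocoder, whereas the full-band layout covers the whole spectrum up to Nyquist.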

Jackiexiao commented 5 months ago

Reference: "APNet2: High-quality and High-efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra", https://arxiv.org/pdf/2311.11545.pdf