Open hbwu-ntu opened 3 months ago
Hi @jishengpeng, thank you for the amazing work. May I ask several questions?
- What are the results for the large and medium models? Currently, the paper only reports small-model results.
- Do you have an ablation study showing the performance gain from incorporating the attention block?
- Do you have an ablation study showing the performance gain from changing the decoder to one similar to VOCOS?
- Will you compare your codec model with Single-codec or Ti-Codec? Single-codec is hard to compare against because it is not open-source, but Ti-Codec is. Will you include it in the comparison?
- Have you considered a human evaluation, since the trends in UTMOS and PESQ (STOI) are not consistent? UTMOS, like DNSMOS, is a proxy for human listening, but neither is accurate enough. PESQ and STOI are also good proxies for human listening.
Thank you very much for your interest!
Best regards.
Thank you for the answer! Glad to see that more numbers will be included in the upcoming arXiv version.