yangdongchao / AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Fix mel filterbank warning and update alpha calculation #47

Closed ywk991112 closed 9 months ago

ywk991112 commented 10 months ago

This pull request addresses two key issues in the reconstruction_loss function of the AcademiCodec project:

Mel Filterbank Warning: A warning reported that at least one mel filterbank had all-zero values, which suggests that n_mels may be set too high or n_freqs too low. The likely cause is an n_fft value that is too small at the smaller scales. To resolve this, the n_fft parameter in the MelSpectrogram configuration is now set to the maximum of 512 and the scale s, i.e. max(512, s). This makes filterbank generation more robust across scales.
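
A minimal sketch of why the floor helps, in plain Python with no audio library. The helper names `mel_n_fft` and `n_freqs`, the value `N_MELS = 64`, and the scale range 2^6 to 2^11 are assumptions for illustration and are not taken from this PR. A one-sided STFT with a given n_fft has n_fft // 2 + 1 frequency bins, so a small n_fft leaves few bins for the mel filters to cover, which makes empty filters more likely.

```python
# Hypothetical names and values, for illustration only.
N_MELS = 64          # assumed number of mel bins (not stated in this PR)
N_FFT_FLOOR = 512    # floor on n_fft described in this PR


def mel_n_fft(s: int, floor: int = N_FFT_FLOOR) -> int:
    """Return the n_fft to use for scale s: the larger of the floor and s."""
    return max(floor, s)


def n_freqs(n_fft: int) -> int:
    """Number of frequency bins in a one-sided STFT of size n_fft."""
    return n_fft // 2 + 1


# Assumed scales s = 2^6 ... 2^11.
for i in range(6, 12):
    s = 2 ** i
    before = n_freqs(s)             # bins if n_fft = s
    after = n_freqs(mel_n_fft(s))   # bins if n_fft = max(512, s)
    print(f"s={s:5d}  n_freqs before={before:5d}  after={after:5d}  n_mels={N_MELS}")
```

Under these assumptions, the smallest scale (s = 64) would give only 33 frequency bins for 64 mel filters without the floor, and 257 bins with it. Scales of 512 and above are unaffected.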

Alpha Calculation in Loss Function: The original implementation did not include the weight α_s = sqrt(s/2) described in the corresponding paper. This pull request computes α_s for each scale s and incorporates it into the loss computation, so that the loss follows the paper's formulation more closely.
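
A minimal sketch of the weighting, in plain Python. The function names `alpha` and `scale_loss` are hypothetical, and the assumption that α_s multiplies the log-spectrogram term and is added to an unweighted spectrogram term is an illustration of one common formulation, not a copy of the code in this PR.

```python
import math


def alpha(s: int) -> float:
    """Per-scale weight alpha_s = sqrt(s / 2), as stated in the PR description."""
    return math.sqrt(s / 2)


def scale_loss(spec_term: float, log_spec_term: float, s: int) -> float:
    """Combine the two per-scale terms (assumed form).

    spec_term:     distance between the two spectrograms at scale s
    log_spec_term: distance between their log-spectrograms at scale s
    The weight alpha_s is applied to the log term only (an assumption).
    """
    return spec_term + alpha(s) * log_spec_term


# alpha grows with the scale, so larger windows weight the log term more heavily.
for s in (128, 512, 2048):
    print(s, alpha(s))
```

With this definition, α_s is 8 for s = 128, 16 for s = 512, and 32 for s = 2048.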

Together, these changes aim to remove the filterbank warning and make the loss computation more accurate and more consistent with the paper.