if args.prefix == "aishell":
    # NOTE: the loudness of aishell audio files is around -33
    # The best way is datamodule --on-the-fly-feats --enable-audio-aug
    cut_set = cut_set.normalize_loudness(
        target=-20.0, affix_id=True
    )
I used cut_set.normalize_loudness because the loudness of the aishell audio files is low: https://github.com/lifeiteng/vall-e/blob/main/valle/bin/tokenizer.py#L173
But the model's accuracy drops a lot after this change, and I have not figured out why.
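
One thing that might be worth checking (only a guess, not a confirmed cause) is how much gain the normalization applies and whether peaks could clip afterwards. The sketch below is plain Python and does not use lhotse. It assumes the loudness difference can be treated as a constant dB gain, and the helper names `gain_for_target` and `clipped_fraction` and the toy waveform are hypothetical, made up for illustration.

```python
import math


def gain_for_target(measured_db: float, target_db: float) -> float:
    """Linear amplitude gain that shifts loudness from measured_db to target_db."""
    return 10.0 ** ((target_db - measured_db) / 20.0)


def clipped_fraction(samples, gain, limit=1.0):
    """Fraction of samples whose scaled magnitude would exceed the limit."""
    if not samples:
        return 0.0
    over = sum(1 for s in samples if abs(s * gain) > limit)
    return over / len(samples)


# -33 -> -20 is a +13 dB change, i.e. roughly a 4.5x amplitude gain.
gain = gain_for_target(-33.0, -20.0)

# Toy data (an assumption, not real aishell audio): a quiet 440 Hz sine
# sampled at 16 kHz, with a single louder peak inserted.
samples = [0.1 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
samples[100] = 0.5

print(f"gain = {gain:.2f}x")
print(f"clipped fraction = {clipped_fraction(samples, gain):.6f}")
```

If a check like this on real utterances shows a non-trivial clipped fraction, that would be one candidate explanation to rule in or out. It does not say anything about other possible causes.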
Ref: