AISHELL1 with cut_set.normalize_loudness

I used cut_set.normalize_loudness because the AISHELL audio files are very quiet (loudness around -33). See https://github.com/lifeiteng/vall-e/blob/main/valle/bin/tokenizer.py#L173

                if args.prefix == "aishell":
                    # NOTE: the loudness of aishell audio files is around -33
                    # The best way is datamodule --on-the-fly-feats --enable-audio-aug
                    cut_set = cut_set.normalize_loudness(
                        target=-20.0, affix_id=True
                    )
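
For scale, moving from about -33 to -20 is a gain of roughly 13 dB, i.e. about 4.5x in amplitude. The sketch below only illustrates that arithmetic on a synthetic tone. It uses plain RMS level as a stand-in, which is an assumption and may not match how lhotse measures loudness; `rms_dbfs`, `gain_for_target` and `apply_gain` are hypothetical helpers, not lhotse API.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("silent signal has no finite level")
    return 20.0 * math.log10(rms)


def gain_for_target(measured_db, target_db):
    """Linear amplitude gain that moves a signal from measured_db to target_db."""
    return 10.0 ** ((target_db - measured_db) / 20.0)


def apply_gain(samples, gain):
    """Scale every sample by the same linear gain."""
    return [s * gain for s in samples]


# Synthetic stand-in for a quiet recording: 1 s of a 440 Hz sine at 16 kHz.
# The amplitude 0.03 is chosen only so the level lands near -33 dB.
sample_rate = 16000
tone = [0.03 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]

before = rms_dbfs(tone)
gain = gain_for_target(before, -20.0)
louder = apply_gain(tone, gain)
after = rms_dbfs(louder)

# A peak above 1.0 after scaling would mean clipping in fixed-point audio.
peak = max(abs(s) for s in louder)

print(f"before={before:.2f} dB, gain={gain:.2f}x, after={after:.2f} dB, peak={peak:.3f}")
```

Comparing the level and peak of a few cuts before and after normalization in this way is one cheap check of whether the scaled audio still looks sane.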

However, the model's accuracy drops a lot with this normalization, and I have not figured out why.

Ref:

https://github.com/lhotse-speech/lhotse/pull/1016
https://github.com/lhotse-speech/lhotse/pull/1029
- The bug is not triggered because the sample_rate of aishell is 16000.

AISHELL1 with cut_set.normalize_loudness #90