lifeiteng / vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Apache License 2.0
2.04k stars, 320 forks

AISHELL1 with cut_set.normalize_loudness #90

Open lifeiteng opened 1 year ago

lifeiteng commented 1 year ago

I used cut_set.normalize_loudness because the loudness of the AISHELL audio files is low. See https://github.com/lifeiteng/vall-e/blob/main/valle/bin/tokenizer.py#L173

                if args.prefix == "aishell":
                    # NOTE: the loudness of aishell audio files is around -33
                    # The best way is datamodule --on-the-fly-feats --enable-audio-aug
                    cut_set = cut_set.normalize_loudness(
                        target=-20.0, affix_id=True
                    )
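To give a sense of how large this adjustment is, here is a minimal sketch of the arithmetic behind moving from about -33 to -20. It is not lhotse's implementation. It assumes both figures are on the same dB scale and that normalization amounts to a single constant gain. The helper names (`gain_db`, `db_to_linear`, `clipped_fraction`) and the toy sine waveform are hypothetical, introduced only for illustration.

```python
import math


def gain_db(source_loudness_db: float, target_loudness_db: float) -> float:
    """Gain in dB needed to move from the source loudness to the target."""
    return target_loudness_db - source_loudness_db


def db_to_linear(db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def clipped_fraction(samples, limit: float = 1.0) -> float:
    """Fraction of samples whose magnitude exceeds the full-scale limit."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if abs(s) > limit) / len(samples)


# Hypothetical toy signal: one second of a 220 Hz sine with peak 0.3 at 16 kHz.
sample_rate = 16000
samples = [
    0.3 * math.sin(2 * math.pi * 220 * n / sample_rate)
    for n in range(sample_rate)
]

# -33 -> -20 is a 13 dB boost, i.e. roughly a 4.5x amplitude scale.
factor = db_to_linear(gain_db(-33.0, -20.0))
boosted = [s * factor for s in samples]

print(f"gain: {gain_db(-33.0, -20.0):.1f} dB, linear factor: {factor:.3f}")
print(f"clipped before: {clipped_fraction(samples):.3f}")
print(f"clipped after:  {clipped_fraction(boosted):.3f}")
```

In this toy case the boosted peaks exceed full scale. So one thing that may be worth checking on the real data is whether the normalized audio clips before it is tokenized. This is only a possible check, not a confirmed cause of the accuracy drop.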

But the model's accuracy drops a lot, and I have not figured out why. [image]

Ref: