open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License

[BUG]: Issue with the vocoder calling during inference #105

Open ArkhamImp opened 10 months ago

ArkhamImp commented 10 months ago

Describe the bug

In models/base/new_inference.py

vocoder_cfg, vocoder_ckpt = self._parse_vocoder(self.args.vocoder_dir)

If the vocoder checkpoint is saved as '*.pt' it works fine, but there is an error when I use a HiFi-GAN checkpoint, which is saved as '*.bin'.
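If the cause is what it looks like, the old parsing code globs only for *.pt files, so a HiFi-GAN directory containing only .bin shards yields an empty list and indexing it fails. A minimal sketch of that failure pattern (the exact original code may differ):

```python
from pathlib import Path

def parse_pt_only(vocoder_dir):
    """Simplified sketch of a *.pt-only checkpoint lookup."""
    ckpt_list = list(Path(vocoder_dir).glob("*.pt"))
    # Sort by the numeric stem (e.g. "200.pt" -> 200), newest first
    ckpt_list.sort(key=lambda x: int(x.stem), reverse=True)
    return str(ckpt_list[0])  # IndexError when only *.bin files exist
```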

VocodexElysium commented 10 months ago

Could you tell me which task you tried to run and encountered this BUG?

Currently, for loading a neural vocoder in other tasks, both a .pt checkpoint and a state-dict folder containing a series of .bin files are supported, as in the function load_nn_vocoder in models/vocoders/vocoder_inference.py. The function parse_vocoder is deprecated and should have been discarded.
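As a sketch of how such a loader can tell the two supported formats apart (the function name here is hypothetical, not Amphion's actual API):

```python
from pathlib import Path

def detect_vocoder_format(vocoder_dir):
    """Return 'pt' if the directory holds a *.pt checkpoint,
    'statedict' if it holds *.bin shards; raise otherwise."""
    d = Path(vocoder_dir)
    if list(d.glob("*.pt")):
        return "pt"
    if list(d.glob("*.bin")):
        return "statedict"
    raise FileNotFoundError(f"no vocoder checkpoint in {vocoder_dir}")
```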

ArkhamImp commented 10 months ago

So should I not use the inference function in models/base/new_inference.py and instead write my own?

Or maybe copying the load-model function from models/vocoders/vocoder_inference.py would be more straightforward?

VocodexElysium commented 10 months ago

So should I not use the inference function in models/base/new_inference.py and instead write my own?

Or maybe copying the load-model function from models/vocoders/vocoder_inference.py would be more straightforward?

Yes, you can just write your own inference function, keeping the base names aligned with new_inference.py.

Thanks for your report. I will check the compatibility of loading both the old .pt vocoder file and the new vocoder state-dict folder for each downstream task this week.

mysxs commented 9 months ago

So should I not use the inference function in models/base/new_inference.py and instead write my own?

Or maybe copying the load-model function from models/vocoders/vocoder_inference.py would be more straightforward?

Hi @VocodexElysium~, excuse me! I would like to ask which .bin file you use with HiFi-GAN? Or how did you change your code?

VocodexElysium commented 9 months ago

So should I not use the inference function in models/base/new_inference.py and instead write my own? Or maybe copying the load-model function from models/vocoders/vocoder_inference.py would be more straightforward?

Hi @VocodexElysium~, excuse me! I would like to ask which .bin file you use with HiFi-GAN? Or how did you change your code?

Hi @mysxs.

We use the whole folder as a state dict for accelerate loading, so all the .bin files are used.
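For illustration, merging a folder of shards into a single state dict could look like the sketch below. The function name is hypothetical, and pickle.load stands in as the deserializer so the sketch is self-contained; real checkpoints would be read with torch.load (or restored via accelerate's state loading):

```python
import pickle
from pathlib import Path

def load_state_dict_folder(folder, load_fn=pickle.load):
    """Merge every *.bin shard in `folder` into one dict.
    `load_fn` is pickle.load here only for illustration; for
    real checkpoints it would be torch.load."""
    merged = {}
    for shard in sorted(Path(folder).glob("*.bin")):
        with open(shard, "rb") as f:
            merged.update(load_fn(f))
    return merged
```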

I have checked issue #142. You encountered a misuse of a deprecated file. For your case, updating the related code snippet as follows should solve the problem:

# Assumes os, pathlib.Path, and load_config are already imported
# in new_inference.py.
def _parse_vocoder(vocoder_dir):
    r"""Parse the vocoder config and locate its checkpoint.

    Returns the config plus either the newest *.pt checkpoint or,
    if none is present, the directory itself (a state-dict folder).
    """
    vocoder_dir = os.path.abspath(vocoder_dir)
    ckpt_list = list(Path(vocoder_dir).glob("*.pt"))
    if len(ckpt_list) == 0:
        # No *.pt file: treat the whole directory as a state-dict folder
        ckpt_path = vocoder_dir
    else:
        # Pick the checkpoint whose file name has the highest step number
        ckpt_list.sort(key=lambda x: int(x.stem), reverse=True)
        ckpt_path = str(ckpt_list[0])
    vocoder_cfg = load_config(
        os.path.join(vocoder_dir, "args.json"), lowercase=True
    )
    return vocoder_cfg, ckpt_path

Thanks for your report! We will fix this issue ASAP.