open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
4.45k stars 379 forks

[Help]: The CustomSVCDataset inference/conversion problem occurred #128

Closed mysxs closed 7 months ago

mysxs commented 7 months ago

Problem Overview

An error occurred in infer_target_speaker while running inference with my own dataset.

Steps Taken

When I ran inference/conversion, I was told that the entry corresponding to target_speaker could not be found. The error appears in infer_target_speaker. I passed the argument as follows: --infer_target_speaker 16 \

My singers.json contains "[ESD]_0017": 16, so I chose 16 as my target_speaker, but there is no folder for "[ESD]_0017" under the data path. The data path contains only '[ESD]' (all of my CustomSVCDataset) and 0004_000563 (my source_audio).

In data/[ESD]/mel_min_max_stats, there are mel_max.npy and mel_min.npy, and data/[ESD]/mels contains all the .npy files for my CustomSVCDataset. So I can't find a .npy file for infer_target_speaker.

Lokshaw-Chau commented 7 months ago

Hi @mysxs !

I think simply running the command with --infer_target_speaker '[ESD]_0017' \ may solve your problem.

Actually, we look up [ESD]_0017 in singers.json to get 16, which is the index of the target singer's embedding in your trained model.
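To illustrate the lookup described above, here is a minimal sketch of resolving a singer name to its index through a singers.json-style mapping. This is not Amphion's actual code: the function name resolve_speaker_index is hypothetical, and the file is assumed to be a flat JSON object of name-to-index pairs, as the "[ESD]_0017": 16 entry quoted in this thread suggests.

```python
import json
import tempfile
from pathlib import Path


def resolve_speaker_index(singers_json_path, speaker_name):
    """Return the index stored for speaker_name in a singers.json-style file.

    Hypothetical helper for illustration only; assumes the file is a flat
    JSON object mapping singer names to integer indices.
    """
    with open(singers_json_path, encoding="utf-8") as f:
        singer_to_index = json.load(f)
    if speaker_name not in singer_to_index:
        known = ", ".join(sorted(singer_to_index))
        raise KeyError(f"Unknown speaker {speaker_name!r}; known speakers: {known}")
    return singer_to_index[speaker_name]


# Demo with a temporary file holding the single entry quoted in this thread.
with tempfile.TemporaryDirectory() as tmp_dir:
    singers_path = Path(tmp_dir) / "singers.json"
    singers_path.write_text(json.dumps({"[ESD]_0017": 16}), encoding="utf-8")
    index = resolve_speaker_index(singers_path, "[ESD]_0017")
    print(index)  # prints 16
```

Under this assumption, passing the raw index as a name fails because "16" is a value in the mapping, not a key, which would be consistent with the error reported above.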

mysxs commented 7 months ago

Hi @Lokshaw-Chau ! OK, it worked. A new problem then appeared, which I am working on solving. Thank you for your reply!