openvpi / DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Apache License 2.0

Inference DiffSinger #204

Closed Arseny5 closed 3 months ago

Arseny5 commented 3 months ago

Hello! Thank you so much for your work!

I have a question about DiffSinger inference. I trained my DiffSinger model with selected_param = "tension" set during training. When I run inference from the checkpoint, the model requires a tension parameter in the .ds file. How can I get around this error?

Example .ds file that I use for inference:

[
    {
        "name": 27.937,
        "ph_seq": "AP AP w aa z t uw SP ah k r ey z iy SP AP sh ow d y uw m ay SP",
        "ph_dur": "0.31347 0.50484 0.08383 0.11616 0.12914 0.53257 0.56624 1.97912 0.2038 0.22003 0.04993 0.37145 0.01459 1.17554 1.18129 0.49866 0.25851 0.23559 0.01729 0.00272 0.29607 0.05527 1.35808 0.73758",
        "ph_num": "1 6 7 1 8 1",
        "note_seq": "rest rest A#3+40 C4-24 B3-21 G3 E3-22 E3-22 rest G#3+3 G#3-4 B3-38 A#3+38 F#3+43 rest F3-36 rest rest rest rest D#3+25 A3-17 B3-7 C4-8 C4-8 C4-3 C4-3 B3-18 rest",
        "note_dur": "0.31347 0.359909 0.2322 0.24381 0.522449 0.336689 0.237723 0.087356 1.845986 0.22059 0.24381 0.2322 0.301859 0.371519 0.429569 0.281571 1.030356 0.122 0.028934 0.504939 0.267029 0.301859 0.301859 0.267029 0.16254 0.916935 0.244063 0.336689 0.156828",
        "f0_seq": "263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7 263.7",
        "f0_timestep": "0.005"
    }
]

yqzhishen commented 3 months ago

There is no selected_param in this repository, so it seems you are not using our repository directly. In that case, you should ask the owner of the repository you are using; we are not able to help, sorry.

Arseny5 commented 3 months ago

Yes, I'm sorry, that parameter is not from your work. My question is about your repository, though. If I use the tension embedding in training but do not supply tension at inference, can this degrade the quality of the prediction?

yqzhishen commented 3 months ago

Because your acoustic model uses tension_embed, it must consume a tension input; without one it cannot even produce reasonable outputs. You should consider training a variance model that predicts tension for it.
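A minimal sketch of a pre-flight check along these lines. It assumes, which this thread does not confirm, that a .ds segment stores each variance curve under a key named after the parameter (for example "tension") together with a matching "<param>_timestep" key, mirroring the f0_seq / f0_timestep pair in the file above. The helper name missing_variance_inputs is hypothetical and not part of DiffSinger.

```python
import json


def missing_variance_inputs(segment, required_params):
    """Return the expected keys that are absent from one .ds segment.

    Assumption: each parameter curve lives under "<param>" with a
    companion "<param>_timestep" key. Verify this against your own files.
    """
    missing = []
    for param in required_params:
        for key in (param, f"{param}_timestep"):
            if key not in segment:
                missing.append(key)
    return missing


# A shortened, made-up segment that only carries pitch information.
ds_text = '[{"ph_seq": "AP w aa SP", "f0_seq": "263.7 263.7", "f0_timestep": "0.005"}]'

for index, segment in enumerate(json.loads(ds_text)):
    gaps = missing_variance_inputs(segment, ["tension"])
    if gaps:
        print(f"segment {index}: missing {', '.join(gaps)}")
```

Running a check like this before inference only reports the gap; the curve itself still has to come from somewhere, such as a variance model.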

Arseny5 commented 3 months ago

Can I train a DiffSinger model without any variance embeddings? Will DiffSinger still train and predict singing?

yqzhishen commented 3 months ago

You can turn off all variance embeddings. But why not train a variance model to gain more control? Variance models are relatively small and require fewer computational resources.
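As a rough illustration, assuming the acoustic config exposes one boolean flag per variance embedding: the flag names below are assumptions (use_tension_embed is inferred from the tension_embed module mentioned earlier) and should be checked against your own config file. The helper disable_variance_embeddings is hypothetical.

```python
# Assumed flag names; confirm them in your acoustic config before relying on this.
VARIANCE_EMBED_FLAGS = (
    "use_energy_embed",
    "use_breathiness_embed",
    "use_voicing_embed",
    "use_tension_embed",
)


def disable_variance_embeddings(config):
    """Return a copy of the config with every listed variance flag set to False."""
    updated = dict(config)  # shallow copy, so the caller's dict is left untouched
    for flag in VARIANCE_EMBED_FLAGS:
        updated[flag] = False
    return updated


config = {"use_tension_embed": True, "use_energy_embed": False}
print(disable_variance_embeddings(config))
```

A model trained with these flags off needs no variance curves at inference, at the price of losing those controls.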

Arseny5 commented 3 months ago

I have some more questions. I trained DiffSinger using the code from your repository: only an acoustic model, without any embeddings from a variance model.

1) In TensorBoard, I see three results in the audio column: aux, diff and gt. As far as I know, gt is the ground truth and diff is the prediction of the diffusion model. What is aux?

2) As training data, I used ph_seq, ph_dur, ph_num, note_seq and note_dur, and I chose rmvpe as the pitch extractor for training. When I try to run inference from the model checkpoint, it requires the f0_seq field, which I don't have. How can I run inference without knowing f0? Can I leave f0 out of training, like in this pipeline?