Closed: loct824 closed this issue 7 months ago
This seems like a possible labeling issue. If the AP and SP regions are not labeled accurately, the model may learn to produce sound on these two phonemes.
Do you mean that it relates to the quality of transcriptions.csv, i.e. whether each labelled phoneme correctly corresponds to its part of the audio? Is there any guidance on how we could improve this, other than manually refining the phoneme time positions? Thanks.
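One way to narrow down where to look before refining labels by hand is to flag SP segments that are not actually quiet in the recording. The following is only a minimal sketch, not a DiffSinger tool: it assumes transcriptions.csv has `ph_seq` and `ph_dur` columns holding space-separated phonemes and durations in seconds, that the audio has already been loaded as mono float samples in [-1, 1], and that an RMS threshold of 0.02 is a reasonable starting point. The helper names and the threshold are hypothetical.

```python
import csv
import io
import math

SILENCE_LABELS = {"SP", "AP"}


def iter_rows(csv_text):
    """Yield one dict per row of a transcriptions.csv given as text."""
    return csv.DictReader(io.StringIO(csv_text))


def silence_segments(ph_seq, ph_dur):
    """Return (label, start_sec, end_sec) for every SP/AP phoneme in one row."""
    phones = ph_seq.split()
    durs = [float(d) for d in ph_dur.split()]
    if len(phones) != len(durs):
        raise ValueError("ph_seq and ph_dur have different lengths")
    segments, t = [], 0.0
    for ph, d in zip(phones, durs):
        if ph in SILENCE_LABELS:
            segments.append((ph, t, t + d))
        t += d
    return segments


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def flag_loud_sp(samples, sample_rate, segments, threshold=0.02):
    """Return (start, end, level) for SP segments louder than the threshold.

    Only SP is checked: AP marks a breath, so some energy is expected there.
    The threshold is an assumption and should be tuned per dataset.
    """
    flagged = []
    for label, start, end in segments:
        if label != "SP":
            continue
        chunk = samples[int(start * sample_rate):int(end * sample_rate)]
        level = rms(chunk)
        if level > threshold:
            flagged.append((start, end, level))
    return flagged
```

Segments returned by `flag_loud_sp` are candidates for voiced audio that was labeled as silence and are worth inspecting first. The check says nothing about boundaries that are off by only a few frames.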
If you enabled some variance parameters then controlling them can be a workaround. But on the training side I cannot provide more advice without further information.
Hi,
We trained an English model for DiffSinger, but in the synthesized songs, in the middle of a song where SP and AP occur, the model produces strange voicing that sounds like the singer is humming a constant, odd sound. We give an example below, in which arrows indicate where the humming happens.
Could you give us some advice on how the model can be improved/trained to eliminate this strange humming sound during breaks/silence?