Regarding Non-speech Vocal data in a dataset

yl4579 / StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

MIT License

I was wondering whether I could include laughing, sobbing, and crying sounds from each speaker in the dataset. Is it possible to clone these as well? Since I assume these sounds contain no phonemes, I'm worried that including them would hurt the overall quality.

If it is possible, how much of this data do you think would be reasonable to include?

Sorry, while I'm here, let me ask another question I've had: should the training samples all be the same length? (They seem to be 5 seconds long.) What happens if I have samples of varying lengths?
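
To make the length question concrete, this is roughly what I imagine would have to happen to clips of different lengths before they can be batched: longer clips get randomly cropped and shorter ones get padded with silence. It is only a sketch of my assumption, not something I have confirmed in the repository's code. `crop_or_pad`, `SAMPLE_RATE`, and `TARGET_LEN` are names I made up for illustration, and the 24 kHz rate is a guess.

```python
import random

SAMPLE_RATE = 24000            # hypothetical sample rate, not taken from the repo
TARGET_LEN = 5 * SAMPLE_RATE   # 5 seconds, matching the clip length I observed


def crop_or_pad(samples, target_len, pad_value=0.0, rng=None):
    """Return exactly target_len samples.

    Longer clips are randomly cropped; shorter clips are right-padded
    with pad_value (silence).
    """
    if rng is None:
        rng = random.Random()
    n = len(samples)
    if n >= target_len:
        # randint is inclusive on both ends, so the crop always fits.
        start = rng.randint(0, n - target_len)
        return list(samples[start:start + target_len])
    return list(samples) + [pad_value] * (target_len - n)


# Two dummy clips: one 2 seconds long, one 8 seconds long.
short_clip = [0.1] * (2 * SAMPLE_RATE)
long_clip = [0.2] * (8 * SAMPLE_RATE)

batch = [crop_or_pad(clip, TARGET_LEN, rng=random.Random(0))
         for clip in (short_clip, long_clip)]
```

After this step every entry in `batch` has the same length, so my question is really whether something like this is already done during training or whether I need to cut the clips myself beforehand.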

Regarding Non-speech Vocal data in a dataset #92