Open Nanshanelectrician opened 6 months ago
Can you be more specific? Do you mean unseen speakers? Unseen samples? What kind of input is not in the training data?
I used two of my own audio files, hoping that speaker A's voice would say what speaker B says. The output does not sound as if A said it, so it feels unrealistic.
How should I handle inference on data that is not in the training set? The inference results are not realistic enough to reproduce the target voice.