p0p4k / vits2_pytorch

unofficial vits2-TTS implementation in pytorch
https://arxiv.org/abs/2307.16430
MIT License
477 stars 85 forks source link

Training doesn't start when speaker IDs aren't sequential from 0 #77

Closed skol101 closed 7 months ago

skol101 commented 7 months ago

I was scratching my head why training was always crashing with multiple errors like this:

../aten/src/ATen/native/cuda/Indexing.cu:1237: indexSelectSmallIndex: block: [1,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1237: indexSelectSmallIndex: block: [1,0,0], thread: [97,0,0] Assertion srcIndex < srcSelectDimSize failed.

Looks like it's because my custom VCTK-like dataset doesn't number speaker IDs from 0 to N; instead I use the VCTK speaker IDs (like "374"). Why is the speaker ID ordering/naming so strict?
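The assertion above is the CUDA indexing kernel checking `srcIndex < srcSelectDimSize`. A minimal sketch of the failure, assuming the model looks up speaker embeddings by integer ID from a table with `n_speakers` rows (as `torch.nn.Embedding(n_speakers, dim)` does); the table is emulated here with a plain list so it runs anywhere:

```python
# Hypothetical sketch: an embedding table with n_speakers rows only
# accepts indices 0..n_speakers-1. A raw VCTK-style ID such as 374
# is out of range when n_speakers is, say, 109.
n_speakers = 109
embed_dim = 192
table = [[0.0] * embed_dim for _ in range(n_speakers)]

def lookup(speaker_id: int):
    # Same precondition the CUDA kernel asserts: srcIndex < srcSelectDimSize.
    # On GPU this surfaces as the device-side assert; on CPU, an IndexError.
    if speaker_id >= n_speakers:
        raise IndexError(f"speaker_id {speaker_id} >= n_speakers {n_speakers}")
    return table[speaker_id]

lookup(42)        # fine: 42 < 109
try:
    lookup(374)   # raw VCTK ID: out of range, like the failed assertion
except IndexError as e:
    print("out of range:", e)
```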

JohnHerry commented 7 months ago

> I was scratching my head why training was always crashing with multiple errors like this:
>
> ../aten/src/ATen/native/cuda/Indexing.cu:1237: indexSelectSmallIndex: block: [1,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed.
>
> Looks like it's because my custom VCTK-like dataset doesn't number speaker IDs from 0 to N; instead I use the VCTK speaker IDs (like "374"). Why is the speaker ID ordering/naming so strict?

What is your config value of 'n_speakers' in config.json? I think it won't be so strict if your 'n_speakers' value is large enough to contain the largest of your VCTK speaker IDs.
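Raising 'n_speakers' to cover the largest raw ID works but wastes embedding rows. An alternative is to remap the raw IDs to contiguous indices 0..N-1 in the filelists before training. A hedged sketch, assuming the VCTK-style "path|speaker_id|text" filelist format; the helper name and sample paths are hypothetical:

```python
# Sketch: remap arbitrary speaker IDs in a "path|speaker_id|text"
# filelist to contiguous indices 0..N-1, so n_speakers can simply be
# set to the number of distinct speakers.
def remap_speakers(lines):
    rows = [line.rstrip("\n").split("|") for line in lines]
    raw_ids = sorted({r[1] for r in rows})             # e.g. ["225", "374"]
    id_map = {raw: i for i, raw in enumerate(raw_ids)}  # "225"->0, "374"->1
    remapped = ["|".join([r[0], str(id_map[r[1]])] + r[2:]) for r in rows]
    return remapped, id_map

lines = [
    "wavs/p374_001.wav|374|hello",
    "wavs/p225_001.wav|225|hi",
    "wavs/p374_002.wav|374|bye",
]
new_lines, id_map = remap_speakers(lines)
# id_map tells you which embedding row each original speaker got,
# and len(id_map) is the value to use for n_speakers in config.json.
```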