sidharthrajaram / StyleTTS2

šŸ šŸ¤– Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning
Other
135 stars 34 forks

Some text inputs throw expanded size of the tensor #14

Open Cabeda opened 8 months ago

Cabeda commented 8 months ago

First of all thanks a lot for providing this package!

I was doing some tests with large inputs and found that sending a long string (e.g. len > 1000) throws the following error:

The expanded size of the tensor (546) must match the existing size (512) at non-singleton dimension 1.  Target sizes: [1, 546].  Tensor sizes: [1, 512]

In terms of code, I used all the defaults:

out = my_tts.inference(
    text,
    output_wav_file="test_nb.wav"
)

sidharthrajaram commented 8 months ago

Thanks for the note @Cabeda, are you observing this issue with the latest release of the package?

And if it's okay, could you share the text that triggered the tensor size error? (just so it's easier to test on the same footing)

Cabeda commented 8 months ago

Hi, I'm using version 0.1.6. For the test, I'm using the text from the book Martian: Lost Sols (it's a free one).

Lost_sols.txt

Cabeda commented 8 months ago

Another test that threw the same error. I split the text into blocks of 500 chars. The first one worked fine; the second one threw the error from this issue.

However, the NSA assisted us with software they declined to explain and we now have these additional log entries. ATTACHMENT: LOG ENTRY: SOL 488 Well. Fuck me raw. I navigated my way around the dust storm, so I thought the “pain in the ass” portion of my journey was over. But no, no. Apparently, Mars isn’t done handing me bullshit. There I was, driving along in Meridiani Planum. Smooth sailing from here on out – or so I thought. The terrain was rough but nothing the rover couldn’t handle.

However, if I further divide this text in half and run each half separately, it succeeds.
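
As a possible workaround until the cause is confirmed, the input could be split into smaller chunks before each call. The sketch below assumes that the 512 in the error is a limit on the tokenized sequence rather than on raw characters, which would explain why a 500-character block can still overflow. The helper name `split_text` and the default `max_chars=250` are hypothetical and not part of the package; the value is a guess based on the halved text succeeding.

```python
import re


def split_text(text, max_chars=250):
    """Split text into chunks of at most max_chars, preferring sentence ends.

    Hypothetical helper, not part of the styletts2 package. max_chars is an
    assumed safe size, not a documented limit.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        pieces = [sentence]
        if len(sentence) > max_chars:
            # Fall back to word boundaries for an oversized sentence.
            # A single word longer than max_chars is kept whole.
            pieces, piece = [], ""
            for word in sentence.split():
                candidate = f"{piece} {word}".strip()
                if len(candidate) > max_chars and piece:
                    pieces.append(piece)
                    piece = word
                else:
                    piece = candidate
            if piece:
                pieces.append(piece)
        for p in pieces:
            # Greedily pack pieces into the current chunk while they fit.
            candidate = f"{current} {p}".strip()
            if len(candidate) > max_chars and current:
                chunks.append(current)
                current = p
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks


# Possible usage with the my_tts object from the report above:
# for i, chunk in enumerate(split_text(text)):
#     my_tts.inference(chunk, output_wav_file=f"test_nb_{i}.wav")
```

Each chunk ends up in its own wav file here, so the audio would still need to be concatenated afterwards. Splitting on sentence ends rather than at a fixed character offset avoids cutting words in half, which a plain 500-character split can do.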