olaviinha / NeuralTextToAudio

Text prompt steered synthetic audio generators

Issue running the Tortoise TTS Colab #2

Open ghost opened 10 months ago

ghost commented 10 months ago

I'm getting the following error while trying to run the generation:

IndexError                                Traceback (most recent call last)
in <cell line: 64>()
     88 bytes_collected = 0
     89 for voice_file in voice_files:
---> 90     voice_file = remove_silence(voice_file, window_size=2, threshold=0.1, save_as=dir_tmp_processed+path_leaf(voice_file))
     91     file_duration = get_audio_duration(voice_file)
     92     slice_file = dir_tmp_slices+path_leaf(voice_file)

2 frames
in clip_audio(audio_data, start, duration, sr)
     94     xstart = librosa.time_to_samples(start, sr=sr)
     95     xduration = librosa.time_to_samples(start+duration, sr=sr)
---> 96     audio_data = audio_data[:, xstart:xduration]
     97     return audio_data
     98

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

olaviinha commented 10 months ago

Hmm, is your voice_audio mono rather than stereo?
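
For context, the traceback is consistent with a mono file: a mono signal loads as a 1-dimensional array, so the two-index slice `audio_data[:, xstart:xduration]` fails. The following is a minimal sketch of a clip function that accepts both shapes. It is not the notebook's actual fix; it assumes multi-channel audio is laid out as `(channels, samples)`, and it replaces `librosa.time_to_samples` with plain integer arithmetic so the example only needs NumPy.

```python
import numpy as np


def clip_audio(audio_data, start, duration, sr):
    """Return the samples in [start, start + duration) seconds.

    Works for mono arrays of shape (samples,) and for multi-channel
    arrays of shape (channels, samples) -- the layout is an assumption.
    """
    # Stand-in for librosa.time_to_samples: seconds -> sample index.
    xstart = int(start * sr)
    xend = int((start + duration) * sr)
    # The Ellipsis slices the last axis whatever the number of dimensions,
    # so a 1-D (mono) array no longer raises IndexError.
    return audio_data[..., xstart:xend]


sr = 22050
mono = np.zeros(sr * 3)          # 3 seconds, 1-D
stereo = np.zeros((2, sr * 3))   # 3 seconds, 2 channels

mono_clip = clip_audio(mono, 1.0, 1.0, sr)
stereo_clip = clip_audio(stereo, 1.0, 1.0, sr)
print(mono_clip.shape, stereo_clip.shape)
```

Indexing the last axis with `...` avoids having to branch on `audio_data.ndim`, at the cost of assuming time is always the final axis.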

ghost commented 10 months ago

OK, so I've resampled all my audio to 48 kHz, and now I'm getting a different error. Is there a maximum number of files you can use at a time? (screenshot attached)

olaviinha commented 10 months ago

The notebook has been updated. It looks like I failed to update it after some earlier fixes. Please refresh and let me know if you are still experiencing issues.

Sample rate shouldn't matter, as the notebook will re-encode the audio to 22050 Hz in any case. Tortoise TTS outputs 24 kHz audio.

Also: if I'm reading that screenshot correctly, your audio is about 20 seconds long. As instructed in the notebook, about 1 minute of audio is required. Make sure you have 1 minute of audio.
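
One way to check this before running the notebook is to sum the durations of the voice files. The sketch below assumes uncompressed WAV input and uses only the standard-library `wave` module. The function names and the 60-second default are hypothetical, chosen to mirror the "about 1 minute" guidance rather than taken from the notebook.

```python
import wave


def total_wav_duration(paths):
    """Return the combined duration, in seconds, of the given WAV files."""
    total = 0.0
    for path in paths:
        with wave.open(path, "rb") as wav:
            # duration = number of frames / frames per second
            total += wav.getnframes() / wav.getframerate()
    return total


def has_enough_audio(paths, min_seconds=60.0):
    """True if the files together reach min_seconds (assumed threshold)."""
    return total_wav_duration(paths) >= min_seconds
```

For example, `has_enough_audio(["voice1.wav", "voice2.wav"])` (hypothetical file names) would return `False` for the roughly 20 seconds of audio described above.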


If you want a higher sample rate, feel free to try the Sloppy Upsampler notebook. I have no idea if it makes speech better, though.