youmebangbang / TTS-dataset-tools

Automatically generates a TTS dataset from audio and associated text. Makes cuts under a custom length. Uses the Google Speech-to-Text API to perform diarization and transcription, or aeneas to force-align text to audio.
MIT License
50 stars 16 forks

Problem running diarization and returning csv transcriptions #1

Open RicardoGrayson opened 2 years ago

RicardoGrayson commented 2 years ago

Hi, I'm trying to follow the video on YouTube and I keep running into this issue when I start processing my wav files (which I converted to mono). I'm running Python 3.7 and dearpygui v0.6.415 on Windows, and using Google Cloud services:

Uploading C:\Users\Robin\TTS-dataset-tools\sultansupreme-source\22050/sultan_18.wav to google cloud storage bucket
C:\Users\Robin\PycharmProjects\pythonProject\venv\lib\site-packages\pydub\utils.py:198: RuntimeWarning: Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work
  warn("Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work", RuntimeWarning)
Traceback (most recent call last):
  File "C:/Users/Robin/TTS-dataset-tools/tools.py", line 70, in run_google_speech_call
    builder.diarization(get_value("label_wav_file_transcribe"), get_value("input_storage_bucket"), get_value("input_project_name"))
  File "C:\Users\Robin\TTS-dataset-tools\dataset_builder.py", line 397, in diarization
    info = mediainfo(wavfile)
  File "C:\Users\Robin\PycharmProjects\pythonProject\venv\lib\site-packages\pydub\utils.py", line 334, in mediainfo
    res = Popen(command, stdout=PIPE)
  File "C:\Users\Robin\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 756, in __init__
    restore_signals, start_new_session)
  File "C:\Users\Robin\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 1155, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Any help would be appreciated. Thanks!
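
The RuntimeWarning in the log says pydub could not find ffprobe or avprobe, and the FileNotFoundError is raised when `mediainfo` tries to launch that program through `Popen`. That points to ffprobe (which ships with ffmpeg) probably not being installed or not being on PATH, although the thread does not confirm it. A minimal sketch for checking this before running the tool, using only the standard library; the helper name `find_missing_tools` is made up for illustration and is not part of pydub or TTS-dataset-tools:

```python
import shutil


def find_missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names from `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; not part of pydub.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are both on PATH")
```

If either name is reported missing, installing ffmpeg and adding its `bin` directory to PATH (then restarting the terminal or IDE) is the usual remedy.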

RicardoGrayson commented 2 years ago

So I got the diarization to work, but after all the audio files have been split and the transcription process is about to start, it crashes with:

Traceback (most recent call last):
  File "tools.py", line 79, in run_dataset_builder_call
    builder.build_dataset()
  File "C:\Users\Robin\TTS-dataset-tools\dataset_builder.py", line 203, in build_dataset
    text = text.replace("%", " percent")
UnboundLocalError: local variable 'text' referenced before assignment

I don't know how to assign the local variable 'text' in dataset_builder.py without conflicting with Google Cloud Speech-to-Text. All help appreciated!
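
An UnboundLocalError like this usually means the name is only assigned inside a loop or conditional that never ran, for example when a transcription request returns no results for a clip. The actual code around line 203 of dataset_builder.py is not shown in this thread, so the following is only a sketch of the general pattern and one way to guard against it. The function names and the list-of-dicts `results` shape are assumptions made for illustration, not the real Google Cloud response type:

```python
def transcript_unsafe(results):
    # `text` is bound only if the loop body runs at least once.
    for result in results:
        text = result["transcript"]
    # Raises UnboundLocalError when `results` is empty.
    return text.replace("%", " percent")


def transcript_safe(results):
    # Bind the name up front so it exists even when there are no results.
    text = ""
    for result in results:
        text = result["transcript"]
    return text.replace("%", " percent")


if __name__ == "__main__":
    print(repr(transcript_safe([])))
    print(repr(transcript_safe([{"transcript": "50% off"}])))
```

Initializing `text` to an empty string does not interfere with the speech-to-text call itself; it only gives the later `replace` something to work on. Whether an empty transcript should be written out or the clip skipped is a separate decision for the dataset.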