Closed photkey closed 2 years ago
Setting the default SSML template
edge-srt-to-speech srt_file out_file --SSML path/example.xml
The format of example.xml would probably look something like the following.
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
<voice name="en-US-AriaNeural">
<mstts:express-as style="cheerful">
{text}
</mstts:express-as>
</voice>
</speak>
where "{text}" is replaced by the current sentence.
A default SSML template and SSML parameters specified for a particular sentence can coexist; the per-sentence setting takes precedence.
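To make the proposal concrete, here is a minimal sketch of how the "{text}" placeholder could be filled in and how a per-sentence template could override the default. The function names `render_ssml` and `pick_template` are made up for illustration and are not part of edge-srt-to-speech; escaping the sentence is my own assumption about what a safe implementation would need.

```python
from xml.sax.saxutils import escape

# Template mirroring example.xml above; "{text}" is the placeholder.
TEMPLATE = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    <mstts:express-as style="cheerful">
      {text}
    </mstts:express-as>
  </voice>
</speak>"""


def render_ssml(template: str, sentence: str) -> str:
    """Replace the literal {text} placeholder with the XML-escaped sentence."""
    # str.replace is used instead of str.format so that any other braces
    # in the template are left untouched.
    return template.replace("{text}", escape(sentence))


def pick_template(default_template, sentence_template=None):
    """A per-sentence template, when present, wins over the default one."""
    return sentence_template if sentence_template is not None else default_template
```

For example, `render_ssml(pick_template(TEMPLATE), "Fish & chips")` would produce the template with `Fish &amp; chips` in place of the placeholder.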
I wrote this idea as soon as it came to me; if you're interested in implementing this feature, I'll smooth out the logic and reorganize the language (to achieve a richer, more personalized feature while minimizing your workload)
I'll work on it when I can. I'm definitely not against but I don't have time to work on it now.
if you're interested in implementing this feature, I'll smooth out the logic and reorganize the language (to achieve a richer, more personalized feature while minimizing your workload)
Do you mean to contribute code? If yes, also try to use a real subtitle library instead of what I came up with :)
I'm very sorry, I'm just starting to teach myself Python; I only know a little basic syntax and am not capable of implementing these features. What I mean is that I probably know how to design it in a way that will minimize your workload (maybe). I'll reorganize this personalization idea and write it up here again later, for when you have time to implement it. These are exciting features to think about; I looked at a few paid sites specifically, and their features are not as powerful as this.
Using the srt library to read SRT files is a part I can maybe try to implement; I'll give it a go.
As a heads up, I've sped up the concatenation part dramatically (I think)
Please let me know if it has an impact on accuracy
voice, default-speed, default-pitch, default-volume, ssml-template
Parameters are divided into global parameters and local parameters (effective for one sentence only). ssml-template and voice, default-speed, default-pitch, default-volume do not have to coexist.
For the same parameter, the local value has higher priority; the auto-acceleration logic remains the same, and acceleration is still applied whenever it is needed.
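A rough sketch of the precedence rule, assuming both global and local parameters are held as plain dictionaries keyed by the names listed above. `effective_params` and the default values are invented for illustration, and the auto-acceleration step is not modeled here.

```python
# Hypothetical global defaults, using the parameter names from this proposal.
GLOBAL_PARAMS = {
    "voice": "en-US-AriaNeural",
    "default-speed": "+0%",
    "default-pitch": "+0Hz",
    "default-volume": "+0%",
}


def effective_params(global_params: dict, local_params: dict) -> dict:
    """Merge parameters for one sentence; local values override global ones."""
    merged = dict(global_params)  # copy so the global settings stay unchanged
    merged.update(local_params)   # a local key replaces the same global key
    return merged


# Local parameters apply to a single sentence only.
sentence_params = effective_params(GLOBAL_PARAMS, {"voice": "en-US-SaraNeural"})
```

Here `sentence_params` keeps the global speed, pitch and volume, and only the voice changes for that one sentence.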
edge-srt-to-speech srt_file out_file --SSML path/example.xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
<voice name="en-US-AriaNeural">
<mstts:express-as style="cheerful">
{text}
</mstts:express-as>
</voice>
</speak>
where "{text}" is replaced by the current sentence.
1
00:00:00,498 --> 00:00:02,827
Here's what I love most about food and diet.
["Normal",{"voice":"en-US-SaraNeural","default-speed":"+1%","default-pitch":"+1Hz","default-volume":"+1%"}]
2
00:00:02,827 --> 00:00:06,383
We all eat several times a day,and we're totally in charge
["ssml-template",{"ssml-template":"path/example.xml"}]
3
00:00:06,383 --> 00:00:09,427
of what goes on our plate and what stays off.
["ssml-value",{"voice":"en-US-SaraNeural","speed":"+1%","style":"cheerful"}]
list[0] of the local parameters indicates the parameter type. Normal and ssml-template should be relatively easy to implement and should probably be the priority. ssml-value requires an extra step that first converts these values into an SSML template automatically, so it might be more troublesome and might need to be implemented later. The main purpose of adding this type is to make editing easier and faster.
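As an illustration of the format above, this is a small sketch that separates the spoken text of one cue from the trailing parameter line. It assumes the parameter line is always the last line of the cue and is valid JSON; `split_cue` and `KNOWN_TYPES` are names made up for this example, not anything in the tool.

```python
import json

# The three parameter types proposed above.
KNOWN_TYPES = {"Normal", "ssml-template", "ssml-value"}


def split_cue(content: str):
    """Return (text, param_type, params) for one subtitle cue.

    param_type is None when the last line is not a recognized parameter list.
    """
    lines = content.strip().splitlines()
    if lines:
        try:
            parsed = json.loads(lines[-1])
        except json.JSONDecodeError:
            parsed = None
        if (
            isinstance(parsed, list)
            and len(parsed) == 2
            and isinstance(parsed[0], str)
            and parsed[0] in KNOWN_TYPES
            and isinstance(parsed[1], dict)
        ):
            # list[0] is the type, list[1] holds the values for this sentence.
            return "\n".join(lines[:-1]), parsed[0], parsed[1]
    return content.strip(), None, {}
```

A cue without a parameter line falls through unchanged, so ordinary SRT files would keep working under this assumption.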
As a heads up, I've sped up the concatenation part dramatically (I think)
Please let me know if it has an impact on accuracy
I'll go find some SRT subtitle files and test them out.
When you do make sure to use the current version in master.
It's fast, but it doesn't generate MP3 files, and the error message is as follows (I've replaced the Chinese in the error message with English)
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmps2d68849\36.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmps2d68849\19.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmps2d68849\19.mp3...
Traceback (most recent call last):
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 170, in _main
await asyncio.gather(*coros[i : i + 500])
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 117, in audio_gen
raise Exception(f"Too many retries for {fname}")
Exception: Too many retries for C:\Users\tuike\AppData\Local\Temp\tmps2d68849\47.mp3
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\shutil.py", line 616, in _rmtree_unsafe
os.unlink(fullname)
PermissionError: [WinError 32] Another program is using this file and the process cannot access it.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmps2d68849\\0.mp3'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\tempfile.py", line 802, in onerror
_os.unlink(path)
PermissionError: [WinError 32] Another program is using this file and the process cannot access it.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmps2d68849\\0.mp3'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 239, in main
asyncio.get_event_loop().run_until_complete(
File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 214, in _main
raise Exception("ffmpeg failed")
File "d:\develop\python\python38\lib\tempfile.py", line 827, in __exit__
self.cleanup()
File "d:\develop\python\python38\lib\tempfile.py", line 831, in cleanup
self._rmtree(self.name)
File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
return _rmtree_unsafe(path, onerror)
File "d:\develop\python\python38\lib\shutil.py", line 618, in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())
File "d:\develop\python\python38\lib\tempfile.py", line 805, in onerror
cls._rmtree(path)
File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
return _rmtree_unsafe(path, onerror)
File "d:\develop\python\python38\lib\shutil.py", line 599, in _rmtree_unsafe
onerror(os.scandir, path, sys.exc_info())
File "d:\develop\python\python38\lib\shutil.py", line 596, in _rmtree_unsafe
with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmps2d68849\\0.mp3'
PS C:\Users\tuike\Downloads>
Can you try now?
yes, now
edge-srt-to-speech-0.0.6
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmpei5v1756\43.mp3...
Traceback (most recent call last):
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 170, in _main
await asyncio.gather(*coros[i : i + batch_size])
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 117, in audio_gen
raise Exception(f"Too many retries for {fname}")
Exception: Too many retries for C:\Users\tuike\AppData\Local\Temp\tmpei5v1756\12.mp3
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\shutil.py", line 616, in _rmtree_unsafe
os.unlink(fullname)
PermissionError: [WinError 32] Another program is using this file and the process cannot access it.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmpei5v1756\\0.mp3'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\tempfile.py", line 802, in onerror
_os.unlink(path)
PermissionError: [WinError 32] Another program is using this file and the process cannot access it.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmpei5v1756\\0.mp3'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 243, in main
asyncio.get_event_loop().run_until_complete(
File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 214, in _main
raise Exception("ffmpeg failed")
File "d:\develop\python\python38\lib\tempfile.py", line 827, in __exit__
self.cleanup()
File "d:\develop\python\python38\lib\tempfile.py", line 831, in cleanup
self._rmtree(self.name)
File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
return _rmtree_unsafe(path, onerror)
File "d:\develop\python\python38\lib\shutil.py", line 618, in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())
File "d:\develop\python\python38\lib\tempfile.py", line 805, in onerror
cls._rmtree(path)
File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
return _rmtree_unsafe(path, onerror)
File "d:\develop\python\python38\lib\shutil.py", line 599, in _rmtree_unsafe
onerror(os.scandir, path, sys.exc_info())
File "d:\develop\python\python38\lib\shutil.py", line 596, in _rmtree_unsafe
with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmpei5v1756\\0.mp3'
PS C:\Users\tuike\Downloads>
Could you try again? I've added a small wait time for when it errors out.
edge-srt-to-speech-0.0.7
ERROR:asyncio:Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "d:\develop\python\python38\lib\asyncio\events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "d:\develop\python\python38\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] The remote host forced an existing connection to close.
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\44.mp3...
...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\0.mp3...
ERROR:asyncio:Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "d:\develop\python\python38\lib\asyncio\events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "d:\develop\python\python38\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] The remote host forced an existing connection to close.
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\7.mp3...
...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\39.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\43.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\37.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\33.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\49.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\41.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\43.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\23.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\20.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\33.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\15.mp3...
Traceback (most recent call last):
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 172, in _main
await asyncio.gather(*coros[i : i + batch_size])
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 118, in audio_gen
raise Exception(f"Too many retries for {fname}")
Exception: Too many retries for C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\49.mp3
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\shutil.py", line 616, in _rmtree_unsafe
os.unlink(fullname)
PermissionError: [WinError 32] Another program is using this file and the process cannot access it.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmp30scjuyg\\0.mp3'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\tempfile.py", line 802, in onerror
_os.unlink(path)
PermissionError: [WinError 32] Another program is using this file and the process cannot access it.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmp30scjuyg\\0.mp3'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 245, in main
asyncio.get_event_loop().run_until_complete(
File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 216, in _main
raise Exception("ffmpeg failed")
File "d:\develop\python\python38\lib\tempfile.py", line 827, in __exit__
self.cleanup()
File "d:\develop\python\python38\lib\tempfile.py", line 831, in cleanup
self._rmtree(self.name)
File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
return _rmtree_unsafe(path, onerror)
File "d:\develop\python\python38\lib\shutil.py", line 618, in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())
File "d:\develop\python\python38\lib\tempfile.py", line 805, in onerror
cls._rmtree(path)
File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
return _rmtree_unsafe(path, onerror)
File "d:\develop\python\python38\lib\shutil.py", line 599, in _rmtree_unsafe
onerror(os.scandir, path, sys.exc_info())
File "d:\develop\python\python38\lib\shutil.py", line 596, in _rmtree_unsafe
with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmp30scjuyg\\0.mp3'
PS C:\Users\tuike\Downloads>
With --parallel-batch-size set to 1, does it work? Maybe it's some kind of Windows issue...
I'll try it now.
edge-srt-to-speech .\example.srt example.mp3 --parallel-batch-size 1
DEBUG:asyncio:Using proactor: IocpProactor
DEBUG:root:Preparing 0...
DEBUG:root:Preparing 1...
DEBUG:root:Preparing 2...
DEBUG:root:Preparing 3...
DEBUG:root:Preparing 4...
DEBUG:root:Preparing 5...
DEBUG:root:Preparing 6...
DEBUG:root:Preparing 7...
DEBUG:root:Preparing 8...
DEBUG:root:Preparing 9...
DEBUG:root:Preparing 10...
DEBUG:root:Preparing 11...
DEBUG:root:Preparing 12...
DEBUG:root:Preparing 13...
DEBUG:root:Preparing 14...
DEBUG:root:Preparing 15...
DEBUG:root:Preparing 16...
DEBUG:root:Preparing 17...
DEBUG:root:Preparing 18...
DEBUG:root:Preparing 19...
DEBUG:root:Preparing 20...
DEBUG:root:Preparing 21...
DEBUG:root:Preparing 22...
DEBUG:root:Preparing 23...
DEBUG:root:Preparing 24...
DEBUG:root:Preparing 25...
DEBUG:root:Preparing 26...
DEBUG:root:Preparing 27...
DEBUG:root:Preparing 28...
DEBUG:root:Preparing 29...
DEBUG:root:Preparing 30...
DEBUG:root:Preparing 31...
DEBUG:root:Preparing 32...
DEBUG:root:Preparing 33...
DEBUG:root:Preparing 34...
DEBUG:root:Preparing 35...
DEBUG:root:Preparing 36...
DEBUG:root:Preparing 37...
DEBUG:root:Preparing 38...
DEBUG:root:Preparing 39...
DEBUG:root:Preparing 40...
DEBUG:root:Preparing 41...
DEBUG:root:Preparing 42...
DEBUG:root:Preparing 43...
DEBUG:root:Preparing 44...
DEBUG:root:Preparing 45...
DEBUG:root:Preparing 46...
DEBUG:root:Preparing 47...
DEBUG:root:Preparing 48...
DEBUG:root:Preparing 49...
DEBUG:root:Preparing 50...
DEBUG:root:Preparing 51...
DEBUG:root:Preparing 52...
DEBUG:root:Preparing 53...
DEBUG:root:Preparing 54...
DEBUG:root:Preparing 55...
DEBUG:root:Preparing 56...
DEBUG:root:Preparing 57...
DEBUG:root:Preparing 58...
DEBUG:root:Preparing 59...
DEBUG:root:Preparing 60...
DEBUG:root:Preparing 61...
DEBUG:root:Preparing 62...
DEBUG:root:Preparing 63...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
Traceback (most recent call last):
File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 245, in main
asyncio.get_event_loop().run_until_complete(
File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 172, in _main
await asyncio.gather(*coros[i : i + batch_size])
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 118, in audio_gen
raise Exception(f"Too many retries for {fname}")
Exception: Too many retries for C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3
sys:1: RuntimeWarning: coroutine 'audio_gen' was never awaited
Could you share the SRT?
Sorry, I forgot: my SRT is in Chinese and I didn't specify a Chinese voice in the parameters. I'll specify a Chinese voice now and try again. example.zip
I think the reason is that you didn't have it use a Chinese voice
Specifying the Chinese voice with --parallel-batch-size 1 still fails, but I hardly ever see a retry partway through.
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_59.mp3
DEBUG:root:Needed -0.08100000000007412 seconds for 60
DEBUG:root:Needed 0.4659999999999229 seconds for 61
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_61.mp3...
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_61.mp3
DEBUG:root:Needed 3.431999999999931 seconds for 62
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_62.mp3...
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_62.mp3
DEBUG:root:Needed 3.2859999999999445 seconds for 63
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_63.mp3...
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_63.mp3
Traceback (most recent call last):
File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 245, in main
asyncio.get_event_loop().run_until_complete(
File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 216, in _main
raise Exception("ffmpeg failed")
Exception: ffmpeg failed
With the default setting, --parallel-batch-size 100, it is almost entirely retries.
Could you comment out the stdout and stderr lines? And share the output again.
Do you mean commenting them out inside __main__.py? There are 3 places in total. Should I comment them all out?
Yes
Input #0, lavfi, from 'anullsrc=cl=mono:r=24000':
Duration: N/A, start: 0.000000, bitrate: 192 kb/s
Stream #0:0: Audio: pcm_u8, 24000 Hz, mono, u8, 192 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_u8 (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'C:\Users\tuike\AppData\Local\Temp\tmpu024g4i6\silence_62.mp3':
Metadata:
TSSE : Lavf59.16.100
Stream #0:0: Audio: mp3, 24000 Hz, mono, s16p
Metadata:
encoder : Lavc59.18.100 libmp3lame
size= 14kB time=00:00:03.43 bitrate= 33.0kbits/s speed= 440x
video:0kB audio:14kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.702586%
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmpu024g4i6\silence_62.mp3
DEBUG:root:Needed 3.2859999999999445 seconds for 63
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmpu024g4i6\silence_63.mp3...
ffmpeg version 5.0-full_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 11.2.0 (Rev5, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil 57. 17.100 / 57. 17.100
libavcodec 59. 18.100 / 59. 18.100
libavformat 59. 16.100 / 59. 16.100
libavdevice 59. 4.100 / 59. 4.100
libavfilter 8. 24.100 / 8. 24.100
libswscale 6. 4.100 / 6. 4.100
libswresample 4. 3.100 / 4. 3.100
libpostproc 56. 3.100 / 56. 3.100
Input #0, lavfi, from 'anullsrc=cl=mono:r=24000':
Duration: N/A, start: 0.000000, bitrate: 192 kb/s
Stream #0:0: Audio: pcm_u8, 24000 Hz, mono, u8, 192 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_u8 (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'C:\Users\tuike\AppData\Local\Temp\tmpu024g4i6\silence_63.mp3':
Metadata:
TSSE : Lavf59.16.100
Stream #0:0: Audio: mp3, 24000 Hz, mono, s16p
Metadata:
encoder : Lavc59.18.100 libmp3lame
size= 13kB time=00:00:03.28 bitrate= 33.0kbits/s speed= 442x
video:0kB audio:13kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.776079%
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmpu024g4i6\silence_63.mp3
ffmpeg version 5.0-full_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 11.2.0 (Rev5, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil 57. 17.100 / 57. 17.100
libavcodec 59. 18.100 / 59. 18.100
libavformat 59. 16.100 / 59. 16.100
libavdevice 59. 4.100 / 59. 4.100
libavfilter 8. 24.100 / 8. 24.100
libswscale 6. 4.100 / 6. 4.100
libswresample 4. 3.100 / 4. 3.100
libpostproc 56. 3.100 / 56. 3.100
C:\Users\tuike\AppData\Local\Temp\tmpoyl9iq1k: Permission denied
Traceback (most recent call last):
File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 245, in main
asyncio.get_event_loop().run_until_complete(
File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 216, in _main
raise Exception("ffmpeg failed")
Exception: ffmpeg failed
Could you attempt it again with the new version? Please comment stdout and stderr again if it doesn't work and returns "ffmpeg failed".
Wait a minute; I tried several times via pip without success. I think the cache hasn't been updated yet.
Now everything is working. I'll try again to see how high parallel-batch-size can be set before it becomes error-prone.
parallel-batch-size 150 works fine too. It's flying, nice work! I'll go test some more SRT files.
Very good :)
I downloaded some SRT subtitle files for a movie; probably because there were too many sentences, the following error was reported.
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmpa43ftvzl\396.mp3
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmpa43ftvzl\387.mp3
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmpa43ftvzl\399.mp3
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmpa43ftvzl\395.mp3
Traceback (most recent call last):
File "d:\develop\python\python38\lib\site-packages\aiohttp\connector.py", line 986, in _wrap_create_connection
return await self._loop.create_connection(*args, **kwargs) # type: ignore[return-value] # noqa
File "d:\develop\python\python38\lib\asyncio\base_events.py", line 1025, in create_connection
raise exceptions[0]
File "d:\develop\python\python38\lib\asyncio\base_events.py", line 1010, in create_connection
sock = await self._connect_sock(
File "d:\develop\python\python38\lib\asyncio\base_events.py", line 924, in _connect_sock
await self.sock_connect(sock, address)
File "d:\develop\python\python38\lib\asyncio\proactor_events.py", line 702, in sock_connect
return await self._proactor.connect(sock, address)
File "d:\develop\python\python38\lib\asyncio\windows_events.py", line 812, in _poll
value = callback(transferred, key, ov)
File "d:\develop\python\python38\lib\asyncio\windows_events.py", line 599, in finish_connect
ov.getresult()
OSError: [WinError 121] The signal timeout has expired
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 250, in main
asyncio.get_event_loop().run_until_complete(
File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 172, in _main
await asyncio.gather(*coros[i : i + batch_size])
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 105, in audio_gen
async for j in communicate.run(
File "d:\develop\python\python38\lib\site-packages\edge_tts\communicate.py", line 271, in run
async with session.ws_connect(
File "d:\develop\python\python38\lib\site-packages\aiohttp\client.py", line 1138, in __aenter__
self._resp = await self._coro
File "d:\develop\python\python38\lib\site-packages\aiohttp\client.py", line 776, in _ws_connect
resp = await self.request(
File "d:\develop\python\python38\lib\site-packages\aiohttp\client.py", line 535, in _request
conn = await self._connector.connect(
File "d:\develop\python\python38\lib\site-packages\aiohttp\connector.py", line 542, in connect
proto = await self._create_connection(req, traces, timeout)
File "d:\develop\python\python38\lib\site-packages\aiohttp\connector.py", line 907, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
File "d:\develop\python\python38\lib\site-packages\aiohttp\connector.py", line 1206, in _create_direct_connection
raise last_exc
File "d:\develop\python\python38\lib\site-packages\aiohttp\connector.py", line 1175, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
File "d:\develop\python\python38\lib\site-packages\aiohttp\connector.py", line 992, in _wrap_create_connection
raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host speech.platform.bing.com:443 ssl:default [The signal timeout has expired]
sys:1: RuntimeWarning: coroutine 'audio_gen' was never awaited
Resetting parallel-batch-size to 50 successfully generated the MP3.
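One possible way to keep many requests from being opened at once is to cap the number in flight with a semaphore and retry failed jobs. This is only a sketch under my own assumptions, not what edge-srt-to-speech does (the tracebacks above show it gathering fixed-size batches). `run_limited` is a hypothetical helper, each job is a zero-argument coroutine function standing in for one synthesis request, and `OSError` is caught as a stand-in for connection failures.

```python
import asyncio


async def run_limited(jobs, limit=50, retries=3, delay=0.0):
    """Run coroutine functions with at most `limit` of them in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(job):
        async with semaphore:
            for attempt in range(1, retries + 1):
                try:
                    return await job()
                except OSError:
                    if attempt == retries:
                        raise  # give up after the last allowed attempt
                    # Wait a little longer after each failed attempt.
                    await asyncio.sleep(delay * attempt)

    # gather returns results in the same order as the jobs were given.
    return await asyncio.gather(*(run_one(job) for job in jobs))
```

Unlike fixed batches, a semaphore starts the next job as soon as any slot frees up, so one slow sentence does not hold back a whole batch.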
This is just live feedback on the errors I've encountered; I will continue testing.
I just had an idea during testing. Could we discuss whether it's worth implementing?
I was regenerating an MP3 file for a long narrated video and found that Microsoft's text-to-speech is good enough to be compared to a real person (I noticed that Microsoft's text-to-speech algorithm has recently been upgraded). However, the speaking speed of the resulting MP3 was not quite right, although the timeline was correct. Set at a reasonable speed, Microsoft's dubbing is really better than some real narrators with less clear pronunciation. So my idea is to supply a sentence together with the time it takes a human to read it aloud (the video author can read the sentence and record the time), and use that time to calculate the most reasonable speaking rate. Because different Microsoft voices speak at different speeds, let the selected voice read the sentence at the default speed, record the time, and then calculate the ratio against the human reading time. That way, whichever voice is selected, the most reasonable speaking rate can be calculated automatically, as close as possible to the rate the author wants.
Maybe this feature belongs in edge-tts. Either way, I think it would bring the result closer to the speed the author wants; otherwise it has to be tried over and over again.
So you'd like to calculate the default rate that you want?
Yes, the default rate of the selected voice is calculated automatically from the text and time parameters passed in. What do you think?
I don't think it's possible
Let's say the time parameter is passed in as 10 seconds, and the selected voice actually takes 20 seconds to read the sentence aloud at the default rate; this automatically sets the rate to +100%. Isn't that the speaking rate we need?
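The arithmetic in this example can be written out as a small sketch. It assumes the duration at the default rate has already been measured in a first pass and that speech duration scales linearly with the rate, neither of which is guaranteed for a neural voice. `rate_for_target` is a hypothetical helper, not an edge-tts function.

```python
def rate_for_target(measured_seconds: float, target_seconds: float) -> str:
    """Return a rate string such as "+100%" that would fit the target time.

    measured_seconds: how long the voice took at the default rate.
    target_seconds: how long the human reading took.
    """
    if measured_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    # Reading twice as fast halves the duration, so the required speed-up
    # is the ratio of the two durations, expressed relative to 100%.
    percent = round((measured_seconds / target_seconds - 1) * 100)
    return f"{percent:+d}%"


# The case from the comment: 20 s at the default rate, 10 s wanted.
example_rate = rate_for_target(20, 10)
```

With the numbers above, `example_rate` is `"+100%"`; a voice that is already twice as fast as needed would give `"-50%"`.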
Yes, but I have no way of knowing how long the selected voice would take. Also, because it is a neural network, it might say some sentences faster than others. There is no simple formula for this :/
I didn't explain how to use it:
1
00:00:00,596 --> 00:00:04,744
首先 我们可以用这种酒精湿巾
edge_tts{voice:zh-CN-XiaoxiaoNeural}
2
00:00:04,744 --> 00:00:08,340
Hello, my name is Bob and I love cooking
edge_tts{voice:en-US-BrandonNeural}
3
00:00:10,440 --> 00:00:20,440
I love it too!
edge_tts{voice:en-US-AshleyNeural,rate:x-fast,pitch:high}
So far only pitch, rate, volume and voice are supported. If you'd like more make another issue :)
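For readers who want to see how such an annotation line could be read, here is a minimal sketch. It is not the tool's actual parser; `parse_annotation` and the regular expression are assumptions based only on the three examples above, with the four supported keys taken from this comment.

```python
import re

# Matches a whole line of the form edge_tts{key:value,key:value}.
ANNOTATION = re.compile(r"^edge_tts\{(?P<body>[^{}]*)\}$")
SUPPORTED = {"voice", "rate", "pitch", "volume"}


def parse_annotation(line: str) -> dict:
    """Return the supported key/value pairs, or {} if the line is plain text."""
    match = ANNOTATION.match(line.strip())
    if match is None:
        return {}
    params = {}
    for item in match.group("body").split(","):
        key, sep, value = item.partition(":")
        key, value = key.strip(), value.strip()
        # Unknown keys and empty values are silently ignored in this sketch.
        if sep and key in SUPPORTED and value:
            params[key] = value
    return params
```

Ordinary subtitle text returns an empty dictionary, so lines without an annotation would be spoken with the global settings.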
OK, I'll test this version first. I have to say you are really efficient; I thought these features would only be possible much later.
fixed
Set the SSML parameters separately for a sentence in the SRT file in the following two formats (for reference only).