[Feature Request] Setting parameters for individual sentences in an SRT file

photkey commented 2 years ago

Allow SSML parameters to be set separately for individual sentences in the SRT file, for example in one of the following two formats (for reference only):

Format 1 (parameters on an extra line below the text):

1
00:00:00,498 --> 00:00:02,827
Here's what I love most about food and diet.

2
00:00:02,827 --> 00:00:06,383
We all eat several times a day, and we're totally in charge
voice:en-US-SaraNeural,speed:+10%,volume:-5%,style:cheerful,styledegree:2,role:OlderAdultMale……

3
00:00:06,383 --> 00:00:09,427
of what goes on our plate and what stays off.

Format 2 (parameters in braces at the start of the text):

1
00:00:00,498 --> 00:00:02,827
Here's what I love most about food and diet.

2
00:00:02,827 --> 00:00:06,383
{voice:en-US-SaraNeural,speed:+10%,volume:-5%,style:cheerful,styledegree:2,role:OlderAdultMale……}We all eat several times a day, and we're totally in charge

3
00:00:06,383 --> 00:00:09,427
of what goes on our plate and what stays off.
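A minimal sketch of how the second format (parameters in braces at the start of the cue text) could be split from the spoken text. It assumes the prefix is always a single `{key:value,...}` group at the very start and that values contain no `,` or `}`; `split_inline_params` is a hypothetical helper, not part of edge-srt-to-speech.

```python
import re
from typing import Dict, Tuple

# An optional "{key:value,key:value,...}" group at the very start of the text.
_PREFIX = re.compile(r"^\{([^}]*)\}\s*")


def split_inline_params(cue_text: str) -> Tuple[Dict[str, str], str]:
    """Return (params, spoken_text) for one cue (hypothetical helper).

    Assumes values contain no "," or "}" characters.
    """
    match = _PREFIX.match(cue_text)
    if match is None:
        return {}, cue_text
    params: Dict[str, str] = {}
    for pair in match.group(1).split(","):
        key, sep, value = pair.partition(":")
        if sep:  # skip fragments without a colon
            params[key.strip()] = value.strip()
    return params, cue_text[match.end():]
```

Cues without a leading `{...}` group come back unchanged with an empty parameter dict, so they simply fall through to the global settings.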

photkey commented 2 years ago

Setting a default SSML template:

edge-srt-to-speech srt_file out_file --SSML path/example.xml

example.xml would probably look something like the following:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <voice name="en-US-AriaNeural">
        <mstts:express-as style="cheerful">
            {text}
        </mstts:express-as>
    </voice>
</speak>

where "{text}" is replaced by the current sentence.
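Filling `{text}` is a plain string substitution, but the sentence should be XML-escaped first; otherwise a subtitle containing `&` or `<` would produce invalid SSML. A minimal sketch; `render_template` is a hypothetical helper, and `str.replace` is used instead of `str.format` so that any other braces in a user's template are left alone.

```python
from xml.sax.saxutils import escape

# The example template from above, inlined for the sketch.
TEMPLATE = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <voice name="en-US-AriaNeural">
        <mstts:express-as style="cheerful">
            {text}
        </mstts:express-as>
    </voice>
</speak>"""


def render_template(template: str, sentence: str) -> str:
    """Insert one subtitle sentence into an SSML template (hypothetical helper)."""
    # Escape &, < and > so the sentence cannot break the surrounding XML.
    return template.replace("{text}", escape(sentence))


ssml = render_template(TEMPLATE, "Fish & chips")
```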

The default SSML template and the per-sentence SSML parameters described above can be used together; where both apply, the per-sentence parameters take precedence.

photkey commented 2 years ago

I wrote this down as soon as the idea came to me. If you're interested in implementing this feature, I'll refine the logic and reorganize the description, aiming for a richer, more customizable feature while keeping your workload to a minimum.

rany2 commented 2 years ago

I'll work on it when I can. I'm definitely not against it, but I don't have time to work on it right now.

rany2 commented 2 years ago

if you're interested in implementing this feature, I'll smooth out the logic and reorganize the language (to achieve a richer, more personalized feature while minimizing your workload)

Do you mean you'd contribute code? If so, please also try to use a proper subtitle library instead of the parser I came up with :)

photkey commented 2 years ago

I'm very sorry; I've only just started teaching myself Python and know only a little basic syntax, so I'm not capable of implementing these features. What I meant is that I can probably describe how to implement it in a way that minimizes your workload. I'll reorganize this customization proposal and post it here again later, for when you have time to implement it. These are exciting features to think about: I looked at a few paid sites specifically, and their features are not as powerful as this.

As for using the srt library to read SRT files, I may be able to implement that part myself; I'll give it a go.
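The `srt` package on PyPI (`srt.parse()`) already handles this step. To show what that step produces without depending on it, here is a stdlib-only sketch, assuming well-formed cues separated by blank lines and a file read with `encoding="utf-8-sig"` (so no BOM remains). `parse_srt` and `Cue` are hypothetical names, not the `srt` library's API.

```python
import re
from datetime import timedelta
from typing import List, NamedTuple

_TIME = re.compile(r"(\d+):(\d\d):(\d\d),(\d{3})")


class Cue(NamedTuple):
    index: int
    start: timedelta
    end: timedelta
    text: str


def _to_timedelta(stamp: str) -> timedelta:
    """Convert an SRT timestamp such as '00:00:02,827' to a timedelta."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, m, s, ms = map(int, match.groups())
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)


def parse_srt(data: str) -> List[Cue]:
    """Parse SRT text into cues (hypothetical helper; assumes well-formed input)."""
    cues = []
    # Cues are separated by one or more blank lines (also works with \r\n).
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.splitlines()
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0]), _to_timedelta(start), _to_timedelta(end), "\n".join(lines[2:])))
    return cues
```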

rany2 commented 2 years ago

As a heads-up, I've sped up the concatenation step dramatically (I think).

Please let me know if it affects accuracy.

photkey commented 2 years ago

Parameters

The parameters voice, default-speed, default-pitch, default-volume and ssml-template are divided into global parameters and local parameters (effective for one sentence only). ssml-template and voice/default-speed/default-pitch/default-volume do not have to be specified together.

When the same parameter is set both globally and locally, the local value takes priority. The auto-acceleration logic stays the same, however: audio is still sped up whenever that is necessary.
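The precedence rule amounts to a dictionary merge in which per-sentence values overwrite global ones. A minimal sketch; the keys mirror the parameter names proposed above, the default values are illustrative, and `resolve_params` is a hypothetical helper. Auto-acceleration is deliberately left out of the merge, since the proposal keeps that logic unchanged.

```python
from typing import Dict, Optional

# Global defaults, e.g. taken from command-line options (illustrative values).
GLOBAL_PARAMS = {
    "voice": "en-US-AriaNeural",
    "default-speed": "+0%",
    "default-pitch": "+0Hz",
    "default-volume": "+0%",
}


def resolve_params(global_params: Dict[str, str], local_params: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Return the effective parameters for one sentence; local keys win."""
    merged = dict(global_params)        # copy so the global defaults stay untouched
    merged.update(local_params or {})   # per-sentence values override globals
    return merged
```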

Global parameters: SSML template

edge-srt-to-speech srt_file out_file --SSML path/example.xml

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <voice name="en-US-AriaNeural">
        <mstts:express-as style="cheerful">
            {text}
        </mstts:express-as>
    </voice>
</speak>

where "{text}" is replaced by the current sentence.

Local Parameters

1
00:00:00,498 --> 00:00:02,827
Here's what I love most about food and diet.
["Normal",{"voice":"en-US-SaraNeural","default-speed":"+1%","default-pitch":"+1Hz","default-volume":"+1%"}]

2
00:00:02,827 --> 00:00:06,383
We all eat several times a day, and we're totally in charge
["ssml-template",{"ssml-template":"path/example.xml"}]

3
00:00:06,383 --> 00:00:09,427
of what goes on our plate and what stays off.
["ssml-value",{"voice":"en-US-SaraNeural","speed":"+1%","style":"cheerful"}]
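Because the last line of each cue above is a JSON array, the standard `json` module is enough to detect it and split it off. A minimal sketch, assuming the parameter line is always the final line of the cue and starts with `[`; `split_local_params` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Parameter types proposed in this thread.
KNOWN_TYPES = {"Normal", "ssml-template", "ssml-value"}


def split_local_params(cue_text: str) -> Tuple[str, Optional[str], Dict[str, Any]]:
    """Split a trailing ["type", {...}] line off a cue (hypothetical helper).

    Returns (spoken_text, param_type, params); param_type is None when the
    cue has no recognizable parameter line.
    """
    lines = cue_text.rstrip().splitlines()
    if lines and lines[-1].lstrip().startswith("["):
        try:
            param_type, params = json.loads(lines[-1])
        except ValueError:
            # Invalid JSON or not a two-element array: treat it as ordinary text.
            return cue_text, None, {}
        if isinstance(param_type, str) and param_type in KNOWN_TYPES and isinstance(params, dict):
            return "\n".join(lines[:-1]), param_type, params
    return cue_text, None, {}
```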

The first element (list[0]) of the local parameters indicates the parameter type. Normal and ssml-template should be relatively easy to implement and should probably be the priority. ssml-value requires an extra step that first converts the values into an SSML template automatically, so it may be more troublesome and could be implemented later; the main purpose of adding this type is to make editing easier, simpler and faster.
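The extra step for ssml-value amounts to generating the SSML that a template file would otherwise contain. A minimal sketch, assuming `speed`, `pitch` and `volume` map onto the attributes of `<prosody>` (with `speed` becoming `rate`) and `style`, `styledegree` and `role` onto `<mstts:express-as>`, as in the template shown earlier; `build_ssml` and the fallback voice are hypothetical choices.

```python
from typing import Dict, Optional
from xml.sax.saxutils import escape, quoteattr

SPEAK_OPEN = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
    'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
)


def _attrs(pairs: Dict[str, Optional[str]]) -> str:
    """Render only the attributes that have a value, quoted and escaped."""
    return "".join(f" {name}={quoteattr(value)}" for name, value in pairs.items() if value)


def build_ssml(values: Dict[str, str], sentence: str) -> str:
    """Turn ssml-value parameters into an SSML document (hypothetical helper)."""
    body = escape(sentence)

    # Map speed/pitch/volume onto <prosody> attributes ("speed" -> "rate").
    prosody = _attrs({"rate": values.get("speed"), "pitch": values.get("pitch"),
                      "volume": values.get("volume")})
    if prosody:
        body = f"<prosody{prosody}>{body}</prosody>"

    # Speaking style uses Microsoft's mstts:express-as extension.
    if values.get("style"):
        express = _attrs({"style": values.get("style"), "styledegree": values.get("styledegree"),
                          "role": values.get("role")})
        body = f"<mstts:express-as{express}>{body}</mstts:express-as>"

    # Assumption: fall back to the voice used in the example template.
    voice = quoteattr(values.get("voice", "en-US-AriaNeural"))
    return f"{SPEAK_OPEN}<voice name={voice}>{body}</voice></speak>"
```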

photkey commented 2 years ago

As a heads up, I've speed up the concatenation part dramatically (I think)

Please let me know if it has an impact on accuracy

I'll go find some SRT subtitle files and test them out.

rany2 commented 2 years ago

When you do, make sure to use the current version in master.

photkey commented 2 years ago

It's fast, but it doesn't produce the MP3 file. The error message is as follows (I've replaced the Chinese in the error message with English):

DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmps2d68849\36.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmps2d68849\19.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmps2d68849\19.mp3...
Traceback (most recent call last):
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 170, in _main
    await asyncio.gather(*coros[i : i + 500])
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 117, in audio_gen
    raise Exception(f"Too many retries for {fname}")
Exception: Too many retries for C:\Users\tuike\AppData\Local\Temp\tmps2d68849\47.mp3

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\shutil.py", line 616, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmps2d68849\\0.mp3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\tempfile.py", line 802, in onerror
    _os.unlink(path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmps2d68849\\0.mp3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 239, in main
    asyncio.get_event_loop().run_until_complete(
  File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 214, in _main
    raise Exception("ffmpeg failed")
  File "d:\develop\python\python38\lib\tempfile.py", line 827, in __exit__
    self.cleanup()
  File "d:\develop\python\python38\lib\tempfile.py", line 831, in cleanup
    self._rmtree(self.name)
  File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 618, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "d:\develop\python\python38\lib\tempfile.py", line 805, in onerror
    cls._rmtree(path)
  File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 599, in _rmtree_unsafe
    onerror(os.scandir, path, sys.exc_info())
  File "d:\develop\python\python38\lib\shutil.py", line 596, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmps2d68849\\0.mp3'
PS C:\Users\tuike\Downloads>

rany2 commented 2 years ago

Can you try now?

photkey commented 2 years ago

Yes, trying it now.

photkey commented 2 years ago

edge-srt-to-speech-0.0.6

DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmpei5v1756\43.mp3...
Traceback (most recent call last):
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 170, in _main
    await asyncio.gather(*coros[i : i + batch_size])
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 117, in audio_gen
    raise Exception(f"Too many retries for {fname}")
Exception: Too many retries for C:\Users\tuike\AppData\Local\Temp\tmpei5v1756\12.mp3

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\shutil.py", line 616, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmpei5v1756\\0.mp3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\tempfile.py", line 802, in onerror
    _os.unlink(path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmpei5v1756\\0.mp3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 243, in main
    asyncio.get_event_loop().run_until_complete(
  File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 214, in _main
    raise Exception("ffmpeg failed")
  File "d:\develop\python\python38\lib\tempfile.py", line 827, in __exit__
    self.cleanup()
  File "d:\develop\python\python38\lib\tempfile.py", line 831, in cleanup
    self._rmtree(self.name)
  File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 618, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "d:\develop\python\python38\lib\tempfile.py", line 805, in onerror
    cls._rmtree(path)
  File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 599, in _rmtree_unsafe
    onerror(os.scandir, path, sys.exc_info())
  File "d:\develop\python\python38\lib\shutil.py", line 596, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmpei5v1756\\0.mp3'
PS C:\Users\tuike\Downloads>

rany2 commented 2 years ago

Could you try again? I've added a short wait time when it errors out.
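A retry loop with a pause between attempts, of the kind described here, might look like the sketch below. It is not the project's actual `audio_gen`: `generate_with_retries`, `max_retries` and `delay` are hypothetical, `synthesize` stands for whatever coroutine writes one MP3 file, and the doubling delay (exponential backoff) is a swapped-in choice rather than necessarily what the tool does.

```python
import asyncio
import logging
from typing import Awaitable, Callable


async def generate_with_retries(
    synthesize: Callable[[str], Awaitable[None]],
    fname: str,
    max_retries: int = 5,
    delay: float = 1.0,
) -> None:
    """Call synthesize(fname), pausing between failed attempts (hypothetical helper)."""
    for attempt in range(max_retries):
        try:
            logging.debug("Generating %s...", fname)
            await synthesize(fname)
            logging.debug("Generated %s", fname)
            return
        except Exception as exc:  # e.g. ConnectionResetError, timeouts
            logging.debug("Retrying %s... (%r)", fname, exc)
            if attempt < max_retries - 1:
                # Exponential backoff: delay, 2*delay, 4*delay, ...
                await asyncio.sleep(delay * 2 ** attempt)
    raise Exception(f"Too many retries for {fname}")
```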

photkey commented 2 years ago

edge-srt-to-speech-0.0.7

ERROR:asyncio:Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "d:\develop\python\python38\lib\asyncio\events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "d:\develop\python\python38\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\44.mp3...
...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\0.mp3...
ERROR:asyncio:Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "d:\develop\python\python38\lib\asyncio\events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "d:\develop\python\python38\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\7.mp3...
...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\39.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\43.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\37.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\33.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\49.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\41.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\43.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\23.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\20.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\33.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\15.mp3...
Traceback (most recent call last):
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 172, in _main
    await asyncio.gather(*coros[i : i + batch_size])
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 118, in audio_gen
    raise Exception(f"Too many retries for {fname}")
Exception: Too many retries for C:\Users\tuike\AppData\Local\Temp\tmp30scjuyg\49.mp3

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\shutil.py", line 616, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmp30scjuyg\\0.mp3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\tempfile.py", line 802, in onerror
    _os.unlink(path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmp30scjuyg\\0.mp3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 245, in main
    asyncio.get_event_loop().run_until_complete(
  File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 216, in _main
    raise Exception("ffmpeg failed")
  File "d:\develop\python\python38\lib\tempfile.py", line 827, in __exit__
    self.cleanup()
  File "d:\develop\python\python38\lib\tempfile.py", line 831, in cleanup
    self._rmtree(self.name)
  File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 618, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "d:\develop\python\python38\lib\tempfile.py", line 805, in onerror
    cls._rmtree(path)
  File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 599, in _rmtree_unsafe
    onerror(os.scandir, path, sys.exc_info())
  File "d:\develop\python\python38\lib\shutil.py", line 596, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmp30scjuyg\\0.mp3'
PS C:\Users\tuike\Downloads>

rany2 commented 2 years ago

With --parallel-batch-size set to 1, does it work? Maybe it's some kind of Windows issue...
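Processing in batches of --parallel-batch-size means at most that many requests are in flight at once, and each batch must finish before the next one starts. The tracebacks show batches taken by slicing a list (`coros[i : i + batch_size]`); if, as that suggests, every coroutine object is created up front, an exception in an early batch leaves all later ones un-awaited, which would explain the `RuntimeWarning: coroutine 'audio_gen' was never awaited` line at the end of the next log. The sketch below creates each coroutine only when its batch starts; `run_in_batches` is a hypothetical helper.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_in_batches(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    batch_size: int,
) -> List[R]:
    """Run worker(item) for every item, at most batch_size at a time."""
    results: List[R] = []
    for i in range(0, len(items), batch_size):
        # Coroutines are created here, per batch, rather than all up front.
        batch = [worker(item) for item in items[i : i + batch_size]]
        results.extend(await asyncio.gather(*batch))
    return results
```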

photkey commented 2 years ago

I'll try it now.

photkey commented 2 years ago

edge-srt-to-speech .\example.srt example.mp3 --parallel-batch-size 1

DEBUG:asyncio:Using proactor: IocpProactor
DEBUG:root:Preparing 0...
DEBUG:root:Preparing 1...
DEBUG:root:Preparing 2...
DEBUG:root:Preparing 3...
DEBUG:root:Preparing 4...
DEBUG:root:Preparing 5...
DEBUG:root:Preparing 6...
DEBUG:root:Preparing 7...
DEBUG:root:Preparing 8...
DEBUG:root:Preparing 9...
DEBUG:root:Preparing 10...
DEBUG:root:Preparing 11...
DEBUG:root:Preparing 12...
DEBUG:root:Preparing 13...
DEBUG:root:Preparing 14...
DEBUG:root:Preparing 15...
DEBUG:root:Preparing 16...
DEBUG:root:Preparing 17...
DEBUG:root:Preparing 18...
DEBUG:root:Preparing 19...
DEBUG:root:Preparing 20...
DEBUG:root:Preparing 21...
DEBUG:root:Preparing 22...
DEBUG:root:Preparing 23...
DEBUG:root:Preparing 24...
DEBUG:root:Preparing 25...
DEBUG:root:Preparing 26...
DEBUG:root:Preparing 27...
DEBUG:root:Preparing 28...
DEBUG:root:Preparing 29...
DEBUG:root:Preparing 30...
DEBUG:root:Preparing 31...
DEBUG:root:Preparing 32...
DEBUG:root:Preparing 33...
DEBUG:root:Preparing 34...
DEBUG:root:Preparing 35...
DEBUG:root:Preparing 36...
DEBUG:root:Preparing 37...
DEBUG:root:Preparing 38...
DEBUG:root:Preparing 39...
DEBUG:root:Preparing 40...
DEBUG:root:Preparing 41...
DEBUG:root:Preparing 42...
DEBUG:root:Preparing 43...
DEBUG:root:Preparing 44...
DEBUG:root:Preparing 45...
DEBUG:root:Preparing 46...
DEBUG:root:Preparing 47...
DEBUG:root:Preparing 48...
DEBUG:root:Preparing 49...
DEBUG:root:Preparing 50...
DEBUG:root:Preparing 51...
DEBUG:root:Preparing 52...
DEBUG:root:Preparing 53...
DEBUG:root:Preparing 54...
DEBUG:root:Preparing 55...
DEBUG:root:Preparing 56...
DEBUG:root:Preparing 57...
DEBUG:root:Preparing 58...
DEBUG:root:Preparing 59...
DEBUG:root:Preparing 60...
DEBUG:root:Preparing 61...
DEBUG:root:Preparing 62...
DEBUG:root:Preparing 63...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Retrying C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3...
Traceback (most recent call last):
  File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 245, in main
    asyncio.get_event_loop().run_until_complete(
  File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 172, in _main
    await asyncio.gather(*coros[i : i + batch_size])
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 118, in audio_gen
    raise Exception(f"Too many retries for {fname}")
Exception: Too many retries for C:\Users\tuike\AppData\Local\Temp\tmp5l65wxi8\0.mp3
sys:1: RuntimeWarning: coroutine 'audio_gen' was never awaited

rany2 commented 2 years ago

Could you share the SRT?

photkey commented 2 years ago

Sorry, I forgot: my SRT is in Chinese and I didn't specify a Chinese voice in the parameters. I'll specify a Chinese voice now and try again. (Attachment: example.zip)

rany2 commented 2 years ago

I think the reason is that you didn't have it use a Chinese voice

photkey commented 2 years ago

With a Chinese voice specified, --parallel-batch-size 1 still fails, although I now see hardly any retries along the way.

DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_59.mp3
DEBUG:root:Needed -0.08100000000007412 seconds for 60
DEBUG:root:Needed 0.4659999999999229 seconds for 61
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_61.mp3...
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_61.mp3
DEBUG:root:Needed 3.431999999999931 seconds for 62
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_62.mp3...
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_62.mp3
DEBUG:root:Needed 3.2859999999999445 seconds for 63
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_63.mp3...
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmp2z33qd5h\silence_63.mp3
Traceback (most recent call last):
  File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 245, in main
    asyncio.get_event_loop().run_until_complete(
  File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 216, in _main
    raise Exception("ffmpeg failed")
Exception: ffmpeg failed
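The `Needed ... seconds for N` lines appear to give the gap that is then filled with generated silence (`silence_N.mp3`). Cue 60 needed a negative amount (-0.081 s) and no `silence_60.mp3` was generated, so a non-positive gap should simply be skipped. A sketch of building such an ffmpeg command, assuming the `anullsrc` source and `libmp3lame` encoder visible in the ffmpeg output further down; `silence_command` is hypothetical and not necessarily the exact command the tool runs.

```python
from typing import List, Optional


def silence_command(duration: float, out_path: str, sample_rate: int = 24000) -> Optional[List[str]]:
    """Return an ffmpeg argv that writes `duration` seconds of mono silence.

    Returns None when no padding is needed (duration <= 0, i.e. the
    audio already overran its slot).
    """
    if duration <= 0:
        return None
    return [
        "ffmpeg", "-y",
        "-f", "lavfi", "-i", f"anullsrc=cl=mono:r={sample_rate}",  # silent source
        "-t", f"{duration:.3f}",  # stop after the required gap
        "-c:a", "libmp3lame",
        out_path,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with Windows temp paths.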

photkey commented 2 years ago

With the default --parallel-batch-size of 100, almost every request ends up being retried.

rany2 commented 2 years ago

Could you comment out the stdout and stderr lines and share the output again?
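Commenting out the stdout/stderr redirection is a quick way to see ffmpeg's own error message. An alternative, sketched below on the assumption that ffmpeg is started through Python's `subprocess` module with its output discarded, is to capture stderr and attach its last lines to the exception, so "ffmpeg failed" always comes with the reason. `run_checked` is a hypothetical helper.

```python
import subprocess
from typing import Sequence


def run_checked(argv: Sequence[str]) -> None:
    """Run a command (e.g. ffmpeg) and raise with its stderr if it fails."""
    proc = subprocess.run(
        list(argv),
        stdout=subprocess.DEVNULL,  # normal progress output is not needed
        stderr=subprocess.PIPE,     # keep error output for the exception
        text=True,
        errors="replace",           # tolerate undecodable bytes in the output
    )
    if proc.returncode != 0:
        # Keep only the tail; ffmpeg prints a long banner before the real error.
        tail = "\n".join(proc.stderr.strip().splitlines()[-5:])
        raise Exception(f"ffmpeg failed (exit code {proc.returncode}):\n{tail}")
```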

photkey commented 2 years ago

Do you mean the lines inside __main__.py? There are three places in total. Should I comment them all out?

rany2 commented 2 years ago

Yes

photkey commented 2 years ago

Input #0, lavfi, from 'anullsrc=cl=mono:r=24000':
  Duration: N/A, start: 0.000000, bitrate: 192 kb/s
  Stream #0:0: Audio: pcm_u8, 24000 Hz, mono, u8, 192 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_u8 (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'C:\Users\tuike\AppData\Local\Temp\tmpu024g4i6\silence_62.mp3':
  Metadata:
    TSSE            : Lavf59.16.100
  Stream #0:0: Audio: mp3, 24000 Hz, mono, s16p
    Metadata:
      encoder         : Lavc59.18.100 libmp3lame
size=      14kB time=00:00:03.43 bitrate=  33.0kbits/s speed= 440x
video:0kB audio:14kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.702586%
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmpu024g4i6\silence_62.mp3
DEBUG:root:Needed 3.2859999999999445 seconds for 63
DEBUG:root:Generating C:\Users\tuike\AppData\Local\Temp\tmpu024g4i6\silence_63.mp3...
ffmpeg version 5.0-full_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 11.2.0 (Rev5, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil      57. 17.100 / 57. 17.100
  libavcodec     59. 18.100 / 59. 18.100
  libavformat    59. 16.100 / 59. 16.100
  libavdevice    59.  4.100 / 59.  4.100
  libavfilter     8. 24.100 /  8. 24.100
  libswscale      6.  4.100 /  6.  4.100
  libswresample   4.  3.100 /  4.  3.100
  libpostproc    56.  3.100 / 56.  3.100
Input #0, lavfi, from 'anullsrc=cl=mono:r=24000':
  Duration: N/A, start: 0.000000, bitrate: 192 kb/s
  Stream #0:0: Audio: pcm_u8, 24000 Hz, mono, u8, 192 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_u8 (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'C:\Users\tuike\AppData\Local\Temp\tmpu024g4i6\silence_63.mp3':
  Metadata:
    TSSE            : Lavf59.16.100
  Stream #0:0: Audio: mp3, 24000 Hz, mono, s16p
    Metadata:
      encoder         : Lavc59.18.100 libmp3lame
size=      13kB time=00:00:03.28 bitrate=  33.0kbits/s speed= 442x
video:0kB audio:13kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.776079%
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmpu024g4i6\silence_63.mp3
ffmpeg version 5.0-full_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 11.2.0 (Rev5, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil      57. 17.100 / 57. 17.100
  libavcodec     59. 18.100 / 59. 18.100
  libavformat    59. 16.100 / 59. 16.100
  libavdevice    59.  4.100 / 59.  4.100
  libavfilter     8. 24.100 /  8. 24.100
  libswscale      6.  4.100 /  6.  4.100
  libswresample   4.  3.100 /  4.  3.100
  libpostproc    56.  3.100 / 56.  3.100
C:\Users\tuike\AppData\Local\Temp\tmpoyl9iq1k: Permission denied
Traceback (most recent call last):
  File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 245, in main
    asyncio.get_event_loop().run_until_complete(
  File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 216, in _main
    raise Exception("ffmpeg failed")
Exception: ffmpeg failed
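The `Permission denied` line above refers to a temporary path outside the working directory (`tmpoyl9iq1k` rather than `tmpu024g4i6`). One common cause on Windows, offered here only as a guess, is handing a still-open `tempfile.NamedTemporaryFile` to ffmpeg: on Windows such a file cannot be opened a second time while the handle is held. The usual workaround is to create it with `delete=False`, close it before use and delete it afterwards. A sketch for an ffmpeg concat-demuxer list; `write_concat_list` and `concat_mp3` are hypothetical helpers.

```python
import os
import subprocess
import tempfile
from typing import Iterable


def write_concat_list(paths: Iterable[str]) -> str:
    """Write an ffmpeg concat list and return its path (caller deletes it).

    The file is closed before returning, so ffmpeg can open it on Windows.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
        for p in paths:
            # Concat-list quoting: wrap in single quotes, escape embedded quotes as '\''
            escaped = p.replace("'", "'\\''")
            f.write(f"file '{escaped}'\n")
        return f.name


def concat_mp3(paths: Iterable[str], out_path: str) -> None:
    """Join MP3 files without re-encoding using the concat demuxer."""
    list_path = write_concat_list(paths)
    try:
        subprocess.run(
            ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", out_path],
            check=True,
        )
    finally:
        os.remove(list_path)  # delete=False means we clean up ourselves
```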

rany2 commented 2 years ago

Could you try again with the new version? If it still fails with "ffmpeg failed", please comment out the stdout and stderr lines again.

photkey commented 2 years ago

Just a moment: I tried updating via pip several times without success. I think the package cache hasn't been updated yet.

photkey commented 2 years ago

Now everything is working. I'll try again to find the largest --parallel-batch-size that can be set before errors start to appear.

photkey commented 2 years ago

--parallel-batch-size 150 works fine too, and it's flying. Nice work! I'll go test some more SRT files.

rany2 commented 2 years ago

Very good :)

photkey commented 2 years ago

I downloaded some SRT subtitle files for a movie and got the following error, probably because there are too many sentences:
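With a long movie subtitle file, hundreds of connections are opened in quick succession, and the `WinError 121` in the log that follows is raised while one of them is being opened. An alternative to fixed batches, sketched here as a suggestion rather than what the tool does, is an `asyncio.Semaphore` that caps the number of open connections (a new request starts as soon as any slot frees up), combined with `asyncio.wait_for` so a single stuck connection fails quickly and can be retried. `run_limited`, `max_concurrency` and `timeout` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_limited(
    items: Sequence[T],
    worker: Callable[[T], Awaitable[R]],
    max_concurrency: int,
    timeout: float,
) -> List[R]:
    """Run worker(item) for all items with at most max_concurrency in flight.

    Each call is cancelled and raises asyncio.TimeoutError if it takes
    longer than `timeout` seconds.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(item: T) -> R:
        async with semaphore:  # waits here while max_concurrency calls are running
            return await asyncio.wait_for(worker(item), timeout)

    return await asyncio.gather(*(limited(item) for item in items))
```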

DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmpa43ftvzl\396.mp3
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmpa43ftvzl\387.mp3
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmpa43ftvzl\399.mp3
DEBUG:root:Generated C:\Users\tuike\AppData\Local\Temp\tmpa43ftvzl\395.mp3
Traceback (most recent call last):
  File "d:\develop\python\python38\lib\site-packages\aiohttp\connector.py", line 986, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
  File "d:\develop\python\python38\lib\asyncio\base_events.py", line 1025, in create_connection
    raise exceptions[0]
  File "d:\develop\python\python38\lib\asyncio\base_events.py", line 1010, in create_connection
    sock = await self._connect_sock(
  File "d:\develop\python\python38\lib\asyncio\base_events.py", line 924, in _connect_sock
    await self.sock_connect(sock, address)
  File "d:\develop\python\python38\lib\asyncio\proactor_events.py", line 702, in sock_connect
    return await self._proactor.connect(sock, address)
  File "d:\develop\python\python38\lib\asyncio\windows_events.py", line 812, in _poll
    value = callback(transferred, key, ov)
  File "d:\develop\python\python38\lib\asyncio\windows_events.py", line 599, in finish_connect
    ov.getresult()
OSError: [WinError 121] The signal timeout has expired

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 250, in main
    asyncio.get_event_loop().run_until_complete(
  File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 172, in _main
    await asyncio.gather(*coros[i : i + batch_size])
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 105, in audio_gen
    async for j in communicate.run(
  File "d:\develop\python\python38\lib\site-packages\edge_tts\communicate.py", line 271, in run
    async with session.ws_connect(
  File "d:\develop\python\python38\lib\site-packages\aiohttp\client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "d:\develop\python\python38\lib\site-packages\aiohttp\client.py", line 776, in _ws_connect
    resp = await self.request(
  File "d:\develop\python\python38\lib\site-packages\aiohttp\client.py", line 535, in _request
    conn = await self._connector.connect(
  File "d:\develop\python\python38\lib\site-packages\aiohttp\connector.py", line 542, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "d:\develop\python\python38\lib\site-packages\aiohttp\connector.py", line 907, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "d:\develop\python\python38\lib\site-packages\aiohttp\connector.py", line 1206, in _create_direct_connection
    raise last_exc
  File "d:\develop\python\python38\lib\site-packages\aiohttp\connector.py", line 1175, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "d:\develop\python\python38\lib\site-packages\aiohttp\connector.py", line 992, in _wrap_create_connection
    raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host speech.platform.bing.com:443 ssl:default [The signal timeout has expired]
sys:1: RuntimeWarning: coroutine 'audio_gen' was never awaited

After resetting parallel-batch-size to 50, the MP3 was generated successfully. This is just live feedback on the errors I encountered; I will continue testing.
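The traceback shows `_main` awaiting `asyncio.gather(*coros[i : i + batch_size])`, so parallel-batch-size caps how many synthesis requests are in flight at once; 150 simultaneous connections to speech.platform.bing.com is a plausible cause of the Windows connection timeout, while 50 stayed within limits. Below is a minimal sketch of that batching pattern, not the tool's actual code: `fake_audio_gen` is a hypothetical stand-in for the real request. It builds each batch's coroutines only when that batch runs, so if one batch raises, later coroutines are never created. Creating all coroutines up front and then aborting is a likely source of the `coroutine 'audio_gen' was never awaited` warning seen above.

```python
import asyncio
from typing import List


async def fake_audio_gen(index: int) -> str:
    # Hypothetical stand-in for one TTS request; it just sleeps briefly.
    await asyncio.sleep(0.01)
    return f"{index}.mp3"


async def run_in_batches(count: int, batch_size: int) -> List[str]:
    results: List[str] = []
    for start in range(0, count, batch_size):
        # Create only this batch's coroutines, so later batches are never
        # created (and never left un-awaited) if an earlier batch raises.
        batch = [fake_audio_gen(i) for i in range(start, min(start + batch_size, count))]
        # At most batch_size requests run concurrently here.
        results.extend(await asyncio.gather(*batch))
    return results


if __name__ == "__main__":
    files = asyncio.run(run_in_batches(count=400, batch_size=50))
    print(len(files), files[0], files[-1])
```

Lowering `batch_size` trades speed for fewer simultaneous connections; an `asyncio.Semaphore` would be an alternative way to cap concurrency without waiting for a whole batch to finish.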

photkey commented 2 years ago

I just had an idea during testing. Could we discuss whether it's worth implementing?

I was regenerating an MP3 file for a long narrated video and found that Microsoft's text-to-speech is so good it could pass for a real person (I noticed that Microsoft's text-to-speech engine was upgraded recently). However, the speech rate of the resulting MP3 was not quite right, even though the timeline was correct. At a reasonable rate, Microsoft's dubbing really is better than some real people with poor pronunciation.

So my idea is this: specify a sentence and the time a human takes to read it aloud (the video author can read the sentence himself and time it), and use that time to calculate the most reasonable speech rate. Because different Microsoft voices speak at different speeds, let the selected voice read the sentence at the default rate, record that time, and then calculate its ratio to the human's time. That way, whichever voice is selected, the most reasonable rate can be calculated automatically, as close as possible to the pace the author wants.

photkey commented 2 years ago

Maybe this feature belongs in edge-tts instead. Either way, I think it would get much closer to the speed the author wants; otherwise the rate has to be adjusted by trial and error over and over again.

rany2 commented 2 years ago

So you'd like to calculate the default rate that you want?

photkey commented 2 years ago

Yes: the rate for the selected voice would be calculated automatically from the text and time parameters passed in. What do you think?

rany2 commented 2 years ago

I don't think it's possible

photkey commented 2 years ago

Let's say the time parameter is 10 seconds, and the selected voice actually takes 20 seconds to read the sentence aloud at the default rate. The rate would then be set automatically to +100%. Isn't that the speech rate we need?

rany2 commented 2 years ago

Yes, but I have no way of knowing in advance how long the selected voice would take. Also, because it is a neural network, it might say some sentences faster than others. There's no simple formula for this :/
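The arithmetic behind the 10 s / 20 s example can be sketched as follows. This is a minimal sketch, assuming the voice's speaking time scales inversely with the rate (which, as noted, a neural voice does not guarantee) and that the default-rate duration has already been measured by synthesizing the sentence once. `rate_for_target` is a hypothetical helper, and its output follows the `+10%` style used earlier in this thread.

```python
def rate_for_target(default_seconds: float, target_seconds: float) -> str:
    """Return a relative rate string such as '+100%' or '-50%'.

    Assumes duration is inversely proportional to rate, which is only an
    approximation for a neural voice.
    """
    if default_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    # Speeding up by a factor of default/target shortens the reading to the target.
    percent = round((default_seconds / target_seconds - 1) * 100)
    # '+d' always prints a sign, e.g. '+100' or '-50'.
    return f"{percent:+d}%"


if __name__ == "__main__":
    # Voice needs 20 s at the default rate; the author wants 10 s.
    print(rate_for_target(default_seconds=20, target_seconds=10))
```

Because of the non-linearity mentioned above, the computed rate would at best be a starting point that still needs one re-synthesis to check.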

rany2 commented 2 years ago

I didn't explain how to use it:

1
00:00:00,596 --> 00:00:04,744
首先 我们可以用这种酒精湿巾
edge_tts{voice:zh-CN-XiaoxiaoNeural}

2
00:00:04,744 --> 00:00:08,340
Hello, my name is Bob and I love cooking
edge_tts{voice:en-US-BrandonNeural}

3
00:00:10,440 --> 00:00:20,440
I love it too!
edge_tts{voice:en-US-AshleyNeural,rate:x-fast,pitch:high}

So far only pitch, rate, volume and voice are supported. If you'd like more, open another issue :)
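Each cue above carries an optional `edge_tts{key:value,...}` line under its text. The sketch below is a hypothetical parser, not edge-srt-to-speech's actual code: the regex, `SUPPORTED_KEYS` and `split_cue_text` are illustrative assumptions. It separates the spoken text of one cue from its options and keeps only the four keys listed as supported.

```python
import re
from typing import Dict, List, Tuple

# Keys listed as supported in the comment above; anything else is ignored here.
SUPPORTED_KEYS = {"voice", "rate", "pitch", "volume"}

# Hypothetical pattern for an option line such as
# "edge_tts{voice:en-US-AshleyNeural,rate:x-fast,pitch:high}".
OPTION_LINE = re.compile(r"^edge_tts\{(?P<body>[^}]*)\}$")


def split_cue_text(lines: List[str]) -> Tuple[str, Dict[str, str]]:
    """Separate the spoken text of one SRT cue from its edge_tts{...} options."""
    text_lines: List[str] = []
    options: Dict[str, str] = {}
    for line in lines:
        match = OPTION_LINE.match(line.strip())
        if match is None:
            # Ordinary subtitle text that should be spoken.
            text_lines.append(line.strip())
            continue
        for pair in match.group("body").split(","):
            key, sep, value = pair.partition(":")
            key = key.strip()
            # Skip malformed pairs and keys outside the supported set.
            if sep and key in SUPPORTED_KEYS:
                options[key] = value.strip()
    return " ".join(text_lines), options


if __name__ == "__main__":
    cue = ["I love it too!", "edge_tts{voice:en-US-AshleyNeural,rate:x-fast,pitch:high}"]
    print(split_cue_text(cue))
```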

photkey commented 2 years ago

OK, I'll test this version first. I have to say you are really efficient; I thought these features wouldn't be available for a long time.

photkey commented 2 years ago

rany2 commented 2 years ago

fixed

rany2 / edge-srt-to-speech