rany2 / edge-tts

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
https://pypi.org/project/edge-tts/
GNU General Public License v3.0
5.29k stars, 543 forks

Request support for subtitle files in SRT format #7

Closed: photkey closed this issue 2 years ago

photkey commented 2 years ago

Similar applications:

  1. https://voicenotebook.com/srtspeaker.php (Google's text-to-speech is terrible)
  2. https://github.com/bdleavitt/azure-text-to-speech-for-dubbing (might have worked well, but it didn't run and the author never replied)

It would make it a lot easier for programmers who are not comfortable speaking to add voiceovers to screen recordings. I once recorded a 13-minute video and the voiceover took four whole days; it was a nightmare, because I would always say something wrong and have to start again. I know this project is already great and supports SSML, but editing SSML is also time-consuming, whereas editing SRT can be done very quickly with the help of subtitle software.

Possible problem: Let's say a sentence in SRT has a timeline of 6 seconds, but using Microsoft Edge's text-to-speech service it takes 8 seconds to actually play the sentence; in this case, you need to automatically adjust the speech speed of the sentence.
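As an illustration of the mismatch described above (a sketch, not code from edge-tts), the slot length can be read from an SRT timing line and compared with the spoken duration. The names `slot_seconds` and `speedup_factor` are made up for this example, and the spoken duration is assumed to have been measured elsewhere.

```python
import re
from datetime import timedelta

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:07,000".
TIMING_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def slot_seconds(timing_line: str) -> float:
    """Return the length, in seconds, of the slot given by an SRT timing line."""
    match = TIMING_RE.search(timing_line)
    if match is None:
        raise ValueError(f"not an SRT timing line: {timing_line!r}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
    end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
    return (end - start).total_seconds()


def speedup_factor(slot: float, spoken: float) -> float:
    """Return how much faster the speech must play to fit the slot.

    A value above 1.0 means the speech is too long for the slot;
    a value of 1.0 or below means it already fits.
    """
    if slot <= 0:
        raise ValueError("slot duration must be positive")
    return spoken / slot
```

With the numbers from the example above, a 6-second slot and 8 seconds of speech give a factor of about 1.33.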

rany2 commented 2 years ago

Interesting idea, I'll give it a shot. Either I'll make a separate program that uses this as a library or add it to the already bundled utility. No idea when I'll manage to do this, though.

photkey commented 2 years ago

Very much looking forward to it! I've spent days scouring GitHub without finding an ideal project, and so far this project of yours is the best open source project I could find. I'll wait for the good news!

rany2 commented 2 years ago

Good news, I've added a SubRip-to-MP3 generator in the examples directory.

This is how to use it:

$ python3 ./examples/02_subrip_to_mp3.py test.srt en-US-SaraNeural test.mp3
$ ls -lh test.mp3
-rw------- 1 user user 1.6M Mar 10 12:37 test.mp3

If it works fine for you, let me know so I can close this issue.

rany2 commented 2 years ago

I realized it is currently slow when dealing with multi-hour SRT files. I'll make it a bit faster so it's more usable in that scenario.

photkey commented 2 years ago

You're fantastic, the development speed is amazing. I was expecting a long wait. I'll try it out right now and get back to you after testing.

photkey commented 2 years ago

There are encoding issues (the SRT files are UTF-8 encoded) and it doesn't work. Error message: UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 49: illegal multibyte sequence
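For context, Python's open() falls back to the locale's preferred encoding when none is given, which on a Chinese-language Windows install is typically GBK, so UTF-8 bytes can fail to decode. A minimal sketch of reading the file with an explicit encoding follows; `read_srt_text` is a name made up for this example, and this is not necessarily how the script was fixed.

```python
def read_srt_text(path: str) -> str:
    """Read a subtitle file as UTF-8 regardless of the system locale.

    "utf-8-sig" decodes plain UTF-8 and also drops a leading
    byte-order mark, which some Windows editors add.
    """
    with open(path, encoding="utf-8-sig") as srt_file:
        return srt_file.read()
```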


photkey commented 2 years ago

Another suggestion: would it be better to read the SRT file with either of the two SRT libraries below? https://github.com/cdown/srt https://github.com/byroot/pysrt

rany2 commented 2 years ago

Could you try now?

photkey commented 2 years ago

> Could you try now?

[screenshot]

rany2 commented 2 years ago

I've very rarely used Windows so this seems weird especially because you're admin. Do you have any pointers?

Maybe running this without admin could help?

rany2 commented 2 years ago

What does the Chinese say?

photkey commented 2 years ago

> What does the Chinese say?

PermissionError: [WinError 32] Another program is using this file and the process cannot access it: 'C:\Users\tuike\AppData\Local\Temp\tmpfo2xxsew.mp3'
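On Windows, this error usually means a handle to the temporary file is still open when something else tries to use or delete it. One common workaround, sketched here under the assumption that the script creates its temporary MP3 with the tempfile module, is to create the file, close it straight away, and pass only its path around. The helper `closed_temp_path` is hypothetical and not taken from the project.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def closed_temp_path(suffix: str = ".mp3"):
    """Yield the path of a temporary file whose handle is already closed."""
    handle = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    handle.close()  # release our handle so other code or processes can open the path
    try:
        yield handle.name
    finally:
        # Clean up ourselves, since delete=False disables automatic removal.
        try:
            os.remove(handle.name)
        except FileNotFoundError:
            pass
```

Inside `with closed_temp_path() as path:` the path can be written to, or handed to an external tool, without the creating process holding it open.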

rany2 commented 2 years ago

Does it work properly now?

photkey commented 2 years ago

No errors are reported now and it runs to completion, but it does not generate the MP3 file test.mp3. [screenshot]

I ran it again and it reported an error near the end. [screenshot] In Chinese, it means: the signal timeout has been

rany2 commented 2 years ago

Do ffmpeg and ffprobe commands work for you?
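A quick way to answer this from Python is to look the commands up on PATH with shutil.which. This is only a sketch; the helper name `missing_tools` is made up for this example.

```python
import shutil


def missing_tools(names=("ffmpeg", "ffprobe")):
    """Return the command names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are both available")
```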

rany2 commented 2 years ago

Anyway, could you try it again? Maybe on a smaller SRT so it doesn't take too much of your time.

photkey commented 2 years ago

Yes, I installed FFmpeg via scoop. [screenshot]

photkey commented 2 years ago

The latest one works fine and successfully generates a playable MP3 file, although FFmpeg seems to print some error messages. [screenshot]

There is one other issue that has a greater impact on the result. When the time allotted in the SRT is shorter than the time Microsoft Edge's text-to-speech takes, the speech is automatically sped up, and this is correct. When the allotted time is longer than the text-to-speech time, the speed should not be adjusted; a default speech speed parameter should be used, and the speech should be sped up automatically only in the first situation.

rany2 commented 2 years ago

> When the time in SRT is more than the text-to-speech time in Microsoft Edge, the speed should not be adjusted, and a default speech speed parameter should be used that automatically speeds up only when one of the above situations occurs.

Essentially you don't want the TTS to be sped up to match the SRT?

photkey commented 2 years ago

Yes, because this type of dubbing is mainly used for recorded videos, where you don't need to lip-sync; you just need to finish the words within the corresponding time slot. It would be weird if the speech kept switching between fast and slow, and it's normal for a subtitle to stay on screen longer than it takes to speak it. The cases where the speech has to be sped up are, shall we say, forced: if a line overruns its slot, the timing of everything after it is thrown off, so it has to be sped up.

photkey commented 2 years ago

To summarize.

  1. The default speed of speech can be set.
  2. There needs to be automatic speed of speech adjustment, but only acceleration, not deceleration.
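The two points above can be sketched as a single rule: use the default rate unless the speech would overrun its slot, and only then speed up. This is an illustration under stated assumptions, not the project's implementation: `choose_rate` is a made-up name, the spoken duration at normal speed is assumed to be known, and the result is formatted as a signed percentage string such as "+34%" on the assumption that the TTS backend takes its rate in that form.

```python
import math


def choose_rate(slot: float, spoken_at_normal: float, default_percent: int = 0) -> str:
    """Pick a speech rate that never slows down below the default.

    slot             -- seconds available for the subtitle line
    spoken_at_normal -- seconds the line takes at normal (+0%) speed
    default_percent  -- user-chosen default rate, e.g. 10 for "+10%"
    """
    if slot <= 0:
        raise ValueError("slot duration must be positive")
    default_factor = 1 + default_percent / 100
    required_factor = spoken_at_normal / slot  # above 1.0 means the line overruns
    factor = max(default_factor, required_factor)  # accelerate only, never decelerate
    # Round away float noise, then round up so the sped-up line is sure to fit.
    percent = math.ceil(round((factor - 1) * 100, 6))
    return f"{percent:+d}%"
```

For a 6-second slot, 8 seconds of speech gives "+34%", while 4 seconds of speech leaves the rate at the default.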

rany2 commented 2 years ago

Is this what you meant? It now uses argparse as well

photkey commented 2 years ago

Yes, everything is fine and perfect now.

rany2 commented 2 years ago

That's very good! I'll bundle it as some kind of extra utility like edge-playback in the future

photkey commented 2 years ago

It's a bit of a shame that this is only an example in edge-tts. Similar paid apps are all the rage in China; even tutorials on reading text aloud in the Microsoft Edge browser and then capturing it with audio recording software are pretty popular.

rany2 commented 2 years ago

Are you suggesting this "example" be a web app or just a CLI utility like edge-tts and edge-playback?

photkey commented 2 years ago

Whether as a web application or a CLI utility, I think it should stand alone and be promoted properly. It could easily take off because short video platforms are so popular nowadays; self-publishers need it, and many software authors need it too.

Obviously, as a web application, there will be more users (including programmers and ordinary self-publishers) because of the low threshold of getting started; as a CLI utility, it will naturally be more popular with the programmer community.

rany2 commented 2 years ago

It's now standalone, and I'll add a web interface and a README to it: https://github.com/rany2/edge-srt-to-speech

You can install it with pip install edge-srt-to-speech