rany2 / edge-tts

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
https://pypi.org/project/edge-tts/
GNU General Public License v3.0

[Feature Request] SubMaker's gen_subs should merge cues at a user configurable amount #64

Closed: LiMinghuaLiGan closed this issue 1 year ago

LiMinghuaLiGan commented 1 year ago

Hello, thank you very much for your code. I used version 5.1.3 of your package before; now that it has been updated I have changed my code, but I ran into a problem when I tried to use it to generate subtitles.

The original text is: 自去年维拉版本上线以来,即将达到一年之期,终于维拉的剧情也要迎来终结,这次pv里疑问最大的无疑就是,蕾比利亚最后出现的遗迹,早在这次pv没出之前,就已经有人找到了海底的类似遗迹,很多人也都通过组队去探索过了,但很遗憾的告诉你们,pv里出现的遗迹并不是这里,尽管两处的遗迹用的材质相同,设计也极为相似,甚至台子都一样,但这确实不是一个地方,注意看中间的石柱台子,这是一个四方形的台子,在侧面刻着菱形的图案,里面纹着的是古代文明的徽记,我们可以放个特写看一下,这种壁刻在幻塔里并不少见

The generated VTT file is:

WEBVTT

00:00:00.100 --> 00:00:00.375
自

00:00:00.375 --> 00:00:00.800
去年

00:00:00.825 --> 00:00:01.087
维拉

00:00:01.100 --> 00:00:01.387
版本

00:00:01.413 --> 00:00:01.688
上线

00:00:01.688 --> 00:00:02.062
以来

What am I doing wrong, and can this be avoided? My code is:

import asyncio

import edge_tts
# from edge_tts.communicate import split_text_by_byte_length, calc_max_mesg_size

VOICE = "zh-CN-YunxiNeural"
OUTPUT_FILE = "e.mp3"
WEBVTT_FILE = "f.vtt"


async def _main() -> None:
    # Read the text to synthesize.
    with open("data.txt", "r", encoding="utf-8") as f:
        text = f.read()

    communicate = edge_tts.Communicate(text, VOICE)
    submaker = edge_tts.SubMaker()
    with open(OUTPUT_FILE, "wb") as file:
        async for chunk in communicate.stream():
            if chunk["type"] == "audio":
                # Append each audio chunk to the MP3 file.
                file.write(chunk["data"])
            elif chunk["type"] == "WordBoundary":
                # Each WordBoundary event is recorded as one subtitle entry.
                submaker.create_sub((chunk["offset"], chunk["duration"]), chunk["text"])

    with open(WEBVTT_FILE, "w", encoding="utf-8") as file:
        file.write(submaker.generate_subs())


if __name__ == "__main__":
    asyncio.run(_main())

I tested the output of split_text_by_byte_length, and enumerating it yields only one item. Is this the problem? By the way, the dubbed audio itself seems acceptable.

rany2 commented 1 year ago

I test the split_text_by_byte_length's output, the enumerate of it only have one term, is this the problem?

I think split_text_by_byte_length is fine; the text isn't long enough to require any splitting.


Just to get this straight: there is no issue with English text, only with Chinese? Edit: I'm not sure what you mean by "thinly separated".

LiMinghuaLiGan commented 1 year ago

What I want the subtitles to look like is this:

1
00:00:00,000 --> 00:00:02,027
自去年维拉版本上线以来

2
00:00:02,027 --> 00:00:03,502
即将达到一年之期

3
00:00:03,502 --> 00:00:05,898
终于维拉的剧情也要迎来终结

4
00:00:05,898 --> 00:00:10,506
这次PV里疑问最大的无疑就是蕾比莉亚最后出现的遗迹

5
00:00:10,506 --> 00:00:12,350
早在这次PV没出之前

That is, I want each subtitle cue to be one comma-separated clause of the original text, rather than having the text broken down into individual words or phrases as in the VTT above.
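The desired output above is SRT, which numbers each cue and writes timestamps as HH:MM:SS,mmm with a comma before the milliseconds. A minimal sketch for rendering already-merged cues in that format, assuming cue times are in 100-nanosecond ticks (as far as I know this is the unit of edge-tts's WordBoundary offset and duration; check against your version). srt_timestamp and to_srt are hypothetical helpers, not part of edge-tts.

```python
from typing import List, Tuple

# (start, end, text); start/end are ASSUMED to be 100-nanosecond ticks.
Cue = Tuple[int, int, str]

TICKS_PER_MS = 10_000  # 1 ms = 10,000 ticks of 100 ns


def srt_timestamp(ticks: int) -> str:
    """Format a tick count as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(ticks / TICKS_PER_MS)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Render one numbered SRT block per cue, separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


print(to_srt([(0, 20_270_000, "自去年维拉版本上线以来"),
              (20_270_000, 35_020_000, "即将达到一年之期")]))
```

The two sample cues use tick values converted from the first two timestamps above, so the printout reproduces those two SRT blocks.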

rany2 commented 1 year ago

Ah yes, this is sort of feasible, but the issue is that Microsoft no longer supports SentenceBoundary events. You will have to merge the words into sentences yourself.
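With only WordBoundary events available, one way to rebuild clauses is to align each word back against the original text and close a cue wherever punctuation follows a word. A minimal sketch of that idea; merge_by_punctuation and BREAK_CHARS are hypothetical names, and it assumes each word's text appears verbatim and in order in the input (words the service may rewrite, such as numbers, are simply skipped).

```python
from typing import List, Optional, Tuple

# (start, end, word); e.g. (offset, offset + duration, text) from WordBoundary events.
Cue = Tuple[int, int, str]

# Clause breaks: ASCII punctuation plus full-width CJK punctuation (an assumed set).
BREAK_CHARS = set(",.!?;:，。！？；：、")


def merge_by_punctuation(text: str, words: List[Cue]) -> List[Cue]:
    """Group word cues into clause cues, splitting where the original text has
    punctuation right after a word. Cue text is sliced from `text`, so spacing
    is kept as written (none for Chinese, spaces for English)."""
    merged: List[Cue] = []
    cursor = 0                               # index in `text` just past the last matched word
    group: Optional[Tuple[int, int]] = None  # (start time, start index) of the open clause
    last_end = 0                             # end time of the last matched word
    for start, end, word in words:
        pos = text.find(word, cursor)
        if pos == -1:
            # Word not found verbatim (possibly normalised by the service): skip it.
            continue
        if group is None:
            group = (start, pos)
        cursor = pos + len(word)
        last_end = end
        following = text[cursor:].lstrip()
        if not following or following[0] in BREAK_CHARS:
            # Punctuation or end of text follows this word: close the clause.
            merged.append((group[0], end, text[group[1]:cursor]))
            group = None
    if group is not None:
        # Flush a trailing clause whose closing word could not be matched.
        merged.append((group[0], last_end, text[group[1]:cursor]))
    return merged
```

With the code from the report, the word list could be collected in the WordBoundary branch as (chunk["offset"], chunk["offset"] + chunk["duration"], chunk["text"]). The ASCII commas used in the Chinese sample text are covered by BREAK_CHARS. Because str.find searches forward, a skipped word can occasionally make a later match land too far ahead; this is a heuristic, not an exact alignment.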

rany2 commented 1 year ago

It should be feasible to have it merge the words into phrases at the generate_subs stage, for example by combining them 4 words at a time or something similar (hopefully user-customizable).
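The fixed-count idea can be sketched as follows. merge_every_n is a hypothetical helper, not the code that was eventually committed to edge-tts; its words_in_cue parameter stands in for the user-configurable count, with 4 as the default taken from the comment above.

```python
from typing import List, Tuple

# (start, end, word) for each WordBoundary event.
Cue = Tuple[int, int, str]


def merge_every_n(words: List[Cue], words_in_cue: int = 4, sep: str = " ") -> List[Cue]:
    """Combine consecutive word cues into groups of `words_in_cue` words.
    Each merged cue runs from its first word's start to its last word's end."""
    if words_in_cue < 1:
        raise ValueError("words_in_cue must be at least 1")
    merged: List[Cue] = []
    for i in range(0, len(words), words_in_cue):
        group = words[i:i + words_in_cue]
        merged.append((group[0][0], group[-1][1], sep.join(w[2] for w in group)))
    return merged
```

Joining with a space suits English; for Chinese, sep="" avoids output such as 自 去年 维拉. A fixed count can still cut a phrase in half regardless of language, since it ignores punctuation entirely.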

LiMinghuaLiGan commented 1 year ago

For English it is similar. The original text is: The most questionable thing about this POV is undoubtedly the relics that finally appeared in Rebelia, as early as before this POV was released,

WEBVTT

00:00:00.100 --> 00:00:00.263
The

00:00:00.275 --> 00:00:00.575
most

00:00:00.588 --> 00:00:01.137
questionable

00:00:01.150 --> 00:00:01.413
thing

00:00:01.425 --> 00:00:01.675
about

00:00:01.688 --> 00:00:01.850
this

00:00:01.863 --> 00:00:02.400
POV

00:00:02.438 --> 00:00:02.562
is

00:00:02.575 --> 00:00:03.200
undoubtedly

00:00:03.212 --> 00:00:03.337
the

00:00:03.350 --> 00:00:03.775

rany2 commented 1 year ago

Yes, this is intentional; the events are called WordBoundary, after all. This is a feature request to have SubMaker's gen_subs merge several of these cues into one. It should be easy to do!

LiMinghuaLiGan commented 1 year ago

Thank you! I understand now.

rany2 commented 1 year ago

Can you test it? Check the commit for details on how to customize the number of words per cue (the default is 10). You can install it with: pip install git+https://github.com/rany2/edge-tts.git

rany2 commented 1 year ago

If it works fine for you I will make a release

LiMinghuaLiGan commented 1 year ago

OK, I will give it a try.

LiMinghuaLiGan commented 1 year ago

For English, SubMaker's output looks like this:

WEBVTT

00:00:00.100 --> 00:00:03.337
The most questionable thing about this POV is undoubtedly the

00:00:03.350 --> 00:00:06.775
relics that finally appeared in Rebelia as early as before

00:00:06.787 --> 00:00:10.637
this POV was released someone had already found similar relics

00:00:10.688 --> 00:00:13.137
at the bottom of the sea and many people had

00:00:13.150 --> 00:00:15.963
also explored them by teaming up but I am sorry

For Chinese, it looks like this:

WEBVTT

00:00:00.100 --> 00:00:03.275
自 去年 维拉 版本 上线 以来 即将 达到 一年 之

00:00:03.275 --> 00:00:06.575
期 终于 维拉 的 剧情 也 要 迎来 终结 这

00:00:06.575 --> 00:00:09.088
次 pv 里 疑问 最大 的 无疑 就 是 蕾

00:00:09.088 --> 00:00:11.675
比利亚 最后 出现 的 遗迹 早 在 这 次 pv

00:00:11.688 --> 00:00:14.025
没 出 之前 就 已经 有人 找到 了 海底 的

00:00:14.050 --> 00:00:17.012
类似 遗迹 很多 人 也 都 通过 组队 去 探索

Is this what you want the program to do?

rany2 commented 1 year ago

Isn't this what you wanted? (Edit: this is what I intended it to do.)

rany2 commented 1 year ago

Does it work improperly in Chinese? In English it seems fine (I don't speak Chinese, so I can't tell).

LiMinghuaLiGan commented 1 year ago

To be honest, it works improperly in Chinese, but it doesn't matter, because I don't actually use the SubMaker feature much; I just tried it today, found some problems, and reported them. I also think this kind of per-language subtitle editing can be done separately in the user's own Python code rather than being built into edge-tts. As long as the TTS function works well, I think that is enough.
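As suggested here, language-specific cleanup can live in user code. For example, a join that inserts a space only between two non-CJK tokens turns 这 次 pv 里 into 这次pv里 while leaving English untouched. A minimal sketch using only the standard library's unicodedata; is_cjk and join_tokens are hypothetical names, and classifying characters by Unicode name prefix (CJK, HIRAGANA, KATAKANA) is a heuristic assumption. Hangul is deliberately excluded because Korean separates words with spaces.

```python
import unicodedata
from typing import List


def is_cjk(ch: str) -> bool:
    """Heuristic: True for CJK ideographs and Japanese kana, by Unicode name."""
    name = unicodedata.name(ch, "")
    return name.startswith(("CJK ", "HIRAGANA", "KATAKANA"))


def join_tokens(tokens: List[str]) -> str:
    """Join word tokens, inserting a space only between two non-CJK tokens."""
    out = ""
    for tok in tokens:
        if not tok:
            continue  # ignore empty tokens
        if out and not is_cjk(out[-1]) and not is_cjk(tok[0]):
            out += " "
        out += tok
    return out
```

Applied to the words of each merged cue instead of a plain " ".join, this keeps English cues as they are and removes the spaces inside Chinese cues. It does not address cues that split a word such as 一年之期 across two cues.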

LiMinghuaLiGan commented 1 year ago

For English, I also think the output does not follow sentence boundaries, but as I mentioned above, this is not an urgent problem for a Python package whose purpose is text-to-speech via Edge.

rany2 commented 1 year ago

The current solution should be good enough for the time being; unfortunately, there is no alternative.