tkarabela / pysubs2

A Python library for editing subtitle files
http://pysubs2.readthedocs.io
MIT License

Make .text return all the content, including color and other tags, in pysubs2 #53

Monirzadeh closed this issue 2 years ago

Monirzadeh commented 2 years ago

If I have a subtitle with this text:

1362
01:58:37,030 --> 01:58:50,030
<font color="#666666">TV</font><font color="#ad0303">S</font><font color="9f9f9f">text1</font>.Com</font>
<font color="#0A7AA6">.: text3  :.</font>
<font color="#0A7AA6">text</font>

how can I get exactly this text back from pysubs2?

<font color="#666666">TV</font><font color="#ad0303">S</font><font color="9f9f9f">text1</font>.Com</font>
<font color="#0A7AA6">.: text3  :.</font>
<font color="#0A7AA6">text</font>

If I call .text, it removes all the font color tags, which I don't want.

tkarabela commented 2 years ago

Hi @Monirzadeh, you can use the keep_html_tags=True option when loading the SRT file (docs) to get close to what you want:

import pysubs2

input_srt = """
1362
01:58:37,030 --> 01:58:50,030
<font color="#666666">TV</font><font color="#ad0303">S</font><font color="9f9f9f">text1</font>.Com</font>
<font color="#0A7AA6">.: text3  :.</font>
<font color="#0A7AA6">text</font>
"""

subs = pysubs2.SSAFile.from_string(input_srt, keep_html_tags=True)
print(subs[0].text)
# <font color="#666666">TV</font><font color="#ad0303">S</font><font color="9f9f9f">text1</font>.Com</font>\N<font color="#0A7AA6">.: text3  :.</font>\N<font color="#0A7AA6">text</font>

It keeps the HTML tags, but replaces each newline ("\n") with the SubStation line-break tag \N (written "\\N" or r"\N" in Python source). You can convert it back if you'd like:

subs[0].text = subs[0].text.replace(r"\N", "\n")
print(subs.to_string("srt"))

which produces:

1
01:58:37,030 --> 01:58:50,030
<font color="#666666">TV</font><font color="#ad0303">S</font><font color="9f9f9f">text1</font>.Com</font>
<font color="#0A7AA6">.: text3  :.</font>
<font color="#0A7AA6">text</font>

I haven't really thought about that newline behaviour. I could add a keep_newlines=True option if you think that makes sense :)
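
The replacement shown above only touches subs[0]. For a file with many subtitles, the same idea can be wrapped in a small helper. This is a minimal sketch, not part of pysubs2: restore_newlines is a hypothetical name, and the demo uses SimpleNamespace objects as stand-ins for subtitle events so that it runs without the library installed.

```python
from types import SimpleNamespace


def restore_newlines(events):
    """Replace the SubStation line-break tag \\N with a real newline in each event's text.

    Hypothetical helper: works on any iterable of objects with a .text attribute.
    """
    for event in events:
        # r"\N" is the two-character sequence backslash + N, not an escape code.
        event.text = event.text.replace(r"\N", "\n")


# Stand-ins for subtitle events; only the .text attribute matters here.
demo = [
    SimpleNamespace(
        text=r'<font color="#0A7AA6">.: text3  :.</font>\N<font color="#0A7AA6">text</font>'
    ),
    SimpleNamespace(text="single line"),
]

restore_newlines(demo)
print(demo[0].text)
```

With the subs object loaded earlier, calling restore_newlines(subs) before subs.to_string("srt") should apply the fix to every subtitle. That assumes iterating over an SSAFile yields its events, as the subs[0] indexing above suggests.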

Monirzadeh commented 2 years ago
subs[0].text = subs[0].text.replace(r"\N", "\n")

Thanks.