isaacbernat / netflix-to-srt

Rip, extract and convert subtitles to .srt closed captions from .xml/dfxp/ttml and .vtt/WebVTT (e.g. Netflix, YouTube)
MIT License
759 stars, 73 forks

converted vtt file contains text format in srt output #41

Open darodi opened 2 years ago

darodi commented 2 years ago

The SRT file converted from a VTT file still contains the VTT text-formatting tags in its output.

Source file: https://gist.github.com/darodi/c95ba5592933d11f8963626944ea4735#file-je-suis-la-2853734-fra-vtt

Target file: https://gist.github.com/darodi/c95ba5592933d11f8963626944ea4735#file-je-suis-la-2853734-fra-vtt-srt

As you can see, the output contains

1
00:00:07,120 --> 00:00:09,480 
<c.magenta.bg_black>Musique douce</c>

instead of

1
00:00:07,120 --> 00:00:09,480 
<font color="#ff00ff">Musique douce</font>

isaacbernat commented 2 years ago

Thanks for reporting. I merged the fix in this PR (https://github.com/isaacbernat/netflix-to-srt/pull/42). Feel free to add more properties to "font", and star the project if you like it.

darodi commented 2 years ago

Thanks for the PR.

In the meantime, I wrote my own script, nearly from scratch, to do the conversion.

Here are the issues I ran into.

For example, this cue text:

<c.magenta.bg_black><i.blue>Some italic</i> and normal coloured text. By the way, 2 &lt; 3 !</c>

is rendered correctly as this output:

<font color="#ff00ff"><i><font color="#0000ff">Some italic</font></i> and normal coloured text. By the way, 2 < 3 ! </font>

Here is the script, if you want to check the code: https://github.com/darodi/vtt-to-srt/blob/master/vtt-to-srt.py
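
For anyone reading along, the kind of conversion shown in the example above can be sketched in a few lines of Python. This is only a rough illustration, not the code from the linked script or from the PR: the function name `vtt_text_to_srt`, the `COLORS` table (the eight default WebVTT colour classes) and the choice to silently drop background classes such as `bg_black` and unsupported tags are all assumptions made for the example.

```python
import html
import re

# Default WebVTT colour classes mapped to hex values.
# Assumption: only foreground colours are kept; bg_* classes are ignored.
COLORS = {
    "white": "#ffffff",
    "lime": "#00ff00",
    "cyan": "#00ffff",
    "red": "#ff0000",
    "yellow": "#ffff00",
    "magenta": "#ff00ff",
    "blue": "#0000ff",
    "black": "#000000",
}

# Matches <tag.class1.class2 annotation> and </tag>; timestamp tags such as
# <00:00:01.000> do not start with a letter and are left untouched.
TAG_RE = re.compile(r"<(/?)([A-Za-z]+)((?:\.[\w-]+)*)[^>]*>")

# Tags that SRT players commonly understand as-is.
SRT_TAGS = {"b", "i", "u"}


def vtt_text_to_srt(text):
    """Convert WebVTT cue text to SRT-style markup (hypothetical helper)."""
    out = []
    stack = []  # closing markup owed for each open VTT tag
    pos = 0
    for m in TAG_RE.finditer(text):
        # Plain text between tags: decode entities such as &lt; and &amp;.
        out.append(html.unescape(text[pos:m.start()]))
        pos = m.end()
        closing, name, classes = m.groups()
        if closing:
            if stack:
                out.append(stack.pop())
            continue
        opening, close = "", ""
        if name in SRT_TAGS:
            opening, close = f"<{name}>", f"</{name}>"
        # Use the first class that names a known foreground colour, if any.
        color = next((COLORS[c] for c in classes.split(".") if c in COLORS), None)
        if color:
            opening += f'<font color="{color}">'
            close = "</font>" + close
        out.append(opening)
        stack.append(close)
    out.append(html.unescape(text[pos:]))
    return "".join(out)
```

Calling `vtt_text_to_srt` on the cue text from the example above gives `<font color="#ff00ff"><i><font color="#0000ff">Some italic</font></i> and normal coloured text. By the way, 2 < 3 !</font>`. The stack is what keeps nested tags such as `<i.blue>` inside `<c.magenta>` closing in the right order.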