isaacbernat / netflix-to-srt

Rip, extract and convert subtitles to .srt closed captions from .xml/dfxp/ttml and .vtt/WebVTT (e.g. Netflix, YouTube)
MIT License

One more bug with <span>'s #8

Closed: n3tman closed this issue 8 years ago

n3tman commented 8 years ago

This

<p begin="275695420t" end="325325000t" region="bottomCenter" style="s1" xml:id="subtitle7">So as of last week,<br/><span style="s1_1">Terrace House</span> began its new season.</p>

produced

8
00:00:27,569 --> 00:00:32,532
Terrace HouseSo as of last week, began its new season.

but should be

8
00:00:27,569 --> 00:00:32,532
So as of last week, Terrace House began its new season.
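For illustration, here is a minimal Python sketch of reading the text of a <p> element in document order, so that the text inside a <span> stays between the text before and after it. It is not the converter's actual code: the helper name cue_text and the choice to turn <br/> into a newline are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The cue from this report, as a standalone XML fragment.
SAMPLE = (
    '<p begin="275695420t" end="325325000t" region="bottomCenter" '
    'style="s1" xml:id="subtitle7">So as of last week,<br/>'
    '<span style="s1_1">Terrace House</span> began its new season.</p>'
)


def cue_text(elem):
    """Hypothetical helper: return elem's text in document order.

    Assumption for this sketch: <br/> is rendered as a newline.
    """
    pieces = [elem.text or ""]  # text before the first child
    for child in elem:
        # Drop a namespace prefix such as "{http://www.w3.org/ns/ttml}br".
        tag = child.tag.rsplit("}", 1)[-1]
        pieces.append("\n" if tag == "br" else cue_text(child))
        pieces.append(child.tail or "")  # text that follows the child
    return "".join(pieces)


if __name__ == "__main__":
    print(cue_text(ET.fromstring(SAMPLE)))
```

The key point is that ElementTree stores the text following a child element in that child's tail attribute, so the pieces have to be joined as text, child, tail to keep the original order.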

As a temporary workaround, I strip all <span> tags with a Notepad++ regex search for <span style="[^"]+">([^<]+)</span>, replacing each match with $1.
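The same workaround can be sketched in Python for anyone without Notepad++. The pattern is the one above. The function name strip_spans is made up for this example, and Python's re module writes the replacement as \1 rather than $1.

```python
import re

# Same pattern as the Notepad++ search: a <span> whose only attribute is
# style and whose content contains no nested tags.
SPAN_RE = re.compile(r'<span style="[^"]+">([^<]+)</span>')


def strip_spans(xml_text):
    """Hypothetical helper: replace each matching <span> with its inner text."""
    return SPAN_RE.sub(r"\1", xml_text)


if __name__ == "__main__":
    line = ('So as of last week,<br/><span style="s1_1">Terrace House</span>'
            ' began its new season.')
    print(strip_spans(line))
```

Like the original search, this leaves untouched any <span> that carries other attributes or contains nested tags.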

Example: sample.xml.txt

isaacbernat commented 8 years ago

Thanks for reporting and for the suggestion! I think it should be fixed now. Could you verify?

n3tman commented 8 years ago

Yep, works good. Thank you.