isaacbernat / netflix-to-srt

Rip, extract and convert subtitles to .srt closed captions from .xml/dfxp/ttml and .vtt/WebVTT (e.g. Netflix, YouTube)
MIT License

The script cuts everything after </span> #7

Closed (n3tman closed 8 years ago)

n3tman commented 8 years ago

Hello, thanks for a great script!

I've noticed that if the script sees lines like this:

<p begin="161831670t" end="193943750t" region="topCenter" style="s1" xml:id="subtitle2"><span style="s1_1">Terrace House</span> is a show about<br/>six strangers, men and women,</p>

or

<p begin="307807500t" end="363282920t" region="bottomCenter" style="s1" xml:id="subtitle7"><span style="s1_1">Terrace House</span> has now<br/>been revived by Netflix.</p>

It converts them to:

3
00:00:16,183 --> 00:00:19,394
Terrace House

or

8
00:00:30,780 --> 00:00:36,328
Terrace House

So the part after </span> is stripped.

Example file: sample.xml.txt
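For reference, the timestamps in the example are consistent with a tick rate of 10,000,000 ticks per second (161831670t becomes 00:00:16,183). Below is a minimal sketch of that conversion. The tick rate is an assumption inferred from the numbers above (the snippet does not show the file's ttp:tickRate), truncation to milliseconds is likewise inferred, and `ticks_to_srt` is a hypothetical helper name, not a function from the script.

```python
def ticks_to_srt(ticks: str, tick_rate: int = 10_000_000) -> str:
    """Convert a TTML tick value such as '161831670t' to an SRT timestamp.

    tick_rate is an assumption; a real file declares it via ttp:tickRate.
    """
    # Integer arithmetic avoids float rounding; the result is truncated to ms.
    total_ms = int(ticks.rstrip("t")) * 1000 // tick_rate
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


print(ticks_to_srt("161831670t"), "-->", ticks_to_srt("193943750t"))
```

With these assumptions the four tick values in the example map to the four timestamps shown in the converted output.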

isaacbernat commented 8 years ago

Hi, thanks for reporting the issue.

The expected output would be "Terrace House is a show about six strangers, men and women," and "Terrace House has now been revived by Netflix.", right?

n3tman commented 8 years ago

You're right; however, I think the <br/> line breaks should be kept too. So it should be:

3
00:00:16,183 --> 00:00:19,394
Terrace House is a show about
six strangers, men and women,

and

8
00:00:30,780 --> 00:00:36,328
Terrace House has now
been revived by Netflix.
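One possible way to get that output, sketched with Python's standard `xml.etree.ElementTree`: collect each element's text, recurse into children, turn `<br/>` into a newline, and append each child's `tail` (the text that follows its closing tag). This is only an illustration under the assumption that the subtitle line parses as well-formed XML; `paragraph_text` is a hypothetical helper, and this is not necessarily how the script works or how it was fixed.

```python
import xml.etree.ElementTree as ET


def paragraph_text(elem):
    """Return the visible text of a <p> element, keeping <br/> as a newline."""
    parts = [elem.text or ""]
    for child in elem:
        # Strip a possible "{namespace}" prefix before comparing the tag name.
        if child.tag.rsplit("}", 1)[-1] == "br":
            parts.append("\n")
        else:
            parts.append(paragraph_text(child))
        # The tail is the text after the child's closing tag, e.g. after </span>.
        parts.append(child.tail or "")
    return "".join(parts)


sample = (
    '<p begin="161831670t" end="193943750t" region="topCenter" style="s1" '
    'xml:id="subtitle2"><span style="s1_1">Terrace House</span> is a show '
    "about<br/>six strangers, men and women,</p>"
)
print(paragraph_text(ET.fromstring(sample)))
```

The key point is the `tail` handling: without it, everything after `</span>` and after `<br/>` would be lost.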

isaacbernat commented 8 years ago

Thanks. I think I fixed it now. Can you pull the new version and try again?

n3tman commented 8 years ago

Works great :+1: Thank you!

isaacbernat commented 8 years ago

Glad to help :)