tkarabela / pysubs2

A Python library for editing subtitle files
http://pysubs2.readthedocs.io
MIT License

Python inline convert sup to srt #72

Closed MattGeale closed 1 year ago

MattGeale commented 1 year ago

Hey all,

Is it possible to convert a sup file to srt using pysubs2?

I have a python script that currently extracts subtitles from an mkv file based on specifications I've outlined, but now I'm wanting to convert the file from .sup to .srt. Unfortunately, I can't seem to figure out a way to do it.

This is the pastebin of the convert portion of my script, .

This was my original attempt at using SubtitleEdit to convert the sup to srt, but I kept running into problem after problem. After spending a solid day trying to figure out what I was doing wrong, it appears, to my annoyance, that SubtitleEdit doesn't have a Python API, so it was never going to work!

If anyone has any examples using pysubs2 where this type of conversion can occur, that'd be unreal.

Thanks in advance!

tkarabela commented 1 year ago

Hi @MattGeale, I suggest you look into the SubtitleEdit CLI: https://www.nikse.dk/subtitleedit/help#commandline. From a brief look, it should be able to do what you want:

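import subprocess

# Adjust the path to wherever SubtitleEdit.exe is installed on your machine.
cmd = [
    r"D:\Program Files\Subtitle Edit\SubtitleEdit.exe",
    "/convert",
    "input.sup",  # image-based subtitle file to convert
    "subrip"      # target format (SRT)
]
# Raises subprocess.CalledProcessError if SubtitleEdit exits with a non-zero status.
subprocess.check_call(cmd)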

If I remember correctly, SUP is an image-based format used on Blu-ray and DVD discs, so converting to SRT involves not only reading the SUP binary format but also doing OCR. Implementing all of this from scratch is out of scope for this library, IMHO. It's best to do the image-to-text step elsewhere; pysubs2 can then be used to modify the subtitles further once they're in text form.

If you succeed with the SubtitleEdit conversion from Python, please let me know, I'm happy to include this as a tip in the documentation :)
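A minimal sketch of that division of labour, assuming the OCR tool has already written a text-based SRT file. The helper names, the idea that the output lands next to the input with an `.srt` suffix, and the shift value are illustrative assumptions, not something confirmed in this thread:

```python
from pathlib import Path


def expected_srt_path(sup_path: Path) -> Path:
    """Assumed output location: same folder as the input, with an .srt suffix."""
    return sup_path.with_suffix(".srt")


def postprocess_srt(srt_path: Path, shift_seconds: float = 0.0) -> int:
    """Load an already-OCR'd SRT file, shift its timestamps, and save it in place.

    Returns the number of subtitle events in the file.
    """
    import pysubs2  # third-party: pip install pysubs2

    subs = pysubs2.load(str(srt_path), encoding="utf-8")
    subs.shift(s=shift_seconds)  # positive values delay every subtitle
    subs.save(str(srt_path))     # output format is inferred from the extension
    return len(subs)
```

Something like `postprocess_srt(expected_srt_path(Path("input.sup")), shift_seconds=2.5)` would then run after the external conversion has finished.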

MattGeale commented 1 year ago

Hey, thanks for this! Don't know how I missed it (╯°□°)╯︵ ┻━┻

I'll try a few combinations to see if I can get it to behave, but just from copying and pasting your snippet above, fewer errors = progress! My biggest concern is how long the OCR conversion takes, so I can't tell whether Subtitle Edit is getting caught on something. When I exported the sup file and ran the OCR manually (i.e. opening the .exe directly), it kept stopping on words that the dictionary thought were typos, so I had to intervene manually.
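If the worry is a conversion that never returns, one option is to cap the external process with a timeout. This is a sketch using only the standard library; `run_with_timeout` is a hypothetical helper, and whether SubtitleEdit's command-line mode ever waits for user input is not established in this thread:

```python
import subprocess
from typing import Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> bool:
    """Run cmd and report success.

    Returns False if the process exits with a non-zero status or is still
    running after timeout_s seconds (in which case it is killed).
    """
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    except subprocess.CalledProcessError:
        return False
    return True
```

The limit itself is a judgment call; OCR on a full-length film can legitimately take a while, so a generous value is probably safer than a tight one.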

MattGeale commented 1 year ago

Hey @tkarabela,

I got the script to run and convert the subtitle on the fly! In case this is of any interest, you want to put it in your own documentation, or anyone comes upon this thread, this is what I used for the conversion, combined with my original sup extraction (note that it prioritises SDH-tagged subtitles but will fall back to English if no SDH track is detected):

https://pastebin.com/i00UQG1V
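For readers who cannot open the pastebin, the "SDH first, then English" rule could be sketched as below. This does not reproduce the linked script; it assumes track dictionaries shaped like the JSON that `mkvmerge -J` prints (`type`, plus `properties.language` and `properties.track_name`), and `pick_subtitle_track` is a hypothetical name:

```python
from typing import List, Optional


def pick_subtitle_track(tracks: List[dict]) -> Optional[dict]:
    """Prefer an English SDH subtitle track; otherwise fall back to any English one."""
    english = [
        t for t in tracks
        if t.get("type") == "subtitles"
        and t.get("properties", {}).get("language") == "eng"
    ]
    for track in english:
        # Assumption: SDH tracks are identified by "SDH" appearing in the track name.
        name = track.get("properties", {}).get("track_name") or ""
        if "sdh" in name.lower():
            return track
    # No SDH track found: use the first English track, or None if there is none.
    return english[0] if english else None
```

The chosen track's `id` would then be passed to whatever extraction step produces the .sup file.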