jansenicus / vtt-to-srt.py

python script to convert all vtt files in a directory and all of its subdirectories to srt subtitle format

Caption sequence in srt file #17

Open KarimReefat opened 5 years ago

KarimReefat commented 5 years ago

1- According to other web pages, srt files should have a caption sequence number before each timecode, like this:


5
00:00:16,920 --> 00:00:22,470
You can think of it as the opposite to call the So-Cal

6
00:00:22,470 --> 00:00:27,750
other devices middes light a will is designed so that you hack into it.

7
00:00:27,750 --> 00:00:31,060
So it's designed for people who want to learn.


But this does not happen when I use your vtt-to-srt library: the output has no sequence numbers.

The problem goes away when I use this library instead: https://github.com/lbrayner/vtt-to-srt
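To make the expected layout concrete, here is a minimal sketch of how numbered srt blocks are put together. The function name `format_srt` and the tuple layout are only assumptions for illustration; they are not part of either library.

```python
def format_srt(cues):
    """Render (start, end, text) tuples as srt blocks numbered from 1.

    Hypothetical helper for illustration only.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each block: sequence number, timecode line, caption text.
        blocks.append("{0}\n{1} --> {2}\n{3}\n".format(index, start, end, text))
    # A blank line separates consecutive blocks.
    return "\n".join(blocks)


cues = [
    ("00:00:22,470", "00:00:27,750", "first caption"),
    ("00:00:27,750", "00:00:31,060", "second caption"),
]
print(format_srt(cues))
```

Each block starts with its number on a line of its own, followed by the timecode line and then the caption text.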

2- Is there any problem with using the code from that library for the conversion? It already uses the html2text, pysrt, and webvtt-py libraries to do this.

KarimReefat commented 4 years ago

I am sorry, I still do not know enough about git to add this code to the repository, so here is my code to fix the problem with the caption order numbers:


import re

num = 0

def order_number(matchobj):
    """Replace '.' with ',' in the timecode line and add the caption number before it."""
    global num
    num += 1
    return '{0}\n'.format(num) + matchobj.group(0).replace('.', ',')

def convert_content(file_contents):
    """Convert the content of a vtt file to srt format.

    Keyword arguments:
    file_contents: the text of the vtt file
    """
    # Number the timecode lines, longest form first: hh:mm:ss.mmm, mm:ss.mmm, ss.mmm.
    # The dot is escaped so it only matches the millisecond separator.
    replacement = re.sub(r"(\d\d:\d\d:\d\d)\.(\d\d\d) --> (\d\d:\d\d:\d\d)\.(\d\d\d)(?:[ \-\w]+:[\w\%\d:]+)*\n", order_number, file_contents)
    replacement = re.sub(r"(\d\d:\d\d)\.(\d\d\d) --> (\d\d:\d\d)\.(\d\d\d)(?:[ \-\w]+:[\w\%\d:]+)*\n", order_number, replacement)
    replacement = re.sub(r"(\d\d)\.(\d\d\d) --> (\d\d)\.(\d\d\d)(?:[ \-\w]+:[\w\%\d:]+)*\n", order_number, replacement)
    # Strip the WEBVTT header, metadata lines, inline tags, and style blocks.
    replacement = re.sub(r"WEBVTT\n", "", replacement)
    replacement = re.sub(r"Kind:[ \-\w]+\n", "", replacement)
    replacement = re.sub(r"Language:[ \-\w]+\n", "", replacement)
    replacement = re.sub(r"<c[.\w\d]*>", "", replacement)
    replacement = re.sub(r"</c>", "", replacement)
    replacement = re.sub(r"<\d\d:\d\d:\d\d\.\d\d\d>", "", replacement)
    replacement = re.sub(r"::[\-\w]+\([\-.\w\d]+\)[ ]*{[.,:;\(\) \-\w\d]+\n }\n", "", replacement)
    replacement = re.sub(r"Style:\n##\n", "", replacement)
    return replacement

reference: https://stackoverflow.com/questions/26678773/python-how-can-i-add-a-counter-to-the-replacement-argument-of-re-sub


Correct me as much as you want.

heniotierra commented 4 years ago

Hi Karim. That's nice! The only thing I can point out to improve your code is that the variable `num` should not be global.
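A minimal sketch of one way to avoid the global, assuming a counter that lives inside the converting function is acceptable. The name `number_cues` and the single simplified pattern (only the hh:mm:ss.mmm form, without cue settings) are assumptions made for illustration, not the repository's code.

```python
import itertools
import re

# Simplified pattern for illustration: only the hh:mm:ss.mmm timecode form.
TIMECODE = re.compile(r"(\d\d:\d\d:\d\d)\.(\d\d\d) --> (\d\d:\d\d:\d\d)\.(\d\d\d)\n")


def number_cues(file_contents):
    """Prefix each timecode line with a sequence number, restarting at 1 per call."""
    counter = itertools.count(1)  # local to this call, so there is no global state

    def order_number(matchobj):
        # Emit the next sequence number, then the timecode line with ','
        # as the millisecond separator.
        return "{0}\n{1}".format(next(counter), matchobj.group(0).replace(".", ","))

    return TIMECODE.sub(order_number, file_contents)


vtt = "00:00:16.920 --> 00:00:22.470\nfirst\n\n00:00:22.470 --> 00:00:27.750\nsecond\n"
print(number_cues(vtt))
print(number_cues(vtt))  # numbering starts at 1 again on the second call
```

Because the counter is created on every call, converting several files in one run numbers each file from 1, whereas a module-level `num` would keep counting across files.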