mantas-done / subtitles

Subtitle/caption converter
https://gotranscript.com/subtitle-converter
MIT License

fix: remove fixLine (keep original content) #55

Closed by kocoten1992 1 year ago

kocoten1992 commented 1 year ago

Hi @mantas-done, thanks for this repo!

I have a few questions about fixLine.

Some of us make heavy use of the internal format and custom HTML tags (https://www.w3.org/TR/webvtt1/, and maybe even some application-specific tags), but currently fixLine alters subtitle content before we have any chance to interact with it.

The second point is that fixLine changes content in an undesirable way. For example, https://github.com/mantas-done/subtitles/blob/master/tests/files/vtt_with_name.vtt:

WEBVTT

00:00:09.000 --> 00:00:11.000
<v Roger Bingham>We are in New York City

Some of us write custom HTML renderers for subtitles in the browser. That line might be formatted as any of:

Roger Bingham We are in New York City

Roger Bingham: We are in New York City

(Roger Bingham) We are in New York City

Roger Bingham - We are in New York City

So changing content on the user's behalf is undesirable.
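To make the point concrete, here is a minimal sketch of a consumer choosing its own speaker style when the `<v ...>` tag is left intact. It is written in Python purely for illustration and is not part of this library (which is PHP); `render_voice`, `VOICE_RE`, and the style names are hypothetical.

```python
import re

# Matches a WebVTT voice span such as "<v Roger Bingham>text".
# The closing </v> is optional, and ".class" annotations on the tag are skipped.
VOICE_RE = re.compile(r"<v(?:\.[^\s>]+)?\s+([^>]+)>(.*?)(?:</v>|$)", re.DOTALL)

# Hypothetical display styles, one per variant listed above.
STYLES = {
    "plain": "{speaker} {text}",
    "colon": "{speaker}: {text}",
    "parens": "({speaker}) {text}",
    "dash": "{speaker} - {text}",
}


def render_voice(line: str, style: str = "colon") -> str:
    """Render a cue line that starts with a voice tag in the chosen style.

    Lines without a leading voice tag are returned unchanged.
    """
    match = VOICE_RE.match(line)
    if match is None:
        return line
    speaker = match.group(1).strip()
    text = match.group(2).strip()
    return STYLES[style].format(speaker=speaker, text=text)
```

Because the raw tag survives parsing, each application can pick whichever style it wants, or keep the tag for a browser that understands WebVTT natively.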

I propose removing fixLine and letting users deal with it, OR maybe moving fixLine to the step that generates output from the internal format (giving us a chance to interact with the content first). But I think letting users deal with their own format would be the best way?
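A rough sketch of the second option, assuming the internal format is a list of cues with `start`, `end`, and `lines` keys. It is Python for illustration only, not the library's actual PHP code; `generate_lines` and `line_filter` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

# Assumed shape of one cue in the internal format:
# {"start": 9.0, "end": 11.0, "lines": ["<v Roger Bingham>We are in New York City"]}
Cue = Dict[str, object]


def generate_lines(
    cues: List[Cue],
    line_filter: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Collect cue text for output, applying an optional filter per line.

    The cues themselves are never modified, so callers that work on the
    internal format always see the original content.
    """
    output: List[str] = []
    for cue in cues:
        for line in cue["lines"]:
            output.append(line_filter(line) if line_filter else line)
    return output
```

With this split, parsing keeps the original text, and any cleanup only happens when the caller asks for it at generation time.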

mantas-done commented 1 year ago

Hi @kocoten1992, thank you for writing. Can you explain a bit more about your use case for this library and what you are using the internal format for? My main usage is just converting between different formats, so I would like to know what other people are using it for and what would be useful to them :)

kocoten1992 commented 1 year ago

I'm using it as the ultimate subtitle manipulator for VTT x)). For example, this is the content of subengine, based on your repo:

https://gist.github.com/kocoten1992/60aa9f7945f7904815c5b595f0bde8e5#file-subengine-php-L107

(Maybe some of it would be useful and could be merged back into the main repo?)

The main issue here is that I'm making heavy use of the internal format, and possibly others will too, so changing content for the user might not be ideal :smile:

P.S. I'll expand my answer a bit: I'm also using your repo to precisely sync subtitles between two different natural languages (for example, English and Japanese) by leveraging your $internal_format. In the past I used statistics to roughly match cues together; nowadays, with machine learning, I can feed it your cues in both languages and it gives me a more accurate answer (https://huggingface.co/sentence-transformers/all-mpnet-base-v2).

kocoten1992 commented 1 year ago

Hi @mantas-done, if the general direction is a simple conversion tool between multiple formats, I would agree with that direction (I'll probably copy the core functions of subtitles into my application core). Please let me know :smiley:

mantas-done commented 1 year ago

Actually, it is a good suggestion to have a colon after the speaker (Roger Bingham: We are in New York City). Added it. You are correct, the current main goal is to keep things simple and to support conversion between different formats :)