otsaloma / gaupol

Editor for text-based subtitle files
https://otsaloma.io/gaupol/
GNU General Public License v3.0
247 stars, 35 forks

SubRip format regression: last subtitle text skipped #217

Closed gnanet closed 3 months ago

gnanet commented 4 months ago

Since at least commit f144502, the SRT format has been written incorrectly, causing some players to skip the last subtitle item.

The problem lies in a misunderstanding of what counts as "one subtitle item": the empty line is an integral part of a subtitle item, so it has to be written even after the last subtitle item.

The SRT format line-by-line (https://en.wikipedia.org/wiki/SubRip)

  1. A numeric counter identifying each sequential subtitle
  2. The time that the subtitle should appear on the screen, followed by --> and the time it should disappear
  3. Subtitle text itself on one or more lines
  4. A blank line containing no text, indicating the end of this subtitle

Simply put: the SRT must end with an empty line.
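
The four-part structure above can be sketched as a small writer. This is only an illustration of the rule as described here, not aeidon's actual code; the function name `format_srt` and the tuple layout are assumptions made for the example.

```python
def format_srt(subtitles):
    """Return SRT text in which every item, including the last, ends with a blank line.

    `subtitles` is a list of (start, end, text) tuples whose times are
    already formatted as HH:MM:SS,mmm strings (an assumption of this sketch).
    """
    blocks = []
    for index, (start, end, text) in enumerate(subtitles, start=1):
        # 1. counter, 2. timing line, 3. text, 4. terminating blank line
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n\n")
    return "".join(blocks)
```

Because the blank line is appended as part of each item rather than inserted between items, the last item is terminated the same way as all the others.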

The SRT generated with the latest version of gaupol/aeidon:

1
00:00:00,000 --> 00:00:03,000
first

2
00:00:03,000 --> 00:00:06,000
second

3
00:00:06,000 --> 00:00:09,000
third

4
00:00:09,000 --> 00:00:12,000
last sub is skipped

This SRT was manually corrected to contain the missing empty line:

1
00:00:00,000 --> 00:00:03,000
first

2
00:00:03,000 --> 00:00:06,000
second

3
00:00:06,000 --> 00:00:09,000
third

4
00:00:09,000 --> 00:00:12,000
last sub is shown

otsaloma commented 3 months ago

Well, it's not really wrong. There is no spec. And it's not Matroska's format, not up to them. Also, interestingly, the Wikipedia article says

4. A blank line containing no text, indicating the end of this subtitle

but the Matroska wiki has changed it to

4. A blank line indicating the start of a new subtitle.

https://www.matroska.org/technical/subtitles.html#srt-subtitles
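
The difference between the two readings (blank line as terminator versus blank line as separator) matters mostly on the reading side. A minimal sketch of a tolerant reader, assuming well-formed counters and timing lines, treats blank lines purely as separators and so keeps the last item either way. The name `parse_srt` is hypothetical and does not reflect how any particular player or aeidon parses files.

```python
import re

def parse_srt(text):
    """Split SRT text into (counter, timing, text) tuples.

    Blank lines are treated as separators between items, so the last item
    is kept whether or not the file ends with an extra blank line.
    """
    items = []
    # Normalize Windows line endings, then split on one or more blank lines.
    normalized = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2:
            continue  # not a complete item (for example, empty input)
        items.append((int(lines[0]), lines[1], "\n".join(lines[2:])))
    return items
```

A player that instead only emits an item when it sees the terminating blank line would drop the last subtitle of a file that ends right after the text, which matches the behavior reported in this issue.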

I find extra blank lines (beyond the conventional final newline) silly, especially since many people like myself have configured text editors to strip trailing whitespace to avoid, among other things, excessive git diffs. So it's easy to lose that extra blank line.

But if you have observed some players require this silly extra, I think it's fine to add. We can be accommodating.
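
For files that have passed through such an editor, the ending can be restored with a small helper. This is a sketch assuming Unix line endings; the name `ensure_trailing_blank_line` is made up for illustration.

```python
def ensure_trailing_blank_line(text):
    """Return `text` ending in exactly one blank line after the last subtitle."""
    if not text.strip():
        return text  # empty or whitespace-only input: nothing to terminate
    # Drop whatever newlines are there, then add the final newline plus one blank line.
    return text.rstrip("\n") + "\n\n"
```

The helper is idempotent, so running it on a file that already ends with a blank line leaves it unchanged.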

gnanet commented 3 months ago

Honestly, I had not followed the link from Wikipedia over to Matroska's text, but I see their intent. Even WebVTT already describes this, and it is also logical not to require a blank line at EOF. But when I saw the difference between the SRT generated by https://github.com/Softcatala/whisper-ctranslate2 and the edited one saved by aeidon, and found the description on Wikipedia, I accepted that an obscure requirement could cause the last subtitle to be dropped. While I was preparing the PR, I wondered how this had not been an issue for other users.

If there are reports in the future about problems with the "new" SRT, I can live with reverting.

I think what you started as a finger exercise has grown into a very useful project, and the code was easy to read for me as a non-programmer. There was no issue I could not work around somehow, and all the basic needs are fulfilled for fine-tuning auto-transcribed subtitles and then translating them. Everything is intuitive and comfortable.

I am closing this issue.