sachac / subed

subed is a subtitle editor for Emacs
177 stars 16 forks source link

Find overlapping timestamps when sanitizing #6

Open rndusr opened 5 years ago

rndusr commented 5 years ago

subed should now properly prevent overlapping subtitles while editing, but it will still load a file with illegal timestamps.

Sanitizing should report:

Should this be implemented generically for all subtitle formats?

I'm not familiar with any subtitle formats besides SubRip. If all widely used formats have the same concept of start/stop timestamps per subtitle, these checks should probably be implemented generically.

mooseyboots commented 3 years ago

this would be a really helpful feature. i just had to do this manually on a 30 min video. has anyone started on it or even scripted a little workaround perhaps?

mooseyboots commented 3 years ago

i cooked up a basic hack to implement trimming overlapping subtitle times for srt subs

(defcustom subed-trim-overlapping-subtitle-times nil
  "Whether overlapping subtitles should be adjusted on saving. A string of either 'start', to adjust the start time of the following subtitle or 'end', to adjust the end of the current subtitle. Defaults to nil."
  :type 'string
  :group 'subed)

(setq subed-trim-overlapping-subtitle-times "start")

(defun subed-srt--trim-overlap-end-times ()
  "Check if end time of current subtitle is after start point of next.

If so, trim the end time of current subtitle to 1 milisecond less than next."
  (interactive)
  (let ((next-sub-start-time (save-excursion
                               (subed-srt--forward-subtitle-time-start)
                               (subed-srt--subtitle-msecs-start))))
    (if (>= (subed-srt--subtitle-msecs-stop) next-sub-start-time)
        (subed-srt--set-subtitle-time-stop
         (1- next-sub-start-time)))))

(defun subed-srt--trim-overlap-start-times ()
  "Check if end time of current subtitle is after start point of next.

If so, trim the start time of next subtitle to 1 milisecond more than prev."
  (interactive)
  (let ((this-sub-stop-time (subed-srt--subtitle-msecs-stop))
        (next-sub-start-time (save-excursion
                               (subed-srt--forward-subtitle-time-start)
                               (subed-srt--subtitle-msecs-start))))
    (if (>= this-sub-stop-time next-sub-start-time)
        (save-excursion
          (subed-srt--forward-subtitle-time-start)
          (subed-srt--set-subtitle-time-start
           (1+ this-sub-stop-time))))))

(defun subed-srt--sanitize-overlaps ()
  "Adjust all overlapping times in current file.

Uses either `subed-srt--trim-overlap-start-times' or `subed-srt--trim-overlapping-end-times', the latter being the default. See `subed-trim-overlap-start-or-end' to customize this option."
  (interactive)
  (goto-char (point-min))
  (save-excursion
    (while (subed-srt--forward-subtitle-time-start)
      (if (equal subed-trim-overlapping-subtitle-times "start")
          (subed-srt--trim-overlap-start-times)
        (if (equal subed-trim-overlap-start-or-end "end")
            (subed-srt--trim-overlap-end-times))))))

;; add to existing subed-srt--sort:
(defun subed-srt--sort ()
  "Sanitize, then sort subtitles by start time and re-number them."
  (interactive)
  (atomic-change-group
    (subed-srt--sanitize)
    (when subed-trim-overlapping-subtitle-times
      (subed-srt--sanitize-overlaps))
    (subed-srt--validate)
    (subed-save-excursion
     (goto-char (point-min))
     (sort-subr nil
                ;; nextrecfun (move to next record/subtitle or to end-of-buffer
                ;; if there are no more records)
                (lambda () (unless (subed-srt--forward-subtitle-id)
                             (goto-char (point-max))))
                ;; endrecfun (move to end of current record/subtitle)
                #'subed-srt--jump-to-subtitle-end
                ;; startkeyfun (return sort value of current record/subtitle)
                #'subed-srt--subtitle-msecs-start))
    (subed-srt--regenerate-ids)))

a more elegant option might be to also allow custom respecting of subed-milliseconds-adjust, rather than just using a single millisecond difference. but that may also make adjustments that are larger than what is wanted. another poss would be to split the overlap in half and add to each timecode.

this could also easily be adapted to just reporting/validating rather than actually changing, or left out of subed-srt--sort to only be run interactively.

i don't know the subed code well at all, so i'm not sure if this is a very good way to do this.

rndusr commented 3 years ago

Thank you for putting in the work. I only glanced at your code, but I don't see anything fundamentally wrong with it.

I think it should fit in nicely with the existing SRT code. But we now have VTT support and trimming should work with either format. If you could use only functions from subed-common.el to implement this, it should mean all formats are trimmed.

I also think it would be better to provide this functionality as a public function that is optionally called automatically after opening a file. Some users may want overlapping subtitles. I know mpv supports this and just displays them on top of each other.

Instead of the 1 millisecond gap, you could use subed-subtitle-spacing, which is already used to prevent overlapping while adjusting start/stop time.

mooseyboots commented 3 years ago

hi @rndusr,

ah so i just edited the above to make it only run if the custom setting is non-nil, and set that as default.

santize-overlaps is already interactive, so can just be run by the user. and you are right that it could be good to make it optional to run on loading a file, so you do it before making any manual adjustments. or ask on load.

happy to make it run on both formats, that's what i didn't understand v well with yr code.

i can have another play soon and if you like i can (try to!) submit a PR.

*

my personal case re 1ms is that while i use subed-subtitle-spacing generally, the gap is not imperative for me, so it makes a kinda sense to left them w no gap unless adjusted. but cd make it an option somehow.

rndusr commented 3 years ago

i can have another play soon and if you like i can (try to!) submit a PR.

I suggest writing tests. That helped me a lot with pinning down all the edge cases.

the gap is not imperative for me, so it makes a kinda sense to left them w no gap unless adjusted. but cd make it an option somehow.

Maybe with a C-u prefix? You could even use a custom gap that way.

mooseyboots commented 3 years ago

If you could use only functions from subed-common.el to implement this, it should mean all formats are trimmed.

i took a look and i'm a little confused. did you maybe mean using the functions included in subed--generic-function-suffixes? from what i see those functions are converted from generic to specific format funs when subed--init is run, while subed-common.el doesn't seem to contain the funs this functionality would need.

because that variable contains the funs this code needs, it looks like simply reformatting everything using generic functions would work. is that correct?

i also added a custom option to use subed-subtitle-spacing rather than +/- 1 millisecond. universal-arg prefix is easy done, will look into multi prefix options also.

rndusr commented 3 years ago

You're right, of course, sorry. I haven't really worked on this project for a long time, and seem to have successfully dispelled the nightmare of abstraction from my memory.