laracasts / transcriptions

Load and parse VTT files.
60 stars 4 forks source link

If no timestamp validates, constructor builds a single line with an empty timestamp #5

Open DiegoPino opened 7 months ago

DiegoPino commented 7 months ago

What?

We have seen in the wild date stamps formed like this (notice the hour part, single digit). e.g from here http://html5videoguide.net/demos/google_io/3_navigation

Browsers do load them correctly even if according to the W3C specs their cue/times are incorrect (specs from https://w3c.github.io/webvtt/#webvtt-cue)

0:00:00.800 --> 0:00:02.933

In a single VTT, if all timestamps do not validate, the normalize lines method ends generating one line() object with the whole input text together and an empty timestamp, making timestamp methods (end()/start() in specific) fail badly.

the expected behavior would be

DiegoPino commented 7 months ago

This piece of the logic is not correct

 foreach ($lines as $index => $line) {
            if (TimestampSpan::validate($line)) {
                $lastMatch = $index;
                $results[$lastMatch] = $line;
            } else {
                $results[$lastMatch + 1] ??= '';
                $results[$lastMatch + 1] .= " {$line}";
            }
        }

Because $lastMatch is initialized to 0 even without any timestamp match there will be a $results[1] with the content of everything I see a few options, the specs say there needs to be space between cues, so instead of removing/filtering out spaces upfront one could take the spaces as dividers in account, but maybe the easiest option would be to only allow the else to run IF there was an actual previous match, not the default set initially

mikethea1 commented 2 months ago

Will this be fixed?