shaka-project / shaka-packager

A media packaging and development framework for VOD and Live DASH and HLS applications, supporting Common Encryption for Widevine and other DRM Systems.
https://shaka-project.github.io/shaka-packager/

WebVTT: END_OF_STREAM error when there is no, or only one, cue #1018

Closed dvoracek-slub closed 2 years ago

dvoracek-slub commented 2 years ago

System info

Operating System: any
Shaka Packager Version: e1b0c7c (current master)

Issue and steps to reproduce the problem

When processing a WebVTT file that either does not contain a cue, or ends immediately after the first cue, Packager rejects it with an "END_OF_STREAM" error. The sample files are taken from the parser unit tests.

Packager Command:

packager in=sample1.vtt,stream=text,output=sample1_processed.vtt

What is the expected result? The files should be processed without error, and output files should be generated as appropriate. (When there is no cue, it may be viable to not generate any output. That, at least, seems to be the current behavior when there are only zero-length cues.)

What happens instead? I receive the following message:

[0109/183224:INFO:demuxer.cc(89)] Demuxer::Run() on file 'sample1.vtt'.
[0109/183224:INFO:demuxer.cc(155)] Initialize Demuxer for file 'sample1.vtt'.
[0109/183224:ERROR:packager_main.cc(554)] Packaging Error: 6 (END_OF_STREAM): 

From what I understood in the code, the issue seems to lie in the interaction between Demuxer and WebVttParser. When the end of the input file is reached before Demuxer::ParserInitEvent has been called (either because there is no cue, or because the parser doesn't know yet that the cue is finished), this is incorrectly treated as an error in Demuxer::Run.
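To make the suspected interaction concrete, here is a minimal toy model in C++. It is not Shaka Packager's code: `ToyStatus`, `ToyParser`, `RunToyDemuxer`, and the `flush_at_eof` switch are all hypothetical names invented for this sketch. It assumes a parser that only signals initialization once a cue block has been closed by a blank line, and a demuxer loop that reports END_OF_STREAM if that signal never arrived before the input ran out.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical status codes for this sketch; not Shaka Packager's real enum.
enum class ToyStatus { kOk, kEndOfStream };

// Toy parser: fires the init callback only after it has seen a complete
// non-header block, i.e. a block terminated by a blank line (or by Flush()).
class ToyParser {
 public:
  explicit ToyParser(std::function<void()> on_init)
      : on_init_(std::move(on_init)) {}

  void ParseLine(const std::string& line) {
    if (line.empty()) {
      FinishBlock();
    } else {
      block_.push_back(line);
    }
  }

  // Treats end of input as the terminator of any pending block.
  void Flush() { FinishBlock(); }

 private:
  void FinishBlock() {
    // A block whose first line starts with "WEBVTT" is the file header.
    const bool is_header =
        !block_.empty() && block_.front().rfind("WEBVTT", 0) == 0;
    if (!block_.empty() && !is_header) on_init_();
    block_.clear();
  }

  std::function<void()> on_init_;
  std::vector<std::string> block_;
};

// Toy demuxer loop: running out of input before the init callback fired is
// reported as END_OF_STREAM, mirroring the behavior described in this issue.
ToyStatus RunToyDemuxer(const std::vector<std::string>& lines,
                        bool flush_at_eof) {
  bool initialized = false;
  ToyParser parser([&initialized] { initialized = true; });
  for (const std::string& line : lines) parser.ParseLine(line);
  if (flush_at_eof) parser.Flush();
  return initialized ? ToyStatus::kOk : ToyStatus::kEndOfStream;
}
```

In this model, the lines of sample2.vtt (no blank line after "subtitle") yield `kEndOfStream` unless `flush_at_eof` is set, while the header-only case still yields `kEndOfStream` even with flushing, because no cue ever fires the callback. If the real code behaves similarly, the two cases would need separate handling.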

Sample Files

sample1.vtt

WEBVTT

sample2.vtt (NOTE: single newline at end of file)

WEBVTT

00:01:00.000 --> 01:00:00.000
subtitle

vish91 commented 2 years ago

@kqyang @joeyparrish Any thoughts on this? Where in the packager is it exiting? I was able to reproduce this today with another sample. I had a simple WebVTT input like this:

WEBVTT

00:03:34.882 --> 00:03:37.384 align:center
AMBASSADE et CONSULAT
de l'ÉTAT ARABE du QADIR

With just this small WebVTT file, Packager exits with the error below instead of processing it correctly as a small VTT file with a single cue.

[0518/001023:ERROR:packager_main.cc(554)] Packaging Error: 6 (END_OF_STREAM):

joeyparrish commented 2 years ago

Good question. I wonder if we have a really basic edge case bug. If you add an empty line or two to the end of the file, does that make a difference?

dvoracek-slub commented 2 years ago

@joeyparrish Yes, it does make a difference in the case of a single cue (sample2). Processing works when there are two newlines. (It doesn't seem to make a difference in the header-only case.)
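One way to see why one trailing newline differs from two: a single newline only terminates the last text line, whereas a second newline adds an empty line, which is what closes a cue block in a line-oriented reading of the file. The sketch below illustrates this with two hypothetical helpers, `SplitLines` and `EndsWithBlankLine`. They are written for this example only and are not taken from Shaka Packager.

```cpp
#include <string>
#include <vector>

// Splits text into lines on '\n'. A final newline terminates the last line
// and does not by itself produce an extra empty line.
std::vector<std::string> SplitLines(const std::string& text) {
  std::vector<std::string> lines;
  std::string current;
  for (char c : text) {
    if (c == '\n') {
      lines.push_back(current);
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  if (!current.empty()) lines.push_back(current);  // unterminated last line
  return lines;
}

// True if the last line of the text is empty, i.e. the text ends with a
// blank line that would close the final block.
bool EndsWithBlankLine(const std::string& text) {
  const std::vector<std::string> lines = SplitLines(text);
  return !lines.empty() && lines.back().empty();
}
```

For sample2.vtt as posted (ending in "subtitle" plus one newline), `EndsWithBlankLine` returns false. With a second newline appended it returns true, which matches the observation that two newlines make the single-cue file process.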

joeyparrish commented 2 years ago

I would bet that our VTT parser is just too strict about the end of the input string. See if you can modify packager/media/formats/webvtt/webvtt_parser.cc and add a new test case to cover this. We would love to have your contribution.

vish91 commented 2 years ago

Try this PR @joeyparrish @dvoracek-slub
It's difficult to write a test case for this given the current test structure, so I will need you to tell me where it should go. webvtt_parser_unittest doesn't really check the body the way we would expect, so it doesn't check for newlines. webvtt_muxer probably should, but I didn't follow what is being asserted there. I went ahead and ran all existing tests, and ran packager on my single-cue sample, and it works there.