obsproject / obs-studio

OBS Studio - Free and open source software for live streaming and screen recording
https://obsproject.com
GNU General Public License v2.0

CEA compliance bugs #4006

Open Niko78 opened 3 years ago

Niko78 commented 3 years ago

Hi, I'm using the closed caption plugin in OBS, which passes speech recognition results to OBS. The output does not seem to be fully CEA-608 compliant. I filed a bug against the plugin, but they referred me here because the issue is in OBS => https://github.com/ratwithacompiler/OBS-captions-plugin/issues/51

I had opened a support ticket with Wowza because, first, the caption language was not carried through (which I think should be possible) and, second, the result was not correct. They asked me to test a CEA-608 sample file, which works well, so they think OBS may not conform to the CEA-608 standard.

I use the latest OBS (Windows, version 26.1) and the latest plugin.

Thanks Nico

RytoEX commented 3 years ago

Please follow the issue template when opening issues on GitHub.

Niko78 commented 3 years ago

Sure, sorry. See below the message from Wowza support, the plugin team's reply, and the sample file Wowza provided.

"Based on your results (all good with the cea608 sample file and the issue happening only when generating/sending the cea608 captions from the OBS plugin), everything points to the plugin not generating fully compliant cea608 captions. Also note that it's in beta, thus likely to be not fully tested. At this stage, I'd recommend reaching out to the plugin developers for their review, since the plugin's output should be in exactly the same format and compliance level as the cea608 sample file I've provided, which doesn't seem the case at the moment."

The plugin team answered by pointing out that "the actual CEA 608 embedding of the captions into the stream is done by OBS itself via libcaption, my plugin just handles the speech recognition API and utility stuff like transcripts and then basically just passes the caption text on to OBS for embedding. Any changes like fixing CEA compliance bugs or setting a language would have to go into OBS and libcaption itself and are out of scope here."

https://user-images.githubusercontent.com/17108017/103475530-1b252700-4dae-11eb-8941-4ff66ee3cf65.mp4

Thanks

DDRBoxman commented 3 years ago

Looks like the first step would be adding support for generating the Caption Service Descriptor in the CEA-708 CDP in libcaption.

DDRBoxman commented 3 years ago

I guess a good question is also if wowza actually expects the Caption Service Descriptor to label the caption data or if they expect it in another format

Niko78 commented 3 years ago

I forwarded your questions to Wowza support. They don't reply outside support tickets, so here is what they answered me:

... the structure of the sample CEA608 file I provided can be inspected via an ffprobe command such as ffprobe cea608-sync_with-subtitles.mp4:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'cea608-sync_with-subtitles.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf57.23.101
  Duration: 00:00:42.00, start: 0.000000, bitrate: 184 kb/s
    Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 50 kb/s, 59.94 fps, 59.94 tbr, 1000k tbn, 119.88 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: mp3 (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
    Stream #0:2(eng): Subtitle: mov_text (tx3g / 0x67337874), 0 kb/s (default)
    Metadata:
      handler_name    : SubtitleHandler

Any input such as the above should work fine. If you need more information, you can ask in the Wowza forum.

tvadi commented 2 years ago

Hi, I'm looking to see what the next steps would be for getting good (CEA-compliant) captions out of OBS with the Cloud Captions plugin. I can stream to MPV and the captions show, but I'm not sure they work anywhere else. I have ffmpeg built with libklvanc and can load the SRT stream, but it continually gives an "Illegal cc_count received" error. Is there any way to fix this and get good CEA-compliant captions from OBS?

tvadi commented 2 years ago

https://github.com/stoth68000/libklvanc/issues/68 One of the libklvanc developers, dheitmueller, says: "It is the responsibility of the upstream source to properly send the correct number of tuples, based on the framerate of the underlying video. If you're seeing "illegal cc_count" errors then it's likely that the original upstream source went through some sort of framerate conversion but didn't repack the stream.

For example, with 720p59 video the cc_count must be 10, and with 1080i59 video the cc_count must be 20. If the video had an original framerate of 720p59 and was framerate-converted to 1080i59, then each frame of the resulting video should contain 20 cc tuples.

Within ffmpeg I accomplish this with the introduction of a FIFO which repacks the series of tuples with each frame based on the target framerate, and it's hooked in to each filter which changes the framerate.

https://github.com/LTNGlobal-opensource/FFmpeg-ltn/blob/lted1/libavutil/cc_fifo.c

I also have a filter which does the repacking if the cc_count is simply wrong and the data is intact. However it's worth noting that this doesn't help if some prior filter discarded half the caption data or duplicated it (as would be the common outcome from framerate changes). If you see double characters or every other character is missing, this filter won't help.

https://github.com/LTNGlobal-opensource/FFmpeg-ltn/blob/lted1/libavfilter/vf_ccrepack.c"

Would it be possible to implement something like the FIFO and vf_ccrepack in OBS, as dheitmueller describes, so we could get good captions out of OBS? Thanks!
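A rough sketch of the repacking idea described in the quote, in Python for brevity. This is not a port of cc_fifo.c or vf_ccrepack.c and it is not OBS code: the `CcRepacker` class, its method names, and the choice to sort tuples by their first byte are assumptions made for illustration. It drops incoming padding, queues 608 and 708 tuples separately, and re-emits frames that always hold the target cc_count.

```python
from collections import deque

# Padding tuple as it appears in broadcast dumps (0xfa, 0x00, 0x00).
PADDING = (0xFA, 0x00, 0x00)


class CcRepacker:
    """Hypothetical FIFO that re-emits caption tuples at a fixed cc_count."""

    def __init__(self, cc_count, n608_per_frame):
        self.cc_count = cc_count          # e.g. 10 for 720p59, 20 for 1080i59
        self.n608 = n608_per_frame        # e.g. 1 for 720p59, 2 for 1080i59
        self.q608 = deque()               # 0xfc / 0xfd tuples, arrival order
        self.q708 = deque()               # 0xff / 0xfe tuples, arrival order

    def push(self, tuples):
        """Queue the tuples of one incoming frame, whatever its cc_count."""
        for t in tuples:
            if t[0] in (0xFC, 0xFD):
                self.q608.append(t)
            elif t[0] in (0xFF, 0xFE):
                self.q708.append(t)
            # Anything else (such as 0xfa padding) is dropped and regenerated.

    def pop_frame(self):
        """Return exactly cc_count tuples for one outgoing frame."""
        out = []
        while self.q608 and len(out) < self.n608:
            out.append(self.q608.popleft())
        while self.q708 and len(out) < self.cc_count:
            out.append(self.q708.popleft())
        out += [PADDING] * (self.cc_count - len(out))
        return out
```

For example, two 720p59 frames (cc_count 10, one 608 tuple each) pushed into `CcRepacker(cc_count=20, n608_per_frame=2)` come out as a single frame holding both 608 tuples followed by eighteen padding tuples. The sketch keeps 608 tuples in arrival order and does not insert null 608 pairs when the queue runs dry, which a real implementation would probably need to handle.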

tvadi commented 2 years ago

Copied from emails looking into this problem, in case anyone is interested. Thanks to Devin at libklvanc.

1. With a 50-second raw SRT TS file made directly from the OBS output, only 105 of the 2802 video frames contained caption data. For 720p59, every frame is supposed to contain 10 tuples of data (i.e. a cc_count of 10, translating to 30 bytes of data).

2. In the frames that did contain caption data, the cc_count was a very large value (on the order of 53), and there appears to have been no effort to pack only one 608 tuple per packet.

3. There appear to be no A53 padding bytes in the stream at all.

He further estimates that the implementation is not doing any rate control. As a result, the caption writer receives a sentence's worth of text, generates a series of 608 pairs, and then just inserts the entire series into the next available frame. The correct behavior would be to queue out the 608 pairs one per frame, adding padding as needed to reach the expected number of bytes per frame.

To summarize the spec- for 720p59 video you should have 30 bytes of data per frame (i.e. a cc_count of 10), and each frame should contain exactly one 608 tuple (alternating between CC1/CC3 and CC2/CC4 on each frame). For 1080i59 it's 60 bytes per frame (cc_count=20), and each frame should contain exactly two 608 tuples (i.e. each frame carries both the CC1/CC3 tuple and the CC2/CC4 tuple).

The MPV cc parser is pretty forgiving, as it just extracts the bytes as they arrive and feeds them to be rendered during playback. But both VLC and ffmpeg fail to detect the presence of captions at all (probably because those apps properly expect them to be on every frame and they aren't). And any broadcast-quality hardware decoder is definitely not going to play this content.

Here is more info on the cc_count and padding, which is pretty interesting:

The cc_count field dictates the number of three-byte tuples that are found in the frame. The framerate determines the appropriate cc_count (i.e. 20 for 29.97 and 10 for 59.94 FPS).

That said, it's not permitted to use all of the tuples within a given frame for CEA-608 packets. The standard only permits you to insert a fixed number per frame (2 CEA-608 tuples per frame for 29.97 and 1 CEA-608 tuple per frame for 59.94). The rest of the space is reserved for CEA-708, or for padding tuples if needed. Given you don't have any CEA-708 caption data, you should expect the following:

For 59.94 FPS, each frame should contain one CEA-608 tuple and nine padding tuples.

For 29.97 FPS, each frame should contain two CEA-608 tuples and eighteen padding tuples.
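The pacing described here can be sketched as follows, for the 59.94 FPS case only. This is an illustration, not how OBS or libcaption is written: `pace_608` is a hypothetical helper, it assumes all caption pairs belong to the 0xfc tuple, and it assumes an idle 608 slot is filled with the 80,80 byte pair, as seen in broadcast dumps quoted in this thread.

```python
from collections import deque

PADDING = (0xFA, 0x00, 0x00)  # padding tuple
NULL_PAIR = (0x80, 0x80)      # assumed "no data" 608 byte pair
CC_COUNT_59_94 = 10           # tuples per frame at 59.94 FPS, per the thread


def pace_608(pairs, num_frames, cc_count=CC_COUNT_59_94):
    """Spread 608 byte pairs over frames, one 608 tuple per frame.

    Even frames carry an 0xfc tuple (a queued pair, or the null pair if the
    queue is empty); odd frames carry an 0xfd tuple with the null pair.
    Every frame is then padded up to cc_count tuples.
    """
    queue = deque(pairs)
    frames = []
    for i in range(num_frames):
        if i % 2 == 0:
            b1, b2 = queue.popleft() if queue else NULL_PAIR
            first = (0xFC, b1, b2)
        else:
            first = (0xFD, *NULL_PAIR)
        frames.append([first] + [PADDING] * (cc_count - 1))
    return frames
```

Calling `pace_608([(0xCE, 0xCB), (0x4A, 0x45)], 4)` yields four frames of ten tuples each, with the two pairs landing in the first and third frames rather than all in one frame.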

Here's a quick example that I dumped out of a 720p59 broadcast feed I have here:

fc,80,80,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00

fd,80,80,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00

fc,ce,cb,ff,03,22,fe,4c,45,fe,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00

fd,4a,45,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00

fc,54,c8,fd,80,80,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00

Note that it alternates between 0xfc and 0xfd with each frame (0xfc is CC1/CC3, 0xfd is CC2/CC4), and that there are never more than a single 608 packet within a given frame. Also note that the cc_count is always 10 even if there is no caption data to be rendered at a given moment. Padding was inserted as needed to ensure the cc_count is constant (those are the 0xfa,0x00,0x00 tuples).

Looking at the example he gives, each line seems to be one frame, and each frame must always have exactly 10 tuples: one 608 tuple (e.g. fd,80,80) and, when there is nothing else to send, 9 padding tuples (fa,00,00). The following line seems to contain 1 CEA-608 tuple (fc), 3 CEA-708 tuples (ff, fe, fe) and 6 padding tuples (fa). Only the fc and fd tuples are 608 caption data; the ff and fe tuples are 708 data:

fc,ce,cb,ff,03,22,fe,4c,45,fe,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00,fa,00,00

It sounds like libcaption is working and the problem is on the OBS side. Here is what he says on that:

I think libcaption is doing its job - given a string of text it's producing a series of CEA-608 byte pairs that are the converted output. It's the responsibility of OBS to insert those byte pairs into the MPEG-TS stream at the proper rate and do any padding necessary.
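To double-check a dumped line by hand, a small Python sketch like the one below can count the tuple kinds. The helper name `classify_tuples` is made up for this example. The bit layout of the first byte (five marker bits, then a 1-bit cc_valid, then a 2-bit cc_type) comes from the CEA-708 cc_data structure as I understand it, not from this thread, so treat it as an assumption.

```python
from collections import Counter


def classify_tuples(line):
    """Count 608, 708 and padding tuples in one comma-separated hex dump line."""
    data = bytes(int(x, 16) for x in line.split(","))
    if len(data) % 3 != 0:
        raise ValueError("expected a whole number of three-byte tuples")
    kinds = Counter()
    for i in range(0, len(data), 3):
        head = data[i]
        cc_valid = (head >> 2) & 1   # assumed layout: bit 2 is cc_valid
        cc_type = head & 0x03        # assumed layout: bits 0-1 are cc_type
        if not cc_valid:
            kinds["padding"] += 1    # e.g. 0xfa
        elif cc_type in (0, 1):
            kinds["608"] += 1        # 0xfc / 0xfd
        else:
            kinds["708"] += 1        # 0xfe / 0xff
    return kinds
```

Under those assumptions, the line `fc,ce,cb,ff,03,22,fe,4c,45,fe,00,00,fa,00,00,...` splits into one 608 tuple, three 708 tuples and six padding tuples, and `sum(kinds.values())` gives the cc_count of 10.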

dheitmueller commented 2 years ago

Just re-reading what I sent Matt, I noticed an error I made:

The following sentence: For 29.97 FPS, each frame should contain two CEA-608 tuple and nineteen padding tuples.

should be: For 29.97 FPS, each frame should contain two CEA-608 tuple and eighteen padding tuples.

tvadi commented 2 years ago

Edited. Sorry to quote directly, but if I had tried to explain it myself I don't think it would have been as clear. Thanks so much for your help here, really hoping we can get this fixed in OBS!