wader / fq

jq for binary formats - tool, language and decoders for working with binary and text formats

[FORMAT] Parse ITU-T T.35 data & SCTE-128 DTVCC Captions in H.264 #533

Closed bbgdzxng1 closed 1 year ago

bbgdzxng1 commented 1 year ago

Firstly can I state that fq absolutely rocks for the analysis of media files, and is the best parser for H.264 headers.

There are a very limited number of open source tools that go to the effort of allowing H.264 SEI headers to be easily decoded. On behalf of your user-base, many thanks for all your work!

In fact, I like fq so much, I would like to ask for the AVC & SEI parser to be enhanced so that it can decode DTVCC / SCTE-128 closed caption headers in order to index the tracks contained within. My interest is specifically H.264 files, but a generic parser could also service other /formats/mpeg/* such as H.262 picture user data and H.265 SEI data.

I appreciate that fq started with a heavy focus on media files, and thus I hope this request would be interpreted as being on topic.

At present, fq successfully decodes as far as detecting "user_data_registered_itu_t_t35" in the SEI. This is cool, and beyond that of some basic parsers.

SCTE 128 Closed Captioning

Closed captioning is important for audiences with accessibility requirements to be able to enjoy and consume content with equal access as to those without accessibility needs. Video engineers need to be able to debug closed captioning in order to service those users.

There are many standards for Closed Captioning, some of which define how to transmit the data in an MPEG stream (MPEG2 SCTE-20, ATSC A/53, H.264 SCTE-128), some of which define the encoding and decoding of the characters in the payload (EIA-608/708). The standards for ATSC A/53 closed captioning in SEI side data for H.264 share commonality with H.262/mpeg2 closed captioning in picture user data. Higher-level data-structures such as EIA-608/708 are common to all of H.262/264/265 use cases.

My primary interest is DTVCC SCTE-128 Closed Captioning in H.264, since AVC is the most widely used video codec. While SCTE-128 is a North America-centric standard, it is now commonly used worldwide in HLS & DASH protocols for the carriage of EIA-608/708. SCTE-128 DTVCC (with EIA-608) is the most common method of delivering closed captions to end-user players and devices in HLS and DASH streams.

Everything in the following format request refers to freely available documentation.

Current support in fq

Here is the fq command that I use to inspect a sample file containing DTVCC closed captioning.

At present, fq is capable of identifying that the SEI side data contains a T.35 Country Code.

$ ffmpeg -loglevel warning -hide_banner -i './testsrc.with608captions.mp4' -map '0:v:0' -codec:v 'copy' -bsf:v 'h264_mp4toannexb' -f 'h264' 'pipe:1' | fq --decode avc_annexb '[ .[] | objects | select(.nal_unit_type=="sei") | select(.sei.payload_type=="user_data_registered_itu_t_t35") | d ]'

      |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d|0123456789abcd|.[121]{}: nalu (avc_nalu)
      |00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d|0123456789abcd|  sei{}: (avc_sei)
  0x00|04                                       |.             |    payload_type: "user_data_registered_itu_t_t35" (4)
  0x00|   68                                    | h            |    payload_size: 104
  0x00|      b5 00 31 47 41 39 34 03 5f ff fc 94|  ..1GA94._...|    data: raw bits
  0x00|ae fc 94 20 fc 91 40 fc 54 68 fc e9 73 fc|... ..@.Th..s.|
  *   |until 0x69.7 (104)                       |              |
  0x06|                        04 5c b5 00 31 47|        .\..1G|    rbsp_trailing_bits: raw bits
  0x07|41 39 34 03 5b ff fc 94 20 fc e9 6e fc 20|A94.[... ..n. |
  *   |until 0xc8.7 (end) (95)                  |              |
0x3c52|         06                              |   .          |  forbidden_zero_bit: false
0x3c52|         06                              |   .          |  nal_ref_idc: 0
0x3c52|         06                              |   .          |  nal_unit_type: "sei" (6) (Supplemental enhancement information)
0x3c52|            04 68 b5 00 31 47 41 39 34 03|    .h..1GA94.|  data: raw bits
0x3c60|5f ff fc 94 ae fc 94 20 fc 91 40 fc 54 68|_...... ..@.Th|
*     |until 0x3d1e.7 (201)                     |              |

I would like to use fq to dig deeper into the ITU-T T.35 header, specifically to inspect SCTE-128 DTVCC closed captions, which in media terms is a very common type of closed captioning. Very few closed caption tools provide raw debug analysis. Most tools (such as ccextractor or caption inspector) go the extra step and perform a conversion, which can often change the measured result. I like fq because it attempts to decode rather than interpret or convert data.

The closed caption rawdata starts as user_data_registered_itu_t_t35, which fq already knows about. Within the user_data_registered_itu_t_t35 there is rawdata....

b5 00 31 47 41 39 34 03 5f ff fc 94
ae fc 94 20 fc 91 40 fc 54 68 fc e9 73

SCTE-128 DTVCC Format (Summary)

From: https://www.scte.org/documents/373/ANSI_SCTE-128-1-2020-1586877225672.pdf

itu_t_35_country_code

Within the user_data_registered_itu_t_t35, the ITU publishes a full lookup table, with country codes. Here is a link with all the current T.35 country codes. Standard is freely and publicly available and is maintained by ITU.

If a kind developer were to add an ITU T.35 Country Code Parser, I have extracted the values from the ITU documentation and formatted in what would be a useful way, in the hope of making a developer's task easier.

t35_country_codes.txt

The T.35 country code for DTVCC closed captioning is always B5 (since it is a standard derived from ATSC in the United States). A more comprehensive, generic T.35 parser in fq could include all country codes from the above pre-formatted file.

b5 00 31 47 41 39 34 03 5f ff fc 94
^^ itu_t_35_country_code
ae fc 94 20 fc 91 40 fc 54 68 fc e9 73

Where itu_t_35_country_code of value {0xb5: "United States"}

itu_t_35_provider_type

Each country's standards body defines and scopes its own provider_type. DTVCC closed captioning is a US standard, although DTVCC 608/708 closed captions are used around the world in protocols such as HLS.

The provider_type for SCTE-128 Closed Captions is always 0x0031.

b5 00 31 47 41 39 34 03 5f ff fc 94
   ^^ ^^ itu_t_35_provider_type
ae fc 94 20 fc 91 40 fc 54 68 fc e9 73

Where itu_t_35_provider_type is {0x0031: SCTE-128 DTVCC}
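As a rough illustration of the two fields described so far, here is a minimal Python sketch that splits the country code and provider type off the front of the sample payload. The function name `parse_t35_prefix` and the one-entry lookup table are hypothetical, made up for this example. The handling of a 0xFF country code (one extra extension byte) follows my reading of H.264 Annex D and should be treated as an assumption rather than a description of how fq does it.

```python
import struct

# First bytes of the SEI payload quoted in this issue.
SAMPLE = bytes.fromhex("b5003147413934035ffffc94")

# Deliberately tiny lookup; a real decoder would load the full T.35 table.
COUNTRY_CODES = {0xB5: "United States"}


def parse_t35_prefix(payload: bytes) -> dict:
    """Split off the T.35 country code and the 16-bit provider type."""
    country_code = payload[0]
    offset = 1
    extension = None
    if country_code == 0xFF:
        # ASSUMPTION (H.264 Annex D): 0xFF means one extension byte follows.
        extension = payload[offset]
        offset += 1
    # The provider type is a big-endian 16-bit value.
    (provider_type,) = struct.unpack_from(">H", payload, offset)
    return {
        "country_code": country_code,
        "country_name": COUNTRY_CODES.get(country_code, "unknown"),
        "country_code_extension": extension,
        "provider_type": provider_type,
        "rest": payload[offset + 2:],
    }


header = parse_t35_prefix(SAMPLE)
print(hex(header["country_code"]), header["country_name"], hex(header["provider_type"]))
```

The `rest` entry holds whatever follows the provider type, so later layers can be decoded separately.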

scte_128_user_identifier

In SCTE-128, the following user identifiers are defined:

{
  0x47413934:          ATSC data ("GA94"),
  0x44544731:          Active Format Description (AFD) ("DTG1")
}

From a selfish perspective, I am interested in the ATSC data, however the same SCTE-128 document does also cover Active Format Description signalling which may prove valuable to others (in case it also piques your interest or was on your to-do-sometime-in-the-future list).

b5 00 31 47 41 39 34 03 5f ff fc 94
         ^^ ^^ ^^ ^^ scte_128_user_identifier
ae fc 94 20 fc 91 40 fc 54 68 fc e9 73

Where scte_128_user_identifier is {0x47413934: "ATSC data (GA94)"}
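The identifier can be read either as four ASCII characters or as one big-endian 32-bit integer. The short Python sketch below shows both views of the four marked bytes in the sample. The dictionary name `USER_IDENTIFIERS` is made up for illustration, and the byte offsets assume the 1-byte country code and 2-byte provider type described earlier.

```python
SAMPLE = bytes.fromhex("b5003147413934035ffffc94")

# Identifiers listed in the table above, keyed by their ASCII form.
USER_IDENTIFIERS = {
    b"GA94": "ATSC data",
    b"DTG1": "Active Format Description (AFD)",
}

# Bytes 3..6 follow the 1-byte country code and the 2-byte provider type.
raw_identifier = SAMPLE[3:7]
as_integer = int.from_bytes(raw_identifier, "big")

print(raw_identifier.decode("ascii"), hex(as_integer),
      USER_IDENTIFIERS.get(raw_identifier, "unknown"))
```

Keying the lookup on the ASCII bytes rather than the integer keeps the table readable and avoids transcription slips in the hex form.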

ATSC1_data ("GA94")

Within the ATSC1_data header, there are registered codes. The one which is interesting from a Closed Caption perspective is 0x03 cc_data.

This information is also published and maintained by ATSC in the last tab of this freely accessible, public Code Point Registry spreadsheet...

b5 00 31 47 41 39 34 03 5f ff fc 94
                     ^^ atsc1_data
ae fc 94 20 fc 91 40 fc 54 68 fc e9 73

Where atsc1_data is {0x03: "ccdata"}.

ccdata() = EIA-708

b5 00 31 47 41 39 34 03 5f ff fc 94
                        ^^ ^^ ^^ ^^ ccdata() et cetera...
ae fc 94 20 fc 91 40 fc 54 68 fc e9 73

The ccdata() is defined in EIA-708, and while I would love to extend this format request to include EIA-708 headers, enabling fq to list headers that relate to the captions tracks (608 CC1/2/3/4) and (708 SERVICE1-6), I appreciate that this post is already long enough. In simple terms the ccdata() payload contains headers which list the CC1/2/3/4 tracks header and payload (aka 608 compatibility mode), followed by the headers and payload for full 708 data. The EIA-708 spec has traditionally been pay-to-play and not available for the general public, but has now been made available free-of-charge and "Available to Everyone" from the CTA (registration required).

https://shop.cta.tech/products/digital-television-dtv-closed-captioning

The EIA-708 standard defines the methods to decode ccdata() headers.
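To make the kind of header listing I have in mind more concrete, here is a minimal Python sketch that unpacks the ccdata() header and the caption triplets from the sample bytes, without decoding any caption text. The bit layout and the meaning of the cc_type values are assumptions based on my reading of ATSC A/53 and CEA-708. The names `CcTriplet`, `parse_cc_data` and `CC_TYPE_NAMES` are hypothetical. The sample is truncated, so the sketch stops quietly at the last complete triplet.

```python
from dataclasses import dataclass

# Bytes after the 0x03 type code in the sample quoted in this issue.
# The sample is truncated, so only some of the triplets are present.
SAMPLE_CC_DATA = bytes.fromhex("5ffffc94aefc9420fc9140fc5468fce973")

# ASSUMPTION: cc_type meanings as I understand ATSC A/53 / CEA-708.
CC_TYPE_NAMES = {
    0: "608 field 1 (CC1/CC2)",
    1: "608 field 2 (CC3/CC4)",
    2: "708 DTVCC packet data",
    3: "708 DTVCC packet start",
}


@dataclass
class CcTriplet:
    cc_valid: bool
    cc_type: int
    cc_data_1: int
    cc_data_2: int


def parse_cc_data(buf: bytes):
    """Return (process_cc_data_flag, cc_count, triplets actually present)."""
    first = buf[0]
    process_cc_data_flag = bool(first & 0x40)  # second-highest bit
    cc_count = first & 0x1F                    # low 5 bits
    # buf[1] is a reserved byte; triplets start at offset 2.
    triplets = []
    for i in range(cc_count):
        start = 2 + 3 * i
        chunk = buf[start:start + 3]
        if len(chunk) < 3:
            break  # truncated input: stop rather than raise
        flags, data_1, data_2 = chunk
        triplets.append(CcTriplet(bool(flags & 0x04), flags & 0x03, data_1, data_2))
    return process_cc_data_flag, cc_count, triplets


flag, count, triplets = parse_cc_data(SAMPLE_CC_DATA)
print(flag, count, len(triplets))
for t in triplets:
    print(CC_TYPE_NAMES[t.cc_type], t.cc_valid, hex(t.cc_data_1), hex(t.cc_data_2))
```

Grouping the triplets by cc_type would already be enough to see whether a stream carries 608 field 1, 608 field 2 or 708 data, which is the indexing I am asking for.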

Please do not interpret this as a request for the decoding of closed captioning to human readable text. I consider fq to be a header inspector and thus indexing and listing the available closed captioning tracks in the SEI would be of great benefit to me and other video technicians, but fq does not attempt to be a video decoder, so therefore it need not attempt to decode the actual encoded text in the payload.

While the above list may be long, even if fq were to implement only the lower-level headers without ccdata(), it would be very beneficial.

I am not a developer, and while I can just about understand what https://github.com/wader/fq/blob/master/format/mpeg/avc_annexb.go and https://github.com/wader/fq/blob/master/format/mpeg/avc_sei.go are doing, I am not capable of writing a format interpreter in go, that is beyond my skillset. But I hope that you will appreciate that I have tried to present data such as t35_country_codes.txt in a data structure which would be compatible with the current parsers.

I hope you find the above easy to read. I have put a lot of care and attention into making the GitHub markup clear, and I have referenced the relevant standards in the format request.

I do have a repository of some (copyrighted) media streams for the above from US broadcasters, which I could share privately in the interests of legitimate research. However, I have not attempted to attach these to your GitHub repo, so as to avoid you being flagged.

I am including my version of fq out of politeness since that is requested on all new tickets, although it is unlikely to be relevant for a format request.

# Version
fq -v
0.1.0 (darwin amd64)

# Installed by
brew install fq

Even if this format request is not of interest, thanks for developing such a great tool and taking the time to read this far!

wader commented 1 year ago

Hey, thanks for the very detailed and well-formatted issue!

I appreciate that fq started with a heavy focus on media files, and thus I hope this request would be interpreted as being on topic.

Thanks for all the praise! Yes, very on topic, it was and is designed and used to do things just like this :)

I like fq because it attempts to decode rather than interpret or convert data.

Yes, that is the main idea: try to only decorate and present without much interpretation. There are some formats that do require interpretation to be more useful, like demuxing or traversing sample tables. Not sure if it makes sense in this case, but for some formats fq has additional functions to convert the detailed decode structure into something closer to the intended form. Another way is to provide various useful jq snippets in the documentation.

I've created an initial PR at https://github.com/wader/fq/pull/534 so it's easier to discuss and collaborate. Lots in the PR is ugly and probably uses the wrong naming and terminology, so I will need your help to sort that out.

I think we can start off by focusing on the things you want for now and then see how it goes. It would be great to get a dump of the SEI data. If the data is not sensitive, I think something like this should write the payload to a file:

fq 'first(grep_by(.payload_type=="user_data_registered_itu_t_t35")).data | tobytes' file.mp4 > itu_t_t35_dump

Alternatively or additionally, do you know if some tool like ffmpeg can produce mp4 files with this kind of metadata? It's very valuable to have a somewhat diverse set of sample blobs of a format to write a good decoder.

What platform are you on, and are you able to build your own version of fq? You should be able to build the PR branch by doing something like:

GOPROXY=direct go install github.com/wader/fq@sei-itu-t35
# GOPROXY=direct is to skip golang central caching system
# copy binary to $PATH if needed
cp "$(go env GOPATH)/bin/fq" /usr/local/bin

With that and some luck you should be able to decode some of it.

This is how far I get with your dump from above, looks sane? I get that it fails because it's truncated:

➜  fq git:(sei-itu-t35) ✗ go run . -n '"b5003147413934035ffffc94aefc9420fc9140fc5468fce973" | fromhex | mpeg_itu_t35 | d({line_bytes:8})'
    │00 01 02 03 04 05 06 07│01234567│.{}: (mpeg_itu_t35)
    │                       │        │  error: mpeg_itu_t35: FieldFormat: TryFieldFormat: failed at position 8 (read size 0 seek pos 0): U1(one_bit): failed at position 17 (read size 0 seek pos 0): EOF
0x00│b5                     │.       │  country_code: "United States" (181)
0x00│   00 31               │ .1     │  provider_code: 49
0x00│         47 41 39 34   │   GA94 │  user_identifier: "GA94"
    │                       │        │  user_structure{}:
0x00│                     03│       .│    user_data_type_code: "CEA-708 captions" (3)
0x08│5f ff fc 94 ae fc 94 20│_...... │  gap0: raw bits
0x10│fc 91 40 fc 54 68 fc e9│..@.Th..│
0x18│73│                    │s│

➜  fq git:(sei-itu-t35) ✗ go run . -n '"b5003147413934035ffffc94aefc9420fc9140fc5468fce973" | fromhex | mpeg_itu_t35.gap0 | mpeg_cc_data | d({line_bytes:8})'
    │00 01 02 03 04 05 06 07│01234567│.{}: (mpeg_cc_data)
    │                       │        │  error: mpeg_cc_data: U1(one_bit): failed at position 17 (read size 0 seek pos 0): EOF
0x08│5f                     │_       │  reserved0: 0
0x08│5f                     │_       │  process_cc_data_flag: true
0x08│5f                     │_       │  zero_bit: 0
0x08│5f                     │_       │  cc_count: 31
0x08│   ff                  │ .      │  reserved1: 255
    │                       │        │  cc[0:6]:
    │                       │        │    [0]{}: cc
0x08│      fc               │  .     │      one_bit: 1
0x08│      fc               │  .     │      reserved0: 15
0x08│      fc               │  .     │      cc_valid: true
0x08│      fc               │  .     │      cc_type: 0
0x08│         94            │   .    │      cc_data_1: 148
0x08│            ae         │    .   │      cc_data_2: 174
    │                       │        │    [1]{}: cc
0x08│               fc      │     .  │      one_bit: 1
0x08│               fc      │     .  │      reserved0: 15
0x08│               fc      │     .  │      cc_valid: true
0x08│               fc      │     .  │      cc_type: 0
0x08│                  94   │      . │      cc_data_1: 148
0x08│                     20│        │      cc_data_2: 32
    │                       │        │    [2]{}: cc
0x10│fc                     │.       │      one_bit: 1
0x10│fc                     │.       │      reserved0: 15
0x10│fc                     │.       │      cc_valid: true
0x10│fc                     │.       │      cc_type: 0
0x10│   91                  │ .      │      cc_data_1: 145
0x10│      40               │  @     │      cc_data_2: 64
    │                       │        │    [3]{}: cc
0x10│         fc            │   .    │      one_bit: 1
0x10│         fc            │   .    │      reserved0: 15
0x10│         fc            │   .    │      cc_valid: true
0x10│         fc            │   .    │      cc_type: 0
0x10│            54         │    T   │      cc_data_1: 84
0x10│               68      │     h  │      cc_data_2: 104
    │                       │        │    [4]{}: cc
0x10│                  fc   │      . │      one_bit: 1
0x10│                  fc   │      . │      reserved0: 15
0x10│                  fc   │      . │      cc_valid: true
0x10│                  fc   │      . │      cc_type: 0
0x10│                     e9│       .│      cc_data_1: 233
0x18│73│                    │s│      │      cc_data_2: 115
    │                       │        │    [5]{}: cc
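The EOF is consistent with some simple size arithmetic: cc_count is 31, so the decoder expects 31 three-byte entries, far more than the pasted hex contains. Here is a small Python sketch of that calculation. The helper name `expected_payload_size` is made up for illustration, and the single trailing marker byte after the cc entries is an assumption based on a reading of the ATSC A/53 cc_data() syntax. With that assumption the result for cc_count 31 comes out at 104 bytes, the same number as the payload_size: 104 in the original fq dump at the top of the issue.

```python
def expected_payload_size(cc_count: int) -> int:
    """Bytes needed for a GA94 cc_data SEI payload with cc_count entries."""
    t35_header = 1 + 2          # country_code + provider_code
    user_identifier = 4         # "GA94"
    user_data_type_code = 1     # 0x03
    cc_data_header = 2          # flags/cc_count byte + reserved byte
    cc_entries = 3 * cc_count   # flags byte + cc_data_1 + cc_data_2 each
    trailing_marker = 1         # ASSUMPTION: one marker byte closes cc_data()
    return (t35_header + user_identifier + user_data_type_code
            + cc_data_header + cc_entries + trailing_marker)


# The hex string pasted in the issue, as used in the commands above.
pasted = bytes.fromhex("b5003147413934035ffffc94aefc9420fc9140fc5468fce973")
cc_count = pasted[8] & 0x1F  # low 5 bits of the byte after the type code

print(cc_count, expected_payload_size(cc_count), len(pasted))
```

So a full dump of the payload, rather than the first two lines of the hex view, should let the decoder run to the end.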
bbgdzxng1 commented 1 year ago

Closing, since it is covered under separate branch & PR.