mono / taglib-sharp

Library for reading and writing metadata in media files
GNU Lesser General Public License v2.1
1.29k stars 312 forks source link

Error parsing Apple Elementary Stream Descriptor (ESDS) atoms #146

Closed rboy1 closed 5 years ago

rboy1 commented 5 years ago

There appears to be bug in the code when it tries to parse a ESDS atom from a MPEG4 stream. I'm attaching the sample file which can be used to replicate the issue.

The ATOM structure of the file is:

Atom ftyp @ 0 of size: 28, ends @ 28
Atom free @ 28 of size: 8, ends @ 36
Atom mdat @ 36 of size: 5003556, ends @ 5003592
Atom moov @ 5003592 of size: 6175, ends @ 5009767
     Atom mvhd @ 5003600 of size: 108, ends @ 5003708
     Atom trak @ 5003708 of size: 3232, ends @ 5006940
         Atom tkhd @ 5003716 of size: 92, ends @ 5003808
         Atom edts @ 5003808 of size: 48, ends @ 5003856
             Atom elst @ 5003816 of size: 40, ends @ 5003856
         Atom mdia @ 5003856 of size: 3084, ends @ 5006940
             Atom mdhd @ 5003864 of size: 32, ends @ 5003896
             Atom hdlr @ 5003896 of size: 45, ends @ 5003941
             Atom minf @ 5003941 of size: 2999, ends @ 5006940
                 Atom vmhd @ 5003949 of size: 20, ends @ 5003969
                 Atom dinf @ 5003969 of size: 36, ends @ 5004005
                     Atom dref @ 5003977 of size: 28, ends @ 5004005
                         Atom url  @ 5003993 of size: 12, ends @ 5004005
                 Atom stbl @ 5004005 of size: 2935, ends @ 5006940
                     Atom stsd @ 5004013 of size: 263, ends @ 5004276
                         Atom mp4v @ 5004029 of size: 247, ends @ 5004276
                             Atom esds @ 5004115 of size: 135, ends @ 5004250
                             Atom fiel @ 5004250 of size: 10, ends @ 5004260             ~
                             Atom pasp @ 5004260 of size: 16, ends @ 5004276             ~
                     Atom stts @ 5004276 of size: 24, ends @ 5004300
                     Atom stss @ 5004300 of size: 48, ends @ 5004348
                     Atom ctts @ 5004348 of size: 1024, ends @ 5005372
                     Atom stsc @ 5005372 of size: 28, ends @ 5005400
                     Atom stsz @ 5005400 of size: 772, ends @ 5006172
                     Atom stco @ 5006172 of size: 768, ends @ 5006940
     Atom trak @ 5006940 of size: 2729, ends @ 5009669
         Atom tkhd @ 5006948 of size: 92, ends @ 5007040
         Atom edts @ 5007040 of size: 36, ends @ 5007076
             Atom elst @ 5007048 of size: 28, ends @ 5007076
         Atom mdia @ 5007076 of size: 2593, ends @ 5009669
             Atom mdhd @ 5007084 of size: 32, ends @ 5007116
             Atom hdlr @ 5007116 of size: 45, ends @ 5007161
             Atom minf @ 5007161 of size: 2508, ends @ 5009669
                 Atom smhd @ 5007169 of size: 16, ends @ 5007185
                 Atom dinf @ 5007185 of size: 36, ends @ 5007221
                     Atom dref @ 5007193 of size: 28, ends @ 5007221
                         Atom url  @ 5007209 of size: 12, ends @ 5007221
                 Atom stbl @ 5007221 of size: 2448, ends @ 5009669
                     Atom stsd @ 5007229 of size: 96, ends @ 5007325
                         Atom mp4a @ 5007245 of size: 80, ends @ 5007325
                             Atom esds @ 5007281 of size: 44, ends @ 5007325
                     Atom stts @ 5007325 of size: 24, ends @ 5007349
                     Atom stsc @ 5007349 of size: 1528, ends @ 5008877
                     Atom stsz @ 5008877 of size: 20, ends @ 5008897
                     Atom stco @ 5008897 of size: 772, ends @ 5009669
     Atom udta @ 5009669 of size: 98, ends @ 5009767
         Atom meta @ 5009677 of size: 90, ends @ 5009767
             Atom hdlr @ 5009689 of size: 33, ends @ 5009722
             Atom ilst @ 5009722 of size: 45, ends @ 5009767
                 Atom ©too @ 5009730 of size: 37, ends @ 5009767
                     Atom data @ 5009738 of size: 29, ends @ 5009767

 ~ denotes an unknown atom
------------------------------------------------------
Total size: 5009767 bytes; 54 atoms total.
Media data: 5003556 bytes; 6211 bytes all other atoms (0.124% atom overhead).
Total free atom space: 8 bytes; 0.000% waste.
------------------------------------------------------
AtomicParsley version: 0.9.6-hg109.9183fff907bf (utf16)
------------------------------------------------------

When TagLib tries to parse the second esds atom following the mp4a atom is throws a corrupted exception.

Specifically the issues lies in the AppleElementaryStreamDescriptor.cs method:

public AppleElementaryStreamDescriptor (BoxHeader header,
                                                TagLib.File file,
                                                IsoHandlerBox handler)
            : base (header, file, handler

At line if (ReadLength (box_data, ref offset) < 15) it expects 15 bytes but this file appears to have 13 bytes.

Now I can't find the original specifications for the ESDS atom so I'm unable to figure out what's the correct number of bytes expected.

@Starwer @decriptor do you have any idea where the specifications could be found?

If no one is able to find the original specs maybe we should consider skipping the ESDS atom for now since it basically kills parsing the rest of the file and metadata (which is intact). This would be in BoxFactory.cs to comment out this code:

else if (type == BoxType.Esds)
                return new AppleElementaryStreamDescriptor (
                    header, file, handler);

Thoughts? ESDS.zip

Starwer commented 5 years ago

For MPEG4, the documentation I found is in: https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/Metadata/Metadata.html

There is an ESDC part there: https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap3/qtff3.html#//apple_ref/doc/uid/TP40000939-CH205-124774

Indeed, it is better if the parsing is never throwing an exception. Instead, in case of unexpected format found during parsing, the File.MarkAsCorrupt(string reason) method should be used. This enables to continue reading the remaining metadata and return the data that were successfully retrieved, but still prevents from writing metadata into the file and then avoid further corruptions of the file.

rboy1 commented 5 years ago

Thanks, yes I did find that also. The section of the ESDS in the link above refers to the ESDS that follows a MP4V atom and the code seems to work fine for that atom.

Later in the documentation there is another section that refers to ESDS that follows a MP4A atom. That’s where the current code is having an issue with the parsing.

What I was unable to to find in the documentation were references to the protocol byte format and certain hard coded references in code. For example the first byte needs to be 3, the minimum length for the configuration descriptor needs to be 15 (this seems to work for the video track but for audio it returning 13) etc.

Where did you find/reference the expected data format structures ?

Starwer commented 5 years ago

That's all I know about the MP4... I can't find anything else relevant about the format. This code you pointing at has remained unchanged since 2007. Maybe @bnickel can give us a tip on how he managed to write this ESDS code ? Maybe he has still some lost documentation about the MPEG4 format ? ;) Otherwise I think reverse-engineering is the only way out...

rboy1 commented 5 years ago

Okay some good news here, I was able to find the specifications for the ES_Descriptor at http://ecee.colorado.edu/~ecen5653/ecen5653/papers/ISO%2014496-1%202004.PDF Section 7.2.6.5

I appears that the code is expecting DecoderSpecificInfo after the DecoderConfigDescriptor which is optional as per the specs and also in this sample it contains a SLConfigDescriptor which is not being handled, hence it's marking it as corrupted. So some refactoring required which I've done. I've added support for a few more descriptors.

The code also makes some assumptions about lengths for the DecoderConfigDescriptor as a minimum of 15 but that appears to be incorrect, it should be 13 bytes if there are no DecoderSpecificInfo in the descriptor. Maybe someone can look at the specifications and confirm my understanding. (note that the length indicated in the data structure applies to the remainder of the data starting after the length attribute and not including the length attribute itself).

bnickel commented 5 years ago

This assessment looks spot on. The spec shows 13 required bytes then an optional 14th decSpecificInfo byte. If that's present, then there (maybe?) has to be at least one more byte for length.

screen shot 2019-02-17 at 6 36 15 pm

It makes sense to drop that value down to 13, then wrap the DecoderConfig bit of things in a if (descriptorLength >= 15) { ... }

It probably also makes sense to drop the following down to 18. https://github.com/mono/taglib-sharp/blob/305aa88c66f8033e3bfd7e1c39c6552c0726c494/src/TaglibSharp/Mpeg4/Boxes/AppleElementaryStreamDescriptor.cs#L93

This atom had:

03          # esdstag
80 80 80 1b # length = 27
00 02       # Stream Id = 2
00          # Priority = 0
04          # tag = 4
80 80 80 0d # length = 13 **BANG**
6b          # object ID
15          # Stream Type
00 00 00    # Buffer size
00 03 e8 00 # Max Bitrate
00 03 e8 00 # Average Bitrate
06          # tag = 6
80 80 80 01 # length = 1
02          # 2 per https://www.sis.se/api/document/preview/80008051/

But it could easily have been:

03          # esdstag
12          # length = 18 **BANG**
00 02       # Stream Id = 2
00          # Priority = 0
04          # tag = 4
0d          # length = 13
6b          # object ID
15          # Stream Type
00 00 00    # Buffer size
00 03 e8 00 # Max Bitrate
00 03 e8 00 # Average Bitrate

Honestly, I don't know why I even surface DecoderConfig. That probably points to another codebase I based this off of.

rboy1 commented 5 years ago

Neat info graphic, where did you get that from?

bnickel commented 5 years ago

It's on page 115 of http://ecee.colorado.edu/~ecen5653/ecen5653/papers/ISO%2014496-1%202004.PDF

It's the same information as 7.2.6.6.1 but for whatever reason I ran across it first.


class DecoderConfigDescriptor extends BaseDescriptor : bit(8) tag=DecoderConfigDescrTag {
   bit(8) objectTypeIndication;
   bit(6) streamType;
   bit(1) upStream;
   const bit(1) reserved=1;
   bit(24) bufferSizeDB;
   bit(32) maxBitrate;
   bit(32) avgBitrate;
   DecoderSpecificInfo decSpecificInfo[0 .. 1];
   profileLevelIndicationIndexDescriptor profileLevelIndicationIndexDescr [0..255];
}
bnickel commented 5 years ago

fwiw, here's FFMPEG's code that looks at https://github.com/FFmpeg/FFmpeg/blob/a1f0dd24f62532ff82ac87fbcb01244e6cdfa424/libavformat/isom.c#L535 what we're storing in DecoderConfig. Looks like they can extract a few details from it if the audio is AAC.

rboy1 commented 5 years ago

Thanks. I’ve refactored the code and will post it after I’ve finished testing it

rboy1 commented 5 years ago

@bnickel

If my interpretation is correct the minimum size of the ES_Descriptor should be 21 bytes [Base (3 bytes) + DecoderConfigDescriptor (15 bytes) + SLConfigDescriptor (3 bytes) + OtherDescriptors]

Here is the refactored file to parse the ES_Descriptor. This should be able to parse any number of embedded/optional descriptors (it may not process the information but it'll ignore them and not crash)

147

decriptor commented 5 years ago

Are we good to close this now that #147 is merged?