Closed: rboy1 closed this issue 5 years ago
For MPEG4, the documentation I found is in: https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/Metadata/Metadata.html
There is an ESDC part there: https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap3/qtff3.html#//apple_ref/doc/uid/TP40000939-CH205-124774
Indeed, it is better if parsing never throws an exception. Instead, when an unexpected format is found during parsing, the File.MarkAsCorrupt(string reason) method should be used. This lets the reader continue through the remaining metadata and return whatever was successfully retrieved, while still preventing metadata from being written to the file, which avoids corrupting it further.
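As a rough illustration of that pattern (record the problem, keep reading, refuse to write), here is a minimal Python sketch. The class and function names are hypothetical stand-ins for illustration only; this is not the TagLib# API, and a malformed box is simulated with a `None` payload.

```python
class ParsedFile:
    """Hypothetical stand-in for a tag-reading file object (not the TagLib# API)."""

    def __init__(self):
        self.corruption_reasons = []
        self.metadata = {}

    def mark_as_corrupt(self, reason):
        # Record the problem instead of raising, so parsing can go on.
        self.corruption_reasons.append(reason)

    @property
    def possibly_corrupt(self):
        return bool(self.corruption_reasons)

    @property
    def writeable(self):
        # Refuse to write once anything unexpected was seen.
        return not self.possibly_corrupt


def parse_boxes(file, boxes):
    """boxes: list of (name, payload) pairs; a None payload simulates a malformed box."""
    for name, payload in boxes:
        if payload is None:
            file.mark_as_corrupt(f"unexpected format in '{name}' box")
            continue  # keep reading the remaining boxes
        file.metadata[name] = payload
    return file
```

With this shape, a bad `esds` box would cost only its own data: the other boxes are still returned, and the file is flagged so that nothing gets written back to it.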
Thanks, yes, I did find that also. The ESDS section in the link above refers to the ESDS that follows an MP4V atom, and the code seems to work fine for that atom.
Later in the documentation there is another section that refers to the ESDS that follows an MP4A atom. That's where the current code has an issue with the parsing.
What I was unable to find in the documentation were references to the protocol byte format and to certain hard-coded values in the code. For example, the first byte needs to be 3, and the minimum length for the configuration descriptor needs to be 15 (this seems to work for the video track, but for audio it is returning 13), etc.
Where did you find/reference the expected data format structures?
That's all I know about MP4... I can't find anything else relevant about the format. The code you are pointing at has remained unchanged since 2007. Maybe @bnickel can give us a tip on how he managed to write this ESDS code? Maybe he still has some lost documentation about the MPEG4 format? ;) Otherwise I think reverse-engineering is the only way out...
Okay, some good news here: I was able to find the specification for the ES_Descriptor at http://ecee.colorado.edu/~ecen5653/ecen5653/papers/ISO%2014496-1%202004.PDF, Section 7.2.6.5.
It appears that the code expects a DecoderSpecificInfo after the DecoderConfigDescriptor, which is optional as per the spec. This sample also contains an SLConfigDescriptor, which is not handled, hence the file is marked as corrupted. So some refactoring was required, which I've done. I've also added support for a few more descriptors.
The code also assumes a minimum length of 15 for the DecoderConfigDescriptor, but that appears to be incorrect: it should be 13 bytes if there is no DecoderSpecificInfo in the descriptor. Maybe someone can look at the specification and confirm my understanding. (Note that the length indicated in the data structure applies to the remainder of the data starting after the length attribute, not including the length attribute itself.)
This assessment looks spot on. The spec shows 13 required bytes then an optional 14th decSpecificInfo byte. If that's present, then there (maybe?) has to be at least one more byte for length.
It makes sense to drop that value down to 13, then wrap the DecoderConfig bit of things in an if (descriptorLength >= 15) { ... }
It probably also makes sense to drop the following down to 18. https://github.com/mono/taglib-sharp/blob/305aa88c66f8033e3bfd7e1c39c6552c0726c494/src/TaglibSharp/Mpeg4/Boxes/AppleElementaryStreamDescriptor.cs#L93
This atom had:
03 # esdstag
80 80 80 1b # length = 27
00 02 # Stream Id = 2
00 # Priority = 0
04 # tag = 4
80 80 80 0d # length = 13 **BANG**
6b # object ID
15 # Stream Type
00 00 00 # Buffer size
00 03 e8 00 # Max Bitrate
00 03 e8 00 # Average Bitrate
06 # tag = 6
80 80 80 01 # length = 1
02 # 2 per https://www.sis.se/api/document/preview/80008051/
But it could easily have been:
03 # esdstag
12 # length = 18 **BANG**
00 02 # Stream Id = 2
00 # Priority = 0
04 # tag = 4
0d # length = 13
6b # object ID
15 # Stream Type
00 00 00 # Buffer size
00 03 e8 00 # Max Bitrate
00 03 e8 00 # Average Bitrate
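To make the two dumps above concrete, here is a small Python sketch (not the TagLib# C# code) that decodes the expandable length field and walks the nested descriptors. It assumes the length encoding from ISO/IEC 14496-1 (7 bits per byte, high bit set when another length byte follows, at most 4 bytes) and that the ES_Descriptor flags are all zero, as in these dumps, so the nested descriptors start 3 bytes into the ES_Descriptor payload. The function names are illustrative.

```python
def read_length(data, offset):
    """Decode an expandable descriptor length.

    Each byte carries 7 bits of the length; a set high bit means another
    length byte follows (at most 4 bytes). Returns (length, new_offset).
    """
    length = 0
    for _ in range(4):
        byte = data[offset]
        offset += 1
        length = (length << 7) | (byte & 0x7F)
        if not byte & 0x80:
            break
    return length, offset


def walk_descriptors(data, offset=0, end=None):
    """Yield (tag, payload) for every descriptor in data[offset:end].

    Unknown tags are simply yielded like any other, so the caller can
    skip them instead of treating the atom as corrupt.
    """
    end = len(data) if end is None else end
    while offset < end:
        tag = data[offset]
        length, offset = read_length(data, offset + 1)
        yield tag, data[offset:offset + length]
        offset += length


# The two dumps from this thread: 4-byte lengths, then 1-byte lengths.
long_form = bytes.fromhex(
    "03 8080801b 0002 00 04 8080800d 6b 15 000000 0003e800 0003e800 06 80808001 02"
)
short_form = bytes.fromhex("03 12 0002 00 04 0d 6b 15 000000 0003e800 0003e800")

for dump in (long_form, short_form):
    (es_tag, es_payload), = walk_descriptors(dump)
    # Skip ES_ID (2 bytes) and the flags/priority byte, then walk the children.
    children = [(tag, len(body)) for tag, body in walk_descriptors(es_payload, 3)]
    print(hex(es_tag), len(es_payload), children)
```

Both forms carry a DecoderConfigDescriptor (tag 4) whose payload is 13 bytes, which is why a hard-coded minimum of 15 rejects them.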
Honestly, I don't know why I even surface DecoderConfig. That probably points to another codebase I based this off of.
Neat info graphic, where did you get that from?
It's on page 115 of http://ecee.colorado.edu/~ecen5653/ecen5653/papers/ISO%2014496-1%202004.PDF
It's the same information as 7.2.6.6.1 but for whatever reason I ran across it first.
class DecoderConfigDescriptor extends BaseDescriptor : bit(8) tag=DecoderConfigDescrTag {
    bit(8) objectTypeIndication;
    bit(6) streamType;
    bit(1) upStream;
    const bit(1) reserved=1;
    bit(24) bufferSizeDB;
    bit(32) maxBitrate;
    bit(32) avgBitrate;
    DecoderSpecificInfo decSpecificInfo[0 .. 1];
    profileLevelIndicationIndexDescriptor profileLevelIndicationIndexDescr [0..255];
}
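The fixed fields in this definition add up to 8 + 6 + 1 + 1 + 24 + 32 + 32 = 104 bits, i.e. the 13 bytes discussed above. A minimal Python sketch of unpacking them follows; the function and tuple names are made up for illustration and do not correspond to anything in TagLib#. It returns `None` for a short payload rather than raising, and it ignores any optional descriptors that follow the fixed part.

```python
import struct
from collections import namedtuple

DecoderConfig = namedtuple(
    "DecoderConfig",
    "object_type stream_type up_stream buffer_size_db max_bitrate avg_bitrate",
)


def parse_decoder_config(payload):
    """Unpack the 13 fixed bytes of a DecoderConfigDescriptor payload."""
    if len(payload) < 13:
        return None  # too short to hold the mandatory fields
    object_type = payload[0]
    packed = payload[1]  # streamType (6 bits), upStream (1 bit), reserved (1 bit)
    buffer_size_db = int.from_bytes(payload[2:5], "big")  # 24-bit field
    max_bitrate, avg_bitrate = struct.unpack(">II", payload[5:13])
    return DecoderConfig(
        object_type,
        packed >> 2,
        bool(packed & 0x02),
        buffer_size_db,
        max_bitrate,
        avg_bitrate,
    )


# The 13-byte payload from the dump in this thread.
sample = bytes.fromhex("6b 15 000000 0003e800 0003e800")
print(parse_decoder_config(sample))
```

For the sample bytes, the stream type byte 0x15 splits into streamType 5, upStream 0 and the reserved bit 1, and both bitrate fields decode to 256000.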
fwiw, here's FFmpeg's code that looks at what we're storing in DecoderConfig: https://github.com/FFmpeg/FFmpeg/blob/a1f0dd24f62532ff82ac87fbcb01244e6cdfa424/libavformat/isom.c#L535. Looks like they can extract a few details from it if the audio is AAC.
Thanks. I’ve refactored the code and will post it after I’ve finished testing it
@bnickel
If my interpretation is correct, the minimum size of the ES_Descriptor should be 21 bytes:
[Base (3 bytes) + DecoderConfigDescriptor (15 bytes) + SLConfigDescriptor (3 bytes) + OtherDescriptors]
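The arithmetic behind that figure can be checked with a short Python sketch. It assumes a 13-byte DecoderConfigDescriptor payload and a 1-byte SLConfigDescriptor payload, as in the sample atom, and counts each nested descriptor as tag + length field + payload; the helper names are illustrative.

```python
def descriptor_size(payload_len, length_bytes=1):
    """Total bytes of one descriptor: 1 tag byte + length field + payload."""
    return 1 + length_bytes + payload_len


def es_descriptor_payload_size(length_bytes=1):
    """Size of the ES_Descriptor payload holding DecoderConfig and SLConfig."""
    base = 3  # ES_ID (2 bytes) + flags/priority (1 byte)
    decoder_config = descriptor_size(13, length_bytes)
    sl_config = descriptor_size(1, length_bytes)
    return base + decoder_config + sl_config


print(es_descriptor_payload_size(1))  # 1-byte length fields
print(es_descriptor_payload_size(4))  # 4-byte length fields, as in the sample atom
```

With 1-byte length fields this gives 3 + 15 + 3 = 21, and with the 4-byte `80 80 80 xx` form it gives 3 + 18 + 6 = 27, which matches the outer length in the sample atom.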
Here is the refactored file to parse the ES_Descriptor. This should be able to parse any number of embedded/optional descriptors (it may not process their information, but it will skip them rather than crash).
Are we good to close this now that #147 is merged?
There appears to be a bug in the code when it tries to parse an ESDS atom from an MPEG4 stream. I'm attaching a sample file that can be used to reproduce the issue.
The ATOM structure of the file is:
When TagLib tries to parse the second esds atom, the one following the mp4a atom, it throws a corruption exception. Specifically, the issue lies in a method in AppleElementaryStreamDescriptor.cs. At the line
if (ReadLength (box_data, ref offset) < 15)
it expects 15 bytes, but this file appears to have 13 bytes. Now, I can't find the original specification for the ESDS atom, so I'm unable to figure out the correct number of bytes expected.
@Starwer @decriptor do you have any idea where the specifications could be found?
If no one is able to find the original specs, maybe we should consider skipping the ESDS atom for now, since the failure basically kills parsing of the rest of the file and its metadata (which is intact). This would mean commenting out the relevant code in BoxFactory.cs. Thoughts?
Attachment: ESDS.zip