olegloa / mp4v2

Automatically exported from code.google.com/p/mp4v2
Other
0 stars 0 forks source link

mp4v2 breaks "French" designation on mp4 streams #57

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create an mp4 file with stream specified to be in French
2. Use any mp4v2 tool (such mp4file --optimize or mp4tags) on the mp4 file

What is the expected output? What do you see instead?
The language specification is supposed to remain unchanged.  However, it 
changes to "undetermined."

This happens reproducibly to all French streams.  Streams in other 
languages like English or Spanish are unaffected.  

Concrete example:
% mp4box -info a.mp4 >before.txt
% mp4file --optimize a.mp4
% mp4box -info a.mp4 >after.txt
% diff before.txt after.txt
<Track # 4 Info - TrackID 4 - TimeScale 90000 - Duration 00:51:36.086
<Media Info: Language "French" - Type "subp:mp4s" - 759 samples
<MPEG-4 Config: NeroDigital Subpicture Stream - ObjectTypeIndication 0xe0
<Synchronized on stream 1

---
>Track # 4 Info - TrackID 4 - TimeScale 90000 - Duration 00:51:36.086
>Media Info: Language "Undetermined" - Type "subp:mp4s" - 759 samples
>MPEG-4 Config: NeroDigital Subpicture Stream - ObjectTypeIndication 0xe0
>Synchronized on stream 1

What version of the product are you using? On what operating system?
Latest version pulled from svn, compiled and run on Windows 7 x64.

Original issue reported on code.google.com by CarlEd...@gmail.com on 6 Mar 2010 at 3:45

GoogleCodeExporter commented 9 years ago
could you provide a file with a stream specified as being french?  I'm not sure 
how
to create one, but I wouldn't mind seeing if I could dig up the bug.

Original comment by kid...@gmail.com on 3 Apr 2010 at 12:33

GoogleCodeExporter commented 9 years ago
Sorry for the delay, but it took me a while until I ran across this bug again 
and I 
had a chance to generate a small example file.  Have a look at the attached 
file:

It was freshly generated with mp4box and has the following info:
% mp4box -info a.mp4 >before.txt
% mp4file --optimize a.mp4 [Just an example--all the mp4v2 tools which rewrite 
the 
file seem to have the same effect]
% mp4box -info a.mp4 >after.txt

Note the difference in track 2 & 3 which were German (yes, German is affected 
just 
like French, but English, Japanese and Chinese aren't) audio tracks before and 
have 
undetermined language after and track 8 which was a French subtitle track 
before and 
also has undetermined language afterwards.

Original comment by CarlEd...@gmail.com on 31 May 2010 at 9:15

Attachments:

GoogleCodeExporter commented 9 years ago
Sorry, the system is giving me problem with the relatively small uploads (even 
the mp4 
is only 2 MBytes), so I'll do them one by one.

Original comment by CarlEd...@gmail.com on 31 May 2010 at 9:16

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by CarlEd...@gmail.com on 31 May 2010 at 9:17

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks Carl--I'll look into it today.

Original comment by kid...@gmail.com on 1 Jun 2010 at 4:49

GoogleCodeExporter commented 9 years ago
Thank *you*, kidjan!  Any help is much appreciated.

By the way and for what it is worth, I just downloaded and compiled (under 
Windows 7 
x64) the latest version (v386) and the problem is still fully reproducible.  

Also, Spanish is one of the other languages *not* affected.

As the problem seems to be specific to the language used, is it possible that 
it is 
caused by a different set of language codes used by mp4box and mp4v2?  Mp4box 
will 
dump its set of language codes when invoked with option "-languages"

Original comment by CarlEd...@gmail.com on 3 Jun 2010 at 12:05

GoogleCodeExporter commented 9 years ago
Yes, I think it is possible that it's some difference between mp4box and mp4v2,
although I'm still trying to isolate the bug.  I can definitely reproduce it 
(I'm
running Win7) quite easily.

Original comment by kid...@gmail.com on 3 Jun 2010 at 4:26

GoogleCodeExporter commented 9 years ago
I went through this a bit more carefully.  In this function, mp4v2 parsed out 
the
string "ger" as the language code:

void
MP4LanguageCodeProperty::Read( MP4File* file, uint32_t index )
{
    uint16_t data = file->ReadBits( 16 );

    char code[3];
    code[0] = ((data & 0x7c00) >> 10) + 0x60;
    code[1] = ((data & 0x03e0) >>  5) + 0x60;
    code[2] = ((data & 0x001f)      ) + 0x60;

    SetValue( bmff::enumLanguageCode.toType( string( code, sizeof(code) )));
}

However, mp4v2 doesn't seem to contain all of the language codes per ISO 639-2 
(see
http://www.loc.gov/standards/iso639-2/php/code_list.php); German can be either 
"deu"
or "ger", but mp4v2 only contains "deu", which causes it to list the language 
as some
undefined type.  So I think maybe the problem here is the
array of language code/values in typebmff.cpp that lacks some of the duplicate 
values.

Let me do a bit more research, and then I'll probably have a fix available.

Original comment by kid...@gmail.com on 3 Jun 2010 at 5:05

GoogleCodeExporter commented 9 years ago
Hmm, I take that back.  After a bit more investigation, I found this changeset:

http://code.google.com/p/mp4v2/source/detail?r=247&path=/trunk/src/bmff/type.cpp

...per the MP4 specification (see
http://developer.apple.com/mac/library/documentation/QuickTime/QTFF/qtff.pdf), 
MP4
files should be using ISO 639-2/T, not ISO 639-2/B, so the language code/values 
in
MP4v2 are in fact correct.  

It seems that mp4box must be using ISO 639-2/B?  I believe you can dump the 
language
codes with "-language" to see, but if German corresponds with "ger", then 
mp4box is
using the wrong codes (i.e. ISO 639-2/B).

Original comment by kid...@gmail.com on 3 Jun 2010 at 5:24

GoogleCodeExporter commented 9 years ago
(or perhaps it isn't mp4box, but whatever tool was used to create the mp4 file)

Original comment by kid...@gmail.com on 3 Jun 2010 at 5:30

GoogleCodeExporter commented 9 years ago
Fair enough.  But "interpret loosely, generate strictly" is still a good 
software 
design adage.  If the specification says ISO 639-2/T, by all means that is what 
mp4v2 
should generate.

But where would be the harm in also accepting ISO 639-2/B and either preserving 
it or 
replace it by the proper codes (rather than replacing it by undetermined as 
currently, which in turn is interpreted as "English" by VLC)?  Given the 
ubiquity of 
mp4box as tool (either directly or under some GUI) for creating mp4 files, a 
great 
deal of good would be in that.

Original comment by CarlEd...@gmail.com on 3 Jun 2010 at 5:33

GoogleCodeExporter commented 9 years ago
The example I posted earlier was indeed created with mp4box.

Please find attached a list of the mp4box languages and abbreviations.

Original comment by CarlEd...@gmail.com on 4 Jun 2010 at 12:31

Attachments:

GoogleCodeExporter commented 9 years ago
When you use mp4box, is there some way you could use this flag to set the 
languages
appropriate?

-lang [tkID=]LAN     sets track language. LAN is the ISO 639-2 code (eng, und)

...for german, you'd use "deu".

Could you try this out?

Original comment by kid...@gmail.com on 4 Jun 2010 at 12:53

GoogleCodeExporter commented 9 years ago
Thanks, kidjan, that works!  Mp4box clearly accepts both kinds of codes.  

Still, as there are some software packages (including mp4box itself, when 
generating 
the language description for subtitles from idx files) which use the wrong 
codes, 
perhaps a patch that would allow mp4v2 to recognize the /B codes (perhaps with 
a 
warning), while writing the correct /T codes, would be useful?

Original comment by CarlEd...@gmail.com on 4 Jun 2010 at 2:32

GoogleCodeExporter commented 9 years ago
I'll think about it.  I did discuss it with another developer (kona), and here 
are
his comments:

"Since 639-2/T codes are required, I don't see the benefit of supporting mp4box 
bugs.
What happens is someone always asks for the next step for mp4box
bug-compatibility...Let's assume conversion from 639-2/B was supported, the next
thing someone is going to ask is for it to be preserved. Then flags need to be 
added
to the CLI tools. And so on."

For example, if I add code to recognize the /B codes and correct them to /T 
codes,
then at some point it wouldn't be totally surprising for someone to ask for a 
way to
not have the library "correct" it to deal with some other non-standard 
extension, in
which case I'm in some hot water since I probably can't just back out the 
previous
change that people are now using in production code.

If I did anything, I'd probably add a warning for any recognized /B code to 
alert
people that they're using non-standard language extensions, but take zero actual
action.  The other thing I'd do is follow up with mp4box and ask them why they 
aren't
standards compliant (chances are, it's for the aforementioned reasons: someone
decided to accommodate, and now they're in a bit of an awkward spot).

Anyway, glad you have a fix.  I'm closing this defect.

Original comment by kid...@gmail.com on 4 Jun 2010 at 3:47

GoogleCodeExporter commented 9 years ago
I can understand the concern about not propagating bugs into more software than 
absolutely necessary.  But I don't think that would be an issue here.

Mp4box is just fine interpreting the *correct* codes; it just sometimes 
generates the 
incorrect codes.  There would be no advantage or call for mp4v2 ever generating 
the 
incorrect codes.

The only question is what mp4v2 should do when it encounters a common type of 
slightly-out-spec file, that is a file with an 639-2/B code generated by 
mp4box?  Is 
it better to change that incorrect code to "undetermined" as the current mp4v2 
does?  
Or is it better to change it to the correct code when that can be done easily 
and 
accurately?

This seems to me an easy call.  "Interpret loosely, generate strictly."

Original comment by CarlEd...@gmail.com on 4 Jun 2010 at 11:02

GoogleCodeExporter commented 9 years ago
One hopefully last comment/correction:

Earlier I said that mp4box will happily accepts the correct /T codes with the 
"-lang" 
option.  That is true as far as it goes.  *However*, mp4box will rewrite the 
correct 
/T codes given on the command line into the incorrect /B codes in the mp4 file! 
 I 
only just learned that while trying to manually fix a file with broken 
designations--
running mp4file --optimize on the result changes them all back to 
"Undetermined".

The bottom line is that I have (and know of) no way to create an mp4file with 
correct 
language designations.  Mp4box won't do it, even if spoon-fed the correct ones. 

Mp4v2 does not have any utility to set or edit these options.

I realize that these problems are not the fault of mp4v2's authors and 
maintainers.  
I've sent a number of extremely specific bug reports to whatever addresses 
appeared 
to be associated with the keepers of the mp4box code, but never gotten so much 
as a 
single reply.  So my hopes of them fixing the problem seem low.

That may make my earlier point that mp4v2 should accept the incorrect /B codes 
just 
write the correct /T codes all the more urgent.  

Original comment by CarlEd...@gmail.com on 5 Jun 2010 at 12:28

GoogleCodeExporter commented 9 years ago
Carl,

I'll think about it.  I would really like to get in touch with mp4box's 
maintainers to see if they can resolve it; I'm not inclined to change the 
behavior of mp4v2 solely on the basis of some other programs errant, invalid 
behavior (let's be frank: this is basically working around mp4box's incorrect 
language implementation) without knowing that other program is at least moving 
in the right direction.

I'll see if I can get a hold of them.

Thanks,

kidjan

Original comment by kid...@gmail.com on 10 Jun 2010 at 6:15

GoogleCodeExporter commented 9 years ago
CarlEdman wrote:
| The bottom line is that I have (and know of) no way to create an mp4file with 
correct 
| language designations.  Mp4box won't do it, even if spoon-fed the correct 
ones.  
| Mp4v2 does not have any utility to set or edit these options.

Carl, have you tried using mp4track command-line tool from trunk? It lets one 
set some generic per-track attributes. Here's an example.

***1. list all tracks. take note do you want to use ID or INDEX as your 
mechanism to specify an individual track. we'll use INDEX to keep things simple 
(the number between brackets []). in my sample.m4v here is track[1] listing:

mp4track --list sample.m4v

[...other tracks snipped...]
track[1] id=2
  type           = audio
  enabled        = true
  inMovie        = true
  inPreview      = false
  layer          = 0
  alternateGroup = 0
  volume         = 1.0000
  width          = 0.00000000
  height         = 0.00000000
  language       = English
  handlerName    = 
  userDataName   = Stereo

***2. let's change it to french and list; make sure to always list afterwards. 
as a bogus language code will simply result in a "0" (undefined) value being 
used without error. Wikipedia lists valid 639-2 codes (both /B and /T) here: 
http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes

mp4track --track-index=1 --language 'fra' sample.m4v
mp4track --list sample.m4v

track[1] id=2
  type           = audio
  enabled        = true
  inMovie        = true
  inPreview      = false
  layer          = 0
  alternateGroup = 0
  volume         = 1.0000
  width          = 0.00000000
  height         = 0.00000000
  language       = French
  handlerName    = 
  userDataName   = Stereo

Original comment by Kona8l...@gmail.com on 10 Jun 2010 at 10:42

GoogleCodeExporter commented 9 years ago
Above example might be clearer if the same cli named option syntax was used. 
Following are equivalent, 3rd one is clearest imo.

mp4track --track-index=1 --language 'fra' sample.m4v
mp4track --track-index=1 --language='fra' sample.m4v
mp4track --track-index=1 --language=fra sample.m4v

Original comment by Kona8l...@gmail.com on 10 Jun 2010 at 10:44

GoogleCodeExporter commented 9 years ago
Thanks again, Kona8lend!  I didn't realize that the mp4track tool could set 
languages.

By the way, it works just fine and both the mp4v2 tools and mp4box can *read* 
the correct language codes.

Original comment by CarlEd...@gmail.com on 11 Jun 2010 at 12:44

GoogleCodeExporter commented 9 years ago
For what it is worth, I've sent a patch to anybody I could find with 
responsibility for GPAC/mp4box which fixes the problem, but no response so far:

The current version of GPAC I just pulled from 
https://github.com/golgol7777/gpac contains a subtle, but to the affected users 
highly annoying bug.

To be specific: VobSub languages are coded using the ISO-639-2(B) codes, rather 
than the ISO-639-2(T) codes as the standard specifies.  (See 
http://www.loc.gov/standards/iso639-2/php/code_list.php for the distinction).  
For most languages the two codes are fortunately the same, but for several 
widely used languages (including French, German, and Chinese) they are not.  
This causes VobSub tracks coded with the incorrect (B) codes to be mis- or 
un-identified in many standards-compliant MP4 tools (including VLC and mp4v2).

The problem can readily be fixed by replacing the 'lang_type lang_table' table 
in VobSub.c with the corrected version attached below.  All the changed lines 
are marked with comments.

static lang_type lang_table[] =
{
    {"--", "und" },
    {"aa", "aar" },
    {"ab", "abk" },
    {"af", "afr" },
    {"am", "amh" },
    {"ar", "ara" },
    {"as", "ast" },
    {"ay", "aym" },
    {"az", "aze" },
    {"ba", "bak" },
    {"be", "bel" },
    {"bg", "bul" },
    {"bh", "bih" },
    {"bi", "bis" },
    {"bn", "ben" },
    {"bo", "bod" }, // was "tib" (Tibetan)
    {"br", "bre" },
    {"ca", "cat" },
    {"cc", "und" },
    {"co", "cos" },
    {"cs", "ces" }, // was "cze" (Czech)
    {"cy", "cym" }, // was "wel" (Welsh)
    {"da", "dan" },
    {"de", "deu" }, // was "ger" (German)
    {"dz", "dzo" },
    {"el", "ell" }, // was "gre" (Greek, Modern (1453-))
    {"en", "eng" },
    {"eo", "epo" },
    {"es", "spa" },
    {"et", "est" },
    {"eu", "eus" }, // was "baq" (Basque)
    {"fa", "fas" }, // was "per" (Persian)
    {"fi", "fin" },
    {"fj", "fij" },
    {"fo", "fao" },
    {"fr", "fra" }, // was "fre" (French)
    {"fy", "fry" },
    {"ga", "gle" },
    {"gl", "glg" },
    {"gn", "grn" },
    {"gu", "guj" },
    {"ha", "hau" },
    {"he", "heb" },
    {"hi", "hin" },
    {"hr", "scr" },
    {"hu", "hun" },
    {"hy", "hye" }, // was "arm" (Armenian)
    {"ia", "ina" },
    {"id", "ind" },
    {"ik", "ipk" },
    {"is", "isl" }, // was "ice" (Icelandic)
    {"it", "ita" },
    {"iu", "iku" },
    {"ja", "jpn" },
    {"jv", "jav" },
    {"ka", "kat" }, // was "geo" (Georgian)
    {"kk", "kaz" },
    {"kl", "kal" },
    {"km", "khm" },
    {"kn", "kan" },
    {"ko", "kor" },
    {"ks", "kas" },
    {"ku", "kur" },
    {"ky", "kir" },
    {"la", "lat" },
    {"ln", "lin" },
    {"lo", "lao" },
    {"lt", "lit" },
    {"lv", "lav" },
    {"mg", "mlg" },
    {"mi", "mri" }, // was "mao" (Maori)
    {"mk", "mkd" }, // was "mac" (Macedonian)
    {"ml", "mlt" },
    {"mn", "mon" },
    {"mo", "mol" },
    {"mr", "mar" },
    {"ms", "msa" }, // was "may" (Malay)
    {"my", "mya" }, // was "bur" (Burmese)
    {"na", "nau" },
    {"ne", "nep" },
    {"nl", "nld" }, // was "dut" (Dutch; Flemish)
    {"no", "nor" },
    {"oc", "oci" },
    {"om", "orm" },
    {"or", "ori" },
    {"pa", "pan" },
    {"pl", "pol" },
    {"ps", "pus" },
    {"pt", "por" },
    {"qu", "que" },
    {"rm", "roh" },
    {"rn", "run" },
    {"ro", "ron" }, // was "rum" (Romanian; Moldavian; Moldovan)
    {"ru", "rus" },
    {"rw", "kin" },
    {"sa", "san" },
    {"sd", "snd" },
    {"sg", "sag" },
    {"sh", "scr" },
    {"si", "sin" },
    {"sk", "slk" }, // was "slo" (Slovak)
    {"sl", "slv" },
    {"sm", "smo" },
    {"sn", "sna" },
    {"so", "som" },
    {"sq", "sqi" }, // was "alb" (Albanian)
    {"sr", "srp" },
    {"ss", "ssw" },
    {"st", "sot" },
    {"su", "sun" },
    {"sv", "swe" },
    {"sw", "swa" },
    {"ta", "tam" },
    {"te", "tel" },
    {"tg", "tgk" },
    {"th", "tha" },
    {"ti", "tir" },
    {"tk", "tuk" },
    {"tl", "tgl" },
    {"tn", "tsn" },
    {"to", "tog" },
    {"tr", "tur" },
    {"ts", "tso" },
    {"tt", "tat" },
    {"tw", "twi" },
    {"ug", "uig" },
    {"uk", "ukr" },
    {"ur", "urd" },
    {"uz", "uzb" },
    {"vi", "vie" },
    {"vo", "vol" },
    {"wo", "wol" },
    {"xh", "xho" },
    {"yi", "yid" },
    {"yo", "yor" },
    {"za", "zha" },
    {"zh", "zho" }, // was "chi" (Chinese)
    {"zu", "zul" }
};

Original comment by CarlEd...@gmail.com on 13 Apr 2011 at 3:36

GoogleCodeExporter commented 9 years ago
The latest dev version of mp4box GPAC, 
http://gpac.wp.institut-telecom.fr/downloads/gpac-nightly-builds/, appears to 
incorporate the patch and fix the bug.  Hallelujah!

Original comment by CarlEd...@gmail.com on 11 Jun 2011 at 4:31