Closed GoogleCodeExporter closed 9 years ago
Could you provide a file with a stream specified as being French? I'm not sure
how to create one, but I wouldn't mind seeing if I could dig up the bug.
Original comment by kid...@gmail.com
on 3 Apr 2010 at 12:33
Sorry for the delay, but it took me a while until I ran across this bug again
and had a chance to generate a small example file. Have a look at the attached
file:
It was freshly generated with mp4box and has the following info:
% mp4box -info a.mp4 >before.txt
% mp4file --optimize a.mp4
[Just an example; all the mp4v2 tools which rewrite the file seem to have the
same effect]
% mp4box -info a.mp4 >after.txt
Note the difference in tracks 2 and 3, which were German audio tracks before
and have an undetermined language after, and in track 8, which was a French
subtitle track before and also has an undetermined language afterwards. (Yes,
German is affected just like French, but English, Japanese and Chinese aren't.)
Original comment by CarlEd...@gmail.com
on 31 May 2010 at 9:15
Attachments:
Sorry, the system is giving me problems with the relatively small uploads (even
the mp4 is only 2 MBytes), so I'll do them one by one.
Original comment by CarlEd...@gmail.com
on 31 May 2010 at 9:16
Attachments:
Thanks Carl--I'll look into it today.
Original comment by kid...@gmail.com
on 1 Jun 2010 at 4:49
Thank *you*, kidjan! Any help is much appreciated.
By the way, and for what it is worth, I just downloaded and compiled (under
Windows 7 x64) the latest version (v386) and the problem is still fully
reproducible.
Also, Spanish is one of the other languages *not* affected.
As the problem seems to be specific to the language used, is it possible that
it is caused by a different set of language codes used by mp4box and mp4v2?
Mp4box will dump its set of language codes when invoked with the option
"-languages".
Original comment by CarlEd...@gmail.com
on 3 Jun 2010 at 12:05
Yes, I think it is possible that it's some difference between mp4box and mp4v2,
although I'm still trying to isolate the bug. I can definitely reproduce it
quite easily (I'm running Win7).
Original comment by kid...@gmail.com
on 3 Jun 2010 at 4:26
I went through this a bit more carefully. In this function, mp4v2 parsed out
the string "ger" as the language code:
void
MP4LanguageCodeProperty::Read( MP4File* file, uint32_t index )
{
    uint16_t data = file->ReadBits( 16 );
    char code[3];
    code[0] = ((data & 0x7c00) >> 10) + 0x60;
    code[1] = ((data & 0x03e0) >> 5) + 0x60;
    code[2] = ((data & 0x001f) ) + 0x60;
    SetValue( bmff::enumLanguageCode.toType( string( code, sizeof(code) )));
}
However, mp4v2 doesn't seem to contain all of the language codes per ISO 639-2
(see http://www.loc.gov/standards/iso639-2/php/code_list.php). German can be
either "deu" or "ger", but mp4v2 only contains "deu", which causes it to report
the language as undefined. So I think the problem may be that the array of
language codes/values in typebmff.cpp lacks some of the duplicate values.
Let me do a bit more research, and then I'll probably have a fix available.
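For reference, the bit arithmetic in Read() can be checked in isolation. The
following is a minimal sketch, not mp4v2 code: unpackLanguage and packLanguage
are hypothetical helper names, and packLanguage assumes its input is exactly
three lowercase ASCII letters.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of mp4v2): unpack the 16-bit language field
// into three letters. Each letter is stored as a 5-bit value offset by 0x60,
// mirroring the masks and shifts used in Read() above.
std::string unpackLanguage(uint16_t data) {
    std::string code(3, ' ');
    code[0] = static_cast<char>(((data >> 10) & 0x1f) + 0x60);
    code[1] = static_cast<char>(((data >> 5) & 0x1f) + 0x60);
    code[2] = static_cast<char>((data & 0x1f) + 0x60);
    return code;
}

// Hypothetical inverse. Assumes `code` holds exactly three lowercase ASCII
// letters; no validation is performed in this sketch.
uint16_t packLanguage(const std::string& code) {
    return static_cast<uint16_t>(((code[0] - 0x60) << 10) |
                                 ((code[1] - 0x60) << 5) |
                                 (code[2] - 0x60));
}
```

With these helpers, "ger" packs to 0x1CB2 and "und" packs to 0x55C4, so the
bytes in the file decode cleanly to "ger"; it is the table lookup afterwards
that fails to recognize the string.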
Original comment by kid...@gmail.com
on 3 Jun 2010 at 5:05
Hmm, I take that back. After a bit more investigation, I found this changeset:
http://code.google.com/p/mp4v2/source/detail?r=247&path=/trunk/src/bmff/type.cpp
Per the MP4 specification (see
http://developer.apple.com/mac/library/documentation/QuickTime/QTFF/qtff.pdf),
MP4 files should be using ISO 639-2/T, not ISO 639-2/B, so the language
codes/values in MP4v2 are in fact correct.
It seems that mp4box must be using ISO 639-2/B? I believe you can dump the
language codes with "-languages" to see, but if German corresponds to "ger",
then mp4box is using the wrong codes (i.e. ISO 639-2/B).
Original comment by kid...@gmail.com
on 3 Jun 2010 at 5:24
(or perhaps it isn't mp4box, but whatever tool was used to create the mp4 file)
Original comment by kid...@gmail.com
on 3 Jun 2010 at 5:30
Fair enough. But "interpret loosely, generate strictly" is still a good
software design adage. If the specification says ISO 639-2/T, by all means that
is what mp4v2 should generate.
But where would be the harm in also accepting ISO 639-2/B and either preserving
it or replacing it with the proper codes (rather than replacing it with
"undetermined" as it does currently, which in turn is interpreted as "English"
by VLC)? Given the ubiquity of mp4box as a tool (either directly or under some
GUI) for creating mp4 files, that would do a great deal of good.
Original comment by CarlEd...@gmail.com
on 3 Jun 2010 at 5:33
The example I posted earlier was indeed created with mp4box.
Please find attached a list of the mp4box languages and abbreviations.
Original comment by CarlEd...@gmail.com
on 4 Jun 2010 at 12:31
Attachments:
When you use mp4box, is there some way you could use this flag to set the
languages appropriately?
-lang [tkID=]LAN sets track language. LAN is the ISO 639-2 code (eng, und)
...for German, you'd use "deu".
Could you try this out?
Original comment by kid...@gmail.com
on 4 Jun 2010 at 12:53
Thanks, kidjan, that works! Mp4box clearly accepts both kinds of codes.
Still, as there are some software packages (including mp4box itself, when
generating the language description for subtitles from idx files) which use the
wrong codes, perhaps a patch that would allow mp4v2 to recognize the /B codes
(perhaps with a warning), while writing the correct /T codes, would be useful?
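As a rough sketch of what such a patch could do (an assumption for
illustration, not actual mp4v2 code): toTerminologyCode and NormalizedCode are
hypothetical names, and the table holds the 20 languages for which ISO 639-2
defines distinct /B and /T codes. The caller can use the flag to emit a
warning.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical result type: the code to use plus a flag telling the caller
// whether the input was a /B code (so it can warn if it wants to).
struct NormalizedCode {
    std::string code;
    bool wasBibliographic;
};

// Hypothetical helper, not part of mp4v2: map an ISO 639-2/B code to its
// ISO 639-2/T equivalent. Any other input is returned unchanged.
NormalizedCode toTerminologyCode(const std::string& code) {
    static const std::unordered_map<std::string, std::string> kBtoT = {
        {"alb", "sqi"}, {"arm", "hye"}, {"baq", "eus"}, {"bur", "mya"},
        {"chi", "zho"}, {"cze", "ces"}, {"dut", "nld"}, {"fre", "fra"},
        {"geo", "kat"}, {"ger", "deu"}, {"gre", "ell"}, {"ice", "isl"},
        {"mac", "mkd"}, {"mao", "mri"}, {"may", "msa"}, {"per", "fas"},
        {"rum", "ron"}, {"slo", "slk"}, {"tib", "bod"}, {"wel", "cym"},
    };
    const auto it = kBtoT.find(code);
    if (it == kBtoT.end()) {
        return {code, false};   // already /T, or not a known /B code
    }
    return {it->second, true};  // translated from /B to /T
}
```

Applied before the existing table lookup, "ger" would become "deu" and "fre"
would become "fra", while codes such as "eng" would pass through untouched.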
Original comment by CarlEd...@gmail.com
on 4 Jun 2010 at 2:32
I'll think about it. I did discuss it with another developer (kona), and here
are his comments:
"Since 639-2/T codes are required, I don't see the benefit of supporting mp4box
bugs. What happens is someone always asks for the next step for mp4box
bug-compatibility... Let's assume conversion from 639-2/B was supported, the
next thing someone is going to ask is for it to be preserved. Then flags need
to be added to the CLI tools. And so on."
For example, if I add code to recognize the /B codes and correct them to /T
codes, then at some point it wouldn't be totally surprising for someone to ask
for a way to not have the library "correct" it to deal with some other
non-standard extension, in which case I'm in some hot water since I probably
can't just back out the previous change that people are now using in production
code.
If I did anything, I'd probably add a warning for any recognized /B code to
alert people that they're using non-standard language extensions, but take zero
actual action. The other thing I'd do is follow up with mp4box and ask them why
they aren't standards compliant (chances are, it's for the aforementioned
reasons: someone decided to accommodate, and now they're in a bit of an awkward
spot).
Anyway, glad you have a fix. I'm closing this defect.
Original comment by kid...@gmail.com
on 4 Jun 2010 at 3:47
I can understand the concern about not propagating bugs into more software than
absolutely necessary. But I don't think that would be an issue here.
Mp4box is just fine interpreting the *correct* codes; it just sometimes
generates the incorrect codes. There would be no advantage or call for mp4v2
ever generating the incorrect codes.
The only question is what mp4v2 should do when it encounters a common type of
slightly out-of-spec file, that is, a file with a 639-2/B code generated by
mp4box. Is it better to change that incorrect code to "undetermined" as the
current mp4v2 does? Or is it better to change it to the correct code when that
can be done easily and accurately?
This seems to me an easy call. "Interpret loosely, generate strictly."
Original comment by CarlEd...@gmail.com
on 4 Jun 2010 at 11:02
One, hopefully last, comment/correction:
Earlier I said that mp4box will happily accept the correct /T codes with the
"-lang" option. That is true as far as it goes. *However*, mp4box will rewrite
the correct /T codes given on the command line into the incorrect /B codes in
the mp4 file! I only just learned that while trying to manually fix a file with
broken designations: running mp4file --optimize on the result changes them all
back to "Undetermined".
The bottom line is that I have (and know of) no way to create an mp4 file with
correct language designations. Mp4box won't do it, even if spoon-fed the
correct ones. Mp4v2 does not have any utility to set or edit these options.
I realize that these problems are not the fault of mp4v2's authors and
maintainers. I've sent a number of extremely specific bug reports to whatever
addresses appeared to be associated with the keepers of the mp4box code, but
never gotten so much as a single reply. So my hopes of them fixing the problem
seem low.
That may make my earlier point, that mp4v2 should accept the incorrect /B codes
but write the correct /T codes, all the more urgent.
Original comment by CarlEd...@gmail.com
on 5 Jun 2010 at 12:28
Carl,
I'll think about it. I would really like to get in touch with mp4box's
maintainers to see if they can resolve it; I'm not inclined to change the
behavior of mp4v2 solely on the basis of some other program's errant, invalid
behavior (let's be frank: this is basically working around mp4box's incorrect
language implementation) without knowing that the other program is at least
moving in the right direction.
I'll see if I can get a hold of them.
Thanks,
kidjan
Original comment by kid...@gmail.com
on 10 Jun 2010 at 6:15
CarlEdman wrote:
| The bottom line is that I have (and know of) no way to create an mp4file with
| correct language designations. Mp4box won't do it, even if spoon-fed the
| correct ones. Mp4v2 does not have any utility to set or edit these options.
Carl, have you tried using the mp4track command-line tool from trunk? It lets
one set some generic per-track attributes. Here's an example.
***1. List all tracks. Take note of whether you want to use ID or INDEX as your
mechanism to specify an individual track. We'll use INDEX to keep things simple
(the number between brackets []). In my sample.m4v, here is the track[1]
listing:
mp4track --list sample.m4v
[...other tracks snipped...]
track[1] id=2
type = audio
enabled = true
inMovie = true
inPreview = false
layer = 0
alternateGroup = 0
volume = 1.0000
width = 0.00000000
height = 0.00000000
language = English
handlerName =
userDataName = Stereo
***2. Let's change it to French and list again. Make sure to always list
afterwards, as a bogus language code will simply result in a "0" (undefined)
value being used without error. Wikipedia lists valid 639-2 codes (both /B and
/T) here:
http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
mp4track --track-index=1 --language 'fra' sample.m4v
mp4track --list sample.m4v
track[1] id=2
type = audio
enabled = true
inMovie = true
inPreview = false
layer = 0
alternateGroup = 0
volume = 1.0000
width = 0.00000000
height = 0.00000000
language = French
handlerName =
userDataName = Stereo
Original comment by Kona8l...@gmail.com
on 10 Jun 2010 at 10:42
The above example might be clearer if the same CLI named-option syntax were
used throughout. The following are equivalent; the 3rd one is clearest imo.
mp4track --track-index=1 --language 'fra' sample.m4v
mp4track --track-index=1 --language='fra' sample.m4v
mp4track --track-index=1 --language=fra sample.m4v
Original comment by Kona8l...@gmail.com
on 10 Jun 2010 at 10:44
Thanks again, Kona8lend! I didn't realize that the mp4track tool could set
languages.
By the way, it works just fine, and both the mp4v2 tools and mp4box can *read*
the correct language codes.
Original comment by CarlEd...@gmail.com
on 11 Jun 2010 at 12:44
For what it is worth, I've sent a patch to anybody I could find with
responsibility for GPAC/mp4box which fixes the problem, but no response so far:
The current version of GPAC I just pulled from
https://github.com/golgol7777/gpac contains a subtle, but to the affected users
highly annoying bug.
To be specific: VobSub languages are coded using the ISO-639-2(B) codes, rather
than the ISO-639-2(T) codes as the standard specifies. (See
http://www.loc.gov/standards/iso639-2/php/code_list.php for the distinction).
For most languages the two codes are fortunately the same, but for several
widely used languages (including French, German, and Chinese) they are not.
This causes VobSub tracks coded with the incorrect (B) codes to be mis- or
un-identified in many standards-compliant MP4 tools (including VLC and mp4v2).
The problem can readily be fixed by replacing the 'lang_type lang_table' table
in VobSub.c with the corrected version attached below. All the changed lines
are marked with comments.
static lang_type lang_table[] =
{
{"--", "und" },
{"aa", "aar" },
{"ab", "abk" },
{"af", "afr" },
{"am", "amh" },
{"ar", "ara" },
{"as", "ast" },
{"ay", "aym" },
{"az", "aze" },
{"ba", "bak" },
{"be", "bel" },
{"bg", "bul" },
{"bh", "bih" },
{"bi", "bis" },
{"bn", "ben" },
{"bo", "bod" }, // was "tib" (Tibetan)
{"br", "bre" },
{"ca", "cat" },
{"cc", "und" },
{"co", "cos" },
{"cs", "ces" }, // was "cze" (Czech)
{"cy", "cym" }, // was "wel" (Welsh)
{"da", "dan" },
{"de", "deu" }, // was "ger" (German)
{"dz", "dzo" },
{"el", "ell" }, // was "gre" (Greek, Modern (1453-))
{"en", "eng" },
{"eo", "epo" },
{"es", "spa" },
{"et", "est" },
{"eu", "eus" }, // was "baq" (Basque)
{"fa", "fas" }, // was "per" (Persian)
{"fi", "fin" },
{"fj", "fij" },
{"fo", "fao" },
{"fr", "fra" }, // was "fre" (French)
{"fy", "fry" },
{"ga", "gle" },
{"gl", "glg" },
{"gn", "grn" },
{"gu", "guj" },
{"ha", "hau" },
{"he", "heb" },
{"hi", "hin" },
{"hr", "hrv" }, // was "scr" (Croatian)
{"hu", "hun" },
{"hy", "hye" }, // was "arm" (Armenian)
{"ia", "ina" },
{"id", "ind" },
{"ik", "ipk" },
{"is", "isl" }, // was "ice" (Icelandic)
{"it", "ita" },
{"iu", "iku" },
{"ja", "jpn" },
{"jv", "jav" },
{"ka", "kat" }, // was "geo" (Georgian)
{"kk", "kaz" },
{"kl", "kal" },
{"km", "khm" },
{"kn", "kan" },
{"ko", "kor" },
{"ks", "kas" },
{"ku", "kur" },
{"ky", "kir" },
{"la", "lat" },
{"ln", "lin" },
{"lo", "lao" },
{"lt", "lit" },
{"lv", "lav" },
{"mg", "mlg" },
{"mi", "mri" }, // was "mao" (Maori)
{"mk", "mkd" }, // was "mac" (Macedonian)
{"ml", "mlt" },
{"mn", "mon" },
{"mo", "mol" },
{"mr", "mar" },
{"ms", "msa" }, // was "may" (Malay)
{"my", "mya" }, // was "bur" (Burmese)
{"na", "nau" },
{"ne", "nep" },
{"nl", "nld" }, // was "dut" (Dutch; Flemish)
{"no", "nor" },
{"oc", "oci" },
{"om", "orm" },
{"or", "ori" },
{"pa", "pan" },
{"pl", "pol" },
{"ps", "pus" },
{"pt", "por" },
{"qu", "que" },
{"rm", "roh" },
{"rn", "run" },
{"ro", "ron" }, // was "rum" (Romanian; Moldavian; Moldovan)
{"ru", "rus" },
{"rw", "kin" },
{"sa", "san" },
{"sd", "snd" },
{"sg", "sag" },
{"sh", "scr" },
{"si", "sin" },
{"sk", "slk" }, // was "slo" (Slovak)
{"sl", "slv" },
{"sm", "smo" },
{"sn", "sna" },
{"so", "som" },
{"sq", "sqi" }, // was "alb" (Albanian)
{"sr", "srp" },
{"ss", "ssw" },
{"st", "sot" },
{"su", "sun" },
{"sv", "swe" },
{"sw", "swa" },
{"ta", "tam" },
{"te", "tel" },
{"tg", "tgk" },
{"th", "tha" },
{"ti", "tir" },
{"tk", "tuk" },
{"tl", "tgl" },
{"tn", "tsn" },
{"to", "tog" },
{"tr", "tur" },
{"ts", "tso" },
{"tt", "tat" },
{"tw", "twi" },
{"ug", "uig" },
{"uk", "ukr" },
{"ur", "urd" },
{"uz", "uzb" },
{"vi", "vie" },
{"vo", "vol" },
{"wo", "wol" },
{"xh", "xho" },
{"yi", "yid" },
{"yo", "yor" },
{"za", "zha" },
{"zh", "zho" }, // was "chi" (Chinese)
{"zu", "zul" }
};
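To illustrate how a table like this is consumed, here is a small
self-contained sketch in C++. The struct field names (id, lang), the function
name vobsubLanguage, and the five-entry excerpt are assumptions for
illustration only; GPAC's actual lang_type definition and lookup code are not
shown in this report.

```cpp
#include <string>

// Hypothetical stand-in for lang_type: a two-letter VobSub language id
// paired with the three-letter ISO 639-2/T code to store in the MP4 file.
struct LangEntry {
    const char* id;
    const char* lang;
};

// Small excerpt of the corrected table above, enough to show the lookup.
static const LangEntry kLangExcerpt[] = {
    {"--", "und"},
    {"de", "deu"},
    {"en", "eng"},
    {"fr", "fra"},
    {"zh", "zho"},
};

// Linear search by two-letter id; unknown ids fall back to "und"
// (undetermined).
std::string vobsubLanguage(const std::string& id) {
    for (const LangEntry& entry : kLangExcerpt) {
        if (id == entry.id) {
            return entry.lang;
        }
    }
    return "und";
}
```

With the corrected entries, a German VobSub track ("de") is written as "deu",
which standards-compliant readers recognize.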
Original comment by CarlEd...@gmail.com
on 13 Apr 2011 at 3:36
The latest dev version of mp4box GPAC,
http://gpac.wp.institut-telecom.fr/downloads/gpac-nightly-builds/, appears to
incorporate the patch and fix the bug. Hallelujah!
Original comment by CarlEd...@gmail.com
on 11 Jun 2011 at 4:31
Original issue reported on code.google.com by
CarlEd...@gmail.com
on 6 Mar 2010 at 3:45