ratoaq2 / knowit

Know better your media files
MIT License

Mediainfo IETF BCP 47 #76

Closed TokerX closed 1 year ago

TokerX commented 1 year ago

Hi!

I've recently been using Bazarr and I noticed it doesn't read IETF BCP 47 language tags. They rely on libraries so they don't know themselves, but after some digging in the code I found knowit.

Looking at knowit's code, I saw Language('Language', description='subtitle language') being called in mediainfo.py, but Language('language_ietf', 'language', description='subtitle language') in mkvmerge.py. I'm not too well-versed in Python, so the Language class has me somewhat confused with "if len(value) == 3", but either way I assume that, unlike mkvmerge, mediainfo does not return IETF BCP 47?
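A length check like that presumably serves to tell a three-letter ISO 639-2 code (spa) apart from an IETF tag (es, es-419). The following is a minimal, stdlib-only sketch of that idea, not knowit's actual code; the function name classify_language_value and the labels it returns are hypothetical.

```python
def classify_language_value(value: str) -> str:
    """Guess which kind of language identifier a raw string is.

    Hypothetical helper illustrating the idea behind a `len(value) == 3`
    check; it does not reproduce any real library's behaviour.
    """
    value = value.strip()
    if len(value) == 3 and value.isalpha():
        # Three letters: most likely an ISO 639-2 code such as "spa".
        return "alpha3"
    parts = value.split("-")
    if (
        2 <= len(parts[0]) <= 3
        and parts[0].isalpha()
        and all(part.isalnum() for part in parts)
    ):
        # A short primary subtag followed by hyphen-separated alphanumeric
        # subtags is treated as an IETF BCP 47 tag, e.g. "es" or "es-419".
        return "ietf"
    # Anything else is treated as free-form text,
    # e.g. "Spanish (Latin America)".
    return "name"


for raw in ("spa", "es", "es-419", "Spanish (Latin America)"):
    print(raw, "->", classify_language_value(raw))
```

A real implementation would then hand each kind of value to a proper language library instead of returning a label.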

If so, that would be a shame, because mediainfo supports it. And it is the recommended Matroska standard after all.

It does appear a bit trickier; a regular -f command gives this nonsense:

Language : es-419
Language : Spanish (Latin America)
Language : Spanish (Latin America)
Language : es
Language : spa
Language : es-419

This command --Inform="Text;%Language/String4%" gives es-419, which is the desired output, but I assume you rely on JSON? Which makes it slightly weird since asking for JSON gives the following:

{
  "@type": "Text",
  "@typeorder": "3",
  "StreamOrder": "4",
  "ID": "5",
  "UniqueID": "15183526623152000759",
  "Format": "UTF-8",
  "CodecID": "S_TEXT/UTF8",
  "Duration": "6262.622000000",
  "BitRate": "66",
  "FrameRate": "0.208",
  "FrameCount": "1301",
  "ElementCount": "1301",
  "StreamSize": "52296",
  "Language": "es-419",
  "Default": "No",
  "Forced": "No"
}

Only one language key and it is the right one.
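As a rough illustration of what a consumer could do with that single Language key, here is a minimal Python sketch that reads the track and splits the tag into subtags. It uses only the standard library; split_language_tag is a hypothetical helper, not part of knowit or mediainfo, and it only handles the simple language[-Script][-REGION] shape.

```python
import json

# The text track from the JSON output above, trimmed to the relevant keys.
track_json = (
    '{"@type": "Text", "Format": "UTF-8", "Language": "es-419", '
    '"Default": "No", "Forced": "No"}'
)


def split_language_tag(tag: str) -> dict:
    """Split a simple BCP 47 tag into language, script and region subtags.

    Variants, extensions and private-use subtags are ignored.
    """
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        is_script = len(part) == 4 and part.isalpha()
        is_region = (len(part) == 2 and part.isalpha()) or (
            len(part) == 3 and part.isdigit()
        )
        if is_script and result["script"] is None and result["region"] is None:
            result["script"] = part.title()  # e.g. "Hans"
        elif is_region and result["region"] is None:
            result["region"] = part.upper()  # e.g. "MX" or "419"
    return result


track = json.loads(track_json)
print(split_language_tag(track["Language"]))
# {'language': 'es', 'script': None, 'region': '419'}
```

The same helper would split zh-Hans into language zh and script Hans, which is the distinction that gets lost if only the two-letter code is kept.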

So in order to not have to dig any deeper into the code, and probably come up empty-handed anyway, could you either confirm or deny if mediainfo within knowit uses IETF BCP 47? And if the answer is negative, could it be implemented?

ratoaq2 commented 1 year ago

The issue here is that mediainfo doesn't extract that info, or at least that was the case when I implemented it. If that's still true, the best way forward is to open an issue on their end in order to make that info available.

mkvmerge, on the other hand, provides that information.

But when I get some time, I'll check whether that has changed in newer versions of mediainfo. If so, that should be an easy fix.

ratoaq2 commented 1 year ago

One thing to note is that es-419 is not supported by babelfish, as I pointed out earlier this year:

https://github.com/Diaoul/babelfish/issues/43

TokerX commented 1 year ago

I tested knowit myself with Bazarr's code. It actually returns es-419 and that gets interpreted correctly; it's zh-Hans/Hant/etc. that also gets returned correctly. I don't quite remember what the output was, but something like "Language", I think. I suppose that's a Python object? So it gets interpreted the wrong way or something.

And yeah I found that post lol.

ratoaq2 commented 1 year ago

Just tested the latest version locally, and knowit with mediainfo is already parsing IETF BCP 47 correctly:

image