Open zed opened 20 hours ago
Hey @zed, thanks for reporting this issue. The extension list this library uses comes directly from GitHub Linguist: https://github.com/github-linguist/linguist/blob/5a0c74277548122267d84283910abd5e0b89380e/lib/linguist/languages.yml
They don't seem to have added support for gettext .mo (but .po is supported), but I could add it for sure.
One big problem is that, apart from looking at the content, I'm not sure there is a way to differentiate between the two formats that share this extension. Do you have a suggestion that would help?
gettext's MO files are binary (this can be tested by the presence of a zero byte). Motoko and Modelica .mo files are text (source code): they can't contain zero bytes unless a UTF-16 or UTF-32 encoding is used (I'm not sure how likely it is for Motoko or Modelica source code to use such an encoding).
Another way is to test the first 4 bytes:
"The first two words serve the identification of the file. The magic number will always signal GNU MO files. The number is stored in the byte order used when the MO file was generated, so the magic number really is two numbers: 0x950412de and 0xde120495." (https://www.gnu.org/software/gettext/manual/html_node/MO-Files.html)
>>> open('messages.mo', 'rb').read(4) in map(bytes.fromhex, ["95 04 12 de", "de 12 04 95"])
True
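The two checks described above (magic number first, zero byte as a fallback) could be combined roughly as follows. This is only a sketch, not how the analyzer or linguist actually works: the names `has_mo_magic`, `contains_nul`, `classify_mo_file` and the 8192-byte sample size are made up for illustration.

```python
# Both byte orders of the GNU MO magic number, as quoted from the gettext manual.
MO_MAGICS = (bytes.fromhex("950412de"), bytes.fromhex("de120495"))


def has_mo_magic(head: bytes) -> bool:
    """Return True if the first four bytes match either MO magic number."""
    return head[:4] in MO_MAGICS


def contains_nul(sample: bytes) -> bool:
    """Return True if the sample contains a zero byte (binary heuristic)."""
    return b"\x00" in sample


def classify_mo_file(path: str, sample_size: int = 8192) -> str:
    """Classify a .mo file as 'gettext', 'binary', or 'text'.

    Hypothetical helper: only the first `sample_size` bytes are read
    (an arbitrary choice), so a zero byte later in the file is missed.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if has_mo_magic(sample):
        return "gettext"
    if contains_nul(sample):
        # Binary, but not a recognizable MO file.
        return "binary"
    # Plausibly Motoko or Modelica source; telling those two apart
    # would need a separate content check.
    return "text"
```

With this, `classify_mo_file("messages.mo")` would return `"gettext"` for a compiled catalog, while a Modelica source file saved as UTF-8 would come back as `"text"`. A UTF-16 or UTF-32 source file would be reported as `"binary"`, which is the caveat mentioned above.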
As I understand it, the linguist project ignores binary files, so there is no ambiguity for it: https://github.com/github-linguist/linguist/issues/2053
Describe the bug
Localization files such as mkdocs/themes/readthedocs/locales/tr/LC_MESSAGES/messages.mo are reported as Modelica language. https://www.gnu.org/software/gettext/manual/html_node/Files.html
To Reproduce
Take any project that uses gettext for i18n. For example, a project that uses mkdocs to generate its docs:
Tech stack analyzer reports that the Modelica language is used:
but these are actually gettext's MO translation files:
Desktop: