kaitai-io / kaitai_struct_formats

Kaitai Struct: library of binary file formats (.ksy)
http://formats.kaitai.io

`file`/libmagic binary magic pattern database #361

Open generalmimon opened 3 years ago

generalmimon commented 3 years ago
meta:
  id: magic_binary_database
  title: "`file`/libmagic binary magic pattern database"
  application:
    - "`file` command"
    - libmagic
  file-extension: mgc
  tags:
    - unix
doc: |
  A compiled binary database of "magic patterns" that the `file` command
  (and the underlying libmagic library) uses to identify the type of a file.

  `file`/libmagic's default magic pattern database is usually located at
  `/usr/share/misc/magic.mgc` or `/usr/share/file/magic.mgc`
  (the path can be seen by running `file --version`).

  The `file` command's `-C` option can be used to compile one or multiple
  text-based magic pattern source files into a binary magic pattern database.
doc-ref:
  - https://manpages.ubuntu.com/manpages/bionic/man5/magic.5.html
  - https://github.com/file/file/blob/master/src/apprentice.c
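
For illustration only, here is a minimal Python sketch of how a parser might recognize such a database before a full spec exists. It assumes (from a reading of `apprentice.c`, not confirmed anywhere in this thread) that the file starts with the 32-bit magic number `0xF11E041C` followed by a 32-bit version number, both written in the byte order of the machine that compiled the database. The function name `read_mgc_header` is hypothetical.

```python
import struct

# Assumption: value of MAGICNO in file's src/apprentice.c.
MGC_MAGIC = 0xF11E041C


def read_mgc_header(data: bytes):
    """Return (byte_order, version) for a compiled magic database header.

    byte_order is "le" or "be". Raises ValueError if the first 8 bytes
    do not look like an .mgc header under the assumptions above.
    """
    if len(data) < 8:
        raise ValueError("too short for an .mgc header")
    # The database is assumed to be written in the compiling machine's
    # native byte order, so try both and keep whichever one yields the
    # expected magic number.
    for name, prefix in (("le", "<"), ("be", ">")):
        magic, version = struct.unpack(prefix + "II", data[:8])
        if magic == MGC_MAGIC:
            return name, version
    raise ValueError("not a compiled magic database")
```

Reading the first 8 bytes of a `magic.mgc` file and passing them to `read_mgc_header` would then give the detected byte order and version number. Everything after those two fields is left out here, because the layout of the remaining data is exactly what the spec still has to describe.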

dgelessus commented 3 years ago

Made some edits:

generalmimon commented 3 years ago

> • Replaced `linux` tag with `unix`, because it's not Linux-specific in any way.

Perhaps. I used `linux` only because it is an established tag that the KSF site recognizes, so the format gets included in the "GNU/Linux-specific" category (https://formats.kaitai.io/#:~:text=GNU/Linux-specific). I didn't think about factual correctness too much. I am a Windows user, and the `file` utility definitely seems to me to be more connected with the Linux world 😁

The downside of using the `unix` tag right now is that it won't be recognized by the KSF site. Maybe we can rename the `linux` tag to `unix` now, while the set of `/meta/tags` values is still pretty recent (it was introduced by @GreyCat a few months ago, IIRC)?

dgelessus commented 3 years ago

> Maybe we can rename the `linux` tag to `unix`

Hm, I'm wondering if it's worth keeping the separate `linux` tag in addition to `unix`. Of the specs currently tagged as `linux`, many are indeed Linux-specific, or at least associated much more with Linux than with other Unixes (e.g. `btrfs_stream`, `cramfs`, `ext2`, `luks`, `lvm2`, `systemd_journal`), so a separate `linux` tag would still have some use.

I can understand that it would be simpler to have a single `unix` tag and also group Linux-specific things under it, especially because then you don't have to decide what counts as really Linux-related and what is just Unix-related. For example, glibc technically supports more than just the Linux kernel (e.g. Hurd), but in practice it's only really used on Linux as far as I know. So do glibc-related formats get tagged as `linux` or `unix`? If you go by how the formats are used in practice, though, I think the distinction is usually clear (e.g. glibc-related formats would be `linux` and not `unix`).