kaitai-io / kaitai_struct_formats

Kaitai Struct: library of binary file formats (.ksy)
http://formats.kaitai.io

Zeno IMproved #153

Open KOLANICH opened 5 years ago

KOLANICH commented 5 years ago
meta:
  id: zim
  title: "(Open) Zeno IMproved"
  application: 
    - Kiwix
    - zimlib
  file-extension: zim
  xref:
    wikidata: Q784695
  license: CC-BY-SA-3.0
  encoding: utf-8
  endian: le
doc: |
  A file format to store encyclopaedias of articles written in the MediaWiki markup language.
  Test files: https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/

doc-ref:
  - https://www.openzim.org/wiki/ZIM_file_format
  - https://wiki.openzim.org/wiki/OpenZIM
WiP:
  - https://github.com/KOLANICH/kaitai_struct_formats/blob/OpenZIM/media/openzim.ksy
GreyCat commented 5 years ago

ecyclopaedias => encyclopaedias?

KOLANICH commented 5 years ago

Good catch! Fixed (currently in this issue only), thanks.

generalmimon commented 3 years ago

encyclopaedias

This is a British spelling. See https://en.wikipedia.org/wiki/Encyclopedia:

An encyclopedia or encyclopaedia (British English) is a reference work or (...)

and https://www.merriam-webster.com/dictionary/encyclopaedia:

encyclopaedia, encyclopaedic

Definition of encyclopaedia

chiefly British spellings of ENCYCLOPEDIA , ENCYCLOPEDIC

I thought that we prefer American spelling for KS identifiers, don't we? In fact, @KOLANICH suggested this himself in https://github.com/kaitai-io/kaitai_struct/issues/522#issuecomment-468059463:

One more thing. American spelling: meter, not metre.

KOLANICH commented 3 years ago

In fact, @KOLANICH suggested this himself in kaitai-io/kaitai_struct#522 (comment)

Please don't take it out of context. That suggestion was made in the context of units. Units are things that must be uniform, and most unit libraries use the American spelling meter (though some support both), so introducing metre there instead of meter would require an additional remapping of that id too.

In identifiers it probably makes sense to use American spelling only when the word with variable spelling was introduced by the spec author and if another part of the id is not already using British spelling. If the id was originally using British spelling it may make more sense to keep using it.

Also, doc is not an id and doesn't have such expectations as an id has. From a searchability point of view it may make sense to unify spelling, since not all search algorithms can handle that automatically. But I feel like encyclopaedia (and maybe even encyclopædia) looks cooler than encyclopedia, as naïve looks cooler than naive...

generalmimon commented 3 years ago

In identifiers it probably makes sense to use American spelling only when the word with variable spelling was introduced by the spec author and if another part of the id is not already using British spelling. If the id was originally using British spelling it may make more sense to keep using it.

I don't think so. Quite often, it doesn't make sense to blindly and thoughtlessly follow the style and internal conventions of the reference spec (see examples in the style guide). We want to ensure consistency, predictability and intelligibility of the formats in KSF, so we should follow stable style and conventions for KSY specs rather than conventions of anything else. If a spec for one image format uses color, then what's the point of using colour in another one?

For illustration, here's how it looks if you don't set any conventions:

https://github.com/kaitai-io/kaitai_struct_formats/blob/c7658a660d8b0f071d740342afd4acc03f0476bf/image/icc_4.ksy#L32-L34

In an ideal world, the same things should be called the same and different things differently, not only within one specification but across all of them. That's why it makes sense to normalize num_/len_/ofs_ prefixes (since every text specification has its own idea of what these fields should be called), spelling, etc.
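
As a rough illustration of the normalized prefixes (a sketch only: the format `example_palette` and all of its field names are hypothetical and not taken from any real spec), a KSY fragment following these conventions might look like this:

```yaml
meta:
  id: example_palette   # hypothetical format, used only for illustration
  endian: le
seq:
  - id: num_colors      # num_ prefix: a count of entries
    type: u4
  - id: len_name        # len_ prefix: a length in bytes
    type: u4
  - id: ofs_colors      # ofs_ prefix: a byte offset from the start of the stream
    type: u4
  - id: name
    size: len_name
    type: str
    encoding: ASCII
instances:
  colors:               # American spelling, whatever the reference spec uses
    pos: ofs_colors
    type: u4
    repeat: expr
    repeat-expr: num_colors
```

Whatever the reference document calls these fields, a reader who knows the prefixes can tell a count from a length from an offset without opening the doc strings.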

Also, doc is not an id and doesn't have such expectations as an id has.

Perhaps. But that does not justify using different spelling in ids and doc.

But I feel like encyclopaedia (and maybe even encyclopædia) looks cooler than encyclopedia, as naïve looks cooler than naive...

Yes, that's exactly the problem - it's cool, stylish and definitely not monotonous. That's why it distracts attention - you're attracted by the unconventional spelling and not focused on the actual content.

KOLANICH commented 3 years ago

If a spec for one image format uses color, then what's the point of using colour in another one?

E.g. when a software name has the word colour in it and we include the software name in a spec id.

icc_4.ksy

Clearly a bug - the wording is used inconsistently without a proper justification to do so.

Perhaps. But that does not justify using different spelling in ids and doc.

Not quite. I feel like the pieces that are quotes should retain the original spelling. Especially if they are quotes within text literals marked as such, i.e. with >.

E.g. for software with the token colour in its name:

meta:
  id: ..._colour_...
  title: ... Color Format
  application: ... Colour ...

doc: |
  ... color ...
  The doc says:
  > ... colour ...

seq:
  ...
  - id: color
  ...