kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io
3.93k stars 193 forks source link

Allow "doc" tag in more places #356

Closed julie-bsx closed 5 years ago

julie-bsx commented 6 years ago

There is no clear documentation on where the doc tag can be used. This may seem trivial, but large parsers can be a pain in the backside to read as KSC doesn't read my mind and document what I was thinking. So I have to add doc tags. Except I can't because they aren't valid in as many places as I wish they were.

arekbulski commented 6 years ago

A list of explicit examples would be useful, if you could.

julie-bsx commented 6 years ago

I'm unclear what you're asking for. At present the doc tag can only be located in a few very high level locations. For example, this works --

doc: |
  Here are a bunch of enums.
enums:
  animals:
    1: dogs
    2: cats
    3: birds
    4: programmers
  plants:
    1: trees
    2: flowers
    3: lettuce
    4: hr_representatives

This doesn't work --

enums:
  doc: |
    Animals you might find at the office.
  animals:
    <insert stuff from above>
  doc: |
    plants you might find in the lobby
  plants:
    <yeah, that stuff from above again>

This isn't so bad for types, because I can do something like

types:
  animal_type:
    doc: An animal
    seq:
      - id: legs
        type: b10 # Be sure to support millipedes.

but I can't add a doc tag anywhere near the legs attribute if I needed to explain that some animals (like snails ...) have 0 legs and that -1 will appear as the value 1023, etc.

Of course, this is just a silly and made-up set of examples because I can't tell you what I'm actually doing.

KOLANICH commented 6 years ago

You can always use comments. doc is for language-supported docstrings like in python.

julie-bsx commented 6 years ago

Comments aren't copied to the generated parser.

And there are languages other than Python and documentation tools other than docstrings. C++ for one. Doxygen for another.

KOLANICH commented 6 years ago

I guess these are 2 separate bugs. One about comments and one about supporting non-native docstrings.

julie-bsx commented 6 years ago

Prolly, though I sort of like the way some comments aren't copied over and others are. For example, I have an enum which decodes a 5 bit field where values in the range 00xxx are ignored. I have a comment which says I'm not doing anything with those values until my boss says what he thinks should be done (my vote is "ignore them"). I don't need that in the generated parser. But I would prefer that some blob of documentation be addable to each enum definition because the example I gave above doesn't work, and it's impossible to have a separate enums second as a workaround.

GreyCat commented 6 years ago

I believe there should be 2 distinct tasks:

  1. Document current situation with doc. It is indeed not trivial to understand where it is allowed to be used. User guide currently gives a very short tip on using it for attributes and there's a section of verbose enums. doc-ref and majority of meta fields are actually not documented anywhere.

  2. Think of enhancing docstring generation support. Namely, from what I've understood now, there's one request: adding doc to enum declaration level.

If I'm correct, then let's create 2 distinct issues and continue there.

julie-bsx commented 6 years ago

@GreyCat -

doc needs to be added everywhere it can be supported. I'm an avid supporter of more documentation, not less. Being as liberal with where doc can be used (which would include case statements as well) would be a good thing.

KOLANICH commented 6 years ago

adding doc to enum declaration level.

:-1: for that. It will require reengineering of enums (I'm not against that, I even vote for dropping old syntax, but it is likely not an option, but if it is, it's better to do it now, than later), otherwise doc would become an invalid name for a enum value, I don't like that.

KOLANICH commented 6 years ago
offtop >example, I have an enum which decodes a 5 bit field where values in the range 00xxx are ignored In this case `b1` and other bit-sized types should be used, not enums
GreyCat commented 6 years ago

@julie-bsx "Do everything" is not a good idea to set up a task. We need to be specific. There are no "case statements" in KS (actually, there aren't any "statements" in KS to start with), but if you're talking about switching types, i.e. stuff like:

id: foo
type:
  switch-on: bar
  cases:
    1: baz
    2: quz

then, again, we'll need to decide how exactly are we going to put it in, and what would be the resulting code generated for all target languages.

Given that ksy format is based on YAML, you can't really just pop doc elements here and there and expect it to be parsed properly. For example, this is not really a valid YAML (at least for our purposes):

enums:
  doc: |
    Animals you might find at the office.
  animals:
    <insert stuff from above>
  doc: |
    plants you might find in the lobby
  plants:
    <yeah, that stuff from above again>

At least for 3 reasons: (1) doc can be a valid enum name, (2) "enums" is an unordered map, there's no way to tell which doc "adheres" to which key, (3) given that we receive it as a map object, we'll be getting at most one doc element, so all the rest would be just lost during YAML parsing (or, if YAML parser would be strict enough, it will just reject everything as invalid YAML, even before compiler would be able to do anything).

julie-bsx commented 6 years ago

@GreyCat -

I understand that YAML has something that closely resembles a grammar and that there are limitations on what is possible.

For example, I did try

enums:
  animals:
    doc: |
      Animals you might find at the office.
    <insert stuff from above>
  plants:
    doc: |
      plants you might find in the lobby
    <yeah, that stuff from above again>

which is syntactically and contextually valid YAML. But KSC doesn't like that "doc" isn't an integer.

I apologize that my examples ... which I typed into the TextBox on this page ... weren't syntactically valid. But my request remains -- doc can be added in a crapton of places. In order to make the generated parsers more than just write-only code, they should be supported in as many places as syntactically valid so the next sucker to read the generated parser isn't stuck scratching their head.

GreyCat commented 6 years ago

@julie-bsx I've generated two tasks which resulted from this discussion. Please add more of them, if you see any places where adding docstring might be useful. You may be right that they "can be added in a crapton of places", but we need to go step-by-step, identifying these places and discussing what would be the best way to add doc / doc-ref to them, and what is expected to be generated.

arekbulski commented 6 years ago

For the record, Construct just added * operator. Docs can be added to any field. https://construct.readthedocs.io/en/latest/advanced.html#documenting-fields

GreyCat commented 5 years ago

Looks like no more idea for adding doc tags were generated, and discussion stopped so far. Closing this one. If anyone would want to add more doc tags or anything like that, please file specific feature requests.