An issue to coordinate file formats descriptions creation

KOLANICH commented 7 years ago

:hammer: Compound File Binary Format/Compound Document format/COM Structured Storage
- doc xsl
smart battery firmware
edax spc
Apache Lucene formats
Award/phoenix BIOS module 1
Windows 7+ Ribbon - @DarkShadow44 is working on it
Apache Avro

Converters needed:

bios nvram settings for motherboards (it worth to write a converter from existing formats)
IETF formats described in an rfc and other txt-based documents (I'm working on a converter, #30)

KOLANICH commented 5 years ago

Unimplemented spec request: open issue, tagged "spec request" (is:open label:"spec request") Implemented spec request: closed issue, tagged "spec request" (is:closed label:"spec request") Other issue: anything not tagged "spec request" (-label:"spec request")

You are completely right.

But for anything else I would just link people to the issue page itself, so that they can read the full information themselves.

It may be useful (though I am currently unsure in how to implement it without hitting API limits) to show users some statistics. I mean, for example 1 user registers his WiP impl, then a year passes and an another user sees it abandoned but not finished and continues development and registers an own WiP too (registering a WiP may be just posting a link to a WiP branch, the bot can check if the main metadata block in KSY has the same id as the one referenced in head message (though there is a problem with changing ids, it may make sense to to make the bot detect changes and update own database automatically) ). So we may need to show users 2 links to WiP and their update dates. The bot account subscribes to all the repos referenced as WiP and automatically updates the dates when it receives notifications about the WiP branches.

BTW we can go even further: check with the bot forks of KSF, subscribe them all, detect creation of new branches, if a branch adds a single ksy file, extract the metadata block and create an issue.

This seems like a lot of work with little benefit. Implementing and testing this bot will take a while, and the end result won't be very easy or intuitive to use.

I don't have any estimates on how much time and effort it would take before actually trying to implement it. And I definitely have no time in near (at least 3 month, maybe even a year) future for implementing it.

dgelessus commented 5 years ago

It might not even be necessary to "register" WIP implementations explicitly in the issues. It should be easy enough to automatically extract all repo links from an issue's comments and search for the ones that contain a .ksy file (whose metadata block has a matching title/ID). This doesn't solve the problem of how you can remove WIP implementations again, but I don't think that we'll need to do that very often (and if we do, we could implement a "special bot instruction" that says "disregard this link if you've seen it before").

BTW we can go even further: check with the bot forks of KSF, subscribe them all, detect creation of new branches, if a branch adds a single ksy file, extract the metadata block and create an issue.

That would be a bit too weird imho - I don't think it's a good idea to create issues on the behalf of other users unless they explicitly consent to it. Not everyone wants to have their WIP spec advertised on the official ksf issue tracker. I believe there's also something in the GitHub TOS saying that your bot isn't allowed to act on other people's repos if they haven't requested it - this might not technically apply here (since the bot would only act on the ksf repo) but it's too close for my liking.

I don't have any estimates on how much time and effort it would take before actually trying to implement it. And I definitely have no time in near (at least 3 month, maybe even a year) future for implementing it.

In that case I second what @GreyCat has already said above - we should do everything manually for now, and add automation later where it's necessary/useful. To be clear, I'm all for structuring the issues in a form that will be easier to read automatically in the future, but within reasonable bounds - in the end, the issue tracker is still mainly used by humans, and we can't realistically make everything machine-readable.

dgelessus commented 5 years ago

By the way, there are a few issues under kaitai_struct that suggest adding a new format to ksf (#164, #166, #167, #168). When this discussion is done, we should transfer these issues to the ksf issue tracker and update them to whatever format we decide on.

On that topic, are there any points that still need to be discussed, or are we ready to start the move?

GreyCat commented 5 years ago

@dgelessus Thanks for summarizing this up! I'll be moving these issues shortly.

KOLANICH commented 5 years ago

On that topic, are there any points that still need to be discussed, or are we ready to start the move?

I don't feel like we are ready. We need to brainstorm what else machine-readable info we need. I feel like we need a mechanism to tie KSC bugs to formats. I mean the following. Some formats are implemented inefficiently because of lack of KSC features. There are different workarounds. And we need a mechanism to track and fix them when time comes. There are several options:

refer the issues in kio/ks (kaitaio-io/kaitai_struct) from formats themselves in comments on the lines. Make the bot scanning the formats and adding the info to their issues in ksf issues (we don't have to store them in markdown, we have to add them once in MD to trigger the mention and then remove and add into yaml). GH shows mentions in the mentioned issues and when the issue is landed, the formats authors are mentioned and the formats are manually iterated and fixed.
implement a mechanism to store the workaround and a new impl in the format itself in a machine-readable way. Like:
```
seq:
- workaround:
    issue: ...
    new:
       ....
    old:
        ....
```
When the feature has landed, the bot replaces the workaround block with the new impl in all the formats. Then mentions authors to notify them to test the format.
a user just refers an kio/ks issue in ksf issue
a user just refers an kio/ks issue in a commit

We also may need a way to add a CI pipeline to specific merged formats. A gpg keypair is generated for a bot. A user encrypts an URI to trigger his CI pipeline and posts it. Bot decrypts it and adds into an own DB, then deletes the message (for non-bloating the issue) and adds into the top post a token to delete the trigger with author's nickname/GH id (nickname can be changed). The token can be posted by author in order to delete the trigger. In the case the issue is deleted, all the triggers are also deleted.

dgelessus commented 5 years ago

Some formats are implemented inefficiently because of lack of KSC features.

Right, it would be good to have a way to handle this, but I'm not sure that it should be in combination with the "new format issues" (at least not exclusively). Formats that are already merged into ksf might also have inefficient implementations that could be improved with new ksy language features.

refer the issues in kio/ks (kaitaio-io/kaitai_struct) from formats themselves in comments on the lines

I'm not sure what you mean here. Do you mean a comment in the YAML file that links to the kaitai_struct issue in question? Or do you mean a GitHub comment? GitHub line comments only work on diffs, which is not ideal - there's no way to see all comments on a file without looking through the entire version history.

implement a mechanism to store the workaround and a new impl in the format itself in a machine-readable way

I don't think this is a good idea. First of all, the suggested syntax is completely new and would require extra support in the compiler and the language spec. This sort of thing should be done using dash-prefixed attributes instead, so that existing tools will ignore the extra information by default.

Regardless of the syntax, I don't think this would work well in practice. You can't really write code that uses a feature that doesn't exist yet - the syntax, structure and behavior of the new feature will likely change before it's actually implemented, and since you have no way to test your code it's very likely to be buggy. And updating the code automatically with a bot doesn't make sense when you don't even know that the new code works - that's the sort of thing that should be done manually, so that you can test and fix the new implementation properly before committing the change.

I'm all for planning this migration and thinking of what we need in advance, but we can't think of every kind of bot automation and metadata beforehand. I think we should focus on the things that are necessary for the issue-based list to be useful, and more on things that apply to most/all formats than things that only apply to some of them. For example, while there are formats that would require (or benefit from) new KSY language features, many formats can already be implemented with the current KSY feature set (perhaps with minor workarounds). So I don't think this linking of new formats to KSY feature requests is necessary at the beginning, but it may be a useful improvement in the future.

KOLANICH commented 5 years ago

Do you mean a comment in the YAML file that links to the kaitai_struct issue in question?

I mean a yaml comment.

This sort of thing should be done using dash-prefixed attributes instead, so that existing tools will ignore the extra information by default.

Yes, it is definitely possible:

seq:
  - id: a
    -workaround:
       issue: ...
       new:
         .... # the full new content of the property/field/etc, except doc and doc-ref

Regardless of the syntax, I don't think this would work well in practice. You can't really write code that uses a feature that doesn't exist yet - the syntax, structure and behavior of the new feature will likely change before it's actually implemented, and since you have no way to test your code it's very likely to be buggy.

I agree. Most (but not all, for example #413 or #269 !) of the features implemented differ from the original vision of them. So it should be decided on per feature basis.

And updating the code automatically with a bot doesn't make sense when you don't even know that the new code works - that's the sort of thing that should be done manually, so that you can test and fix the new implementation properly before committing the change.

I hope that somewhen we will have tests, comparing serialized etalon trees to the ones produced by the code compiled from formats.

I'm all for planning this migration and thinking of what we need in advance, but we can't think of every kind of bot automation and metadata beforehand.

That's why I suggest 2 machine-readable blocks per an issue. The second one can be extended in future easily.

dgelessus commented 5 years ago

Since there haven't been any more comments in the last few days, should we make this official and write a CONTRIBUTING.md based on the current state of this discussion?

KOLANICH commented 5 years ago

Gonna create tomorrow.

kaitai-io / kaitai_struct

An issue to coordinate file formats descriptions creation #134