informalsystems / atomkraft

Advanced fuzzing via Model Based Testing for Cosmos blockchains
Apache License 2.0
78 stars 10 forks

Support non-ITF traces #161

Open rnbguy opened 1 year ago

rnbguy commented 1 year ago

Right now, Atomkraft only processes ITF traces. It would be nice if Atomkraft would process some other format using custom parsers.

We can provide a simple parser based on a YAML format that will let beginners write their own traces, without using TLA+ or Apalache, and try Atomkraft directly.

Users may also write their own custom parsers and include them in their Atomkraft project.

This will also support traces generated by other methods, eg. fuzzing.
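The beginner-friendly YAML format hinted at above could be as simple as a YAML list of states. A hypothetical sketch (the function name and exact schema are assumptions, not something specified in this issue), using PyYAML:

```python
import yaml  # PyYAML; assumed available here, as YAML tooling is common in this ecosystem


def parse_yaml_trace(path: str) -> list[dict[str, object]]:
    """Parse a YAML file holding a list of states, each state being a
    mapping from variable names to values, into an in-memory trace."""
    with open(path) as f:
        states = yaml.safe_load(f)
    if not isinstance(states, list):
        raise ValueError("expected a YAML list of states")
    return states
```

A trace file would then just be a list of mappings, e.g. `- x: 1` followed by `- x: 2`, with no TLA+ involved.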

shonfeder commented 1 year ago

What is the value in introducing yet another trace format?

If YAML is particularly desired, then since YAML is already a superset of JSON, would it not suffice to support reading a YAML version of ITF?

rnbguy commented 1 year ago

It is not that I want to add a new format; a YAML parser would just be an example, and it could be the ITF parser too. The main goal is to provide a Python parser class for user-customized trace formats. If users already have a trace, they shouldn't have to convert it to intermediary ITF files.

The use case came from Djordje, who was producing non-ITF traces from fuzzing libraries. When transformed to ITF JSON for Atomkraft, they become way too big. He should instead be able to pass his trace files directly to Atomkraft.

The way I see it, the canonical trace format for Atomkraft is a Python list of dictionaries that map strings (TLA+ variable names) to values. Atomkraft just parses an ITF file into that type. So what's the harm in exposing a parser class if users want to use a customized format?
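The parser-class idea described here could be sketched as follows. This is a hypothetical illustration, not Atomkraft's actual API; the class names, the `Trace` alias, and the treatment of ITF's `#meta` keys are all assumptions:

```python
import json
from typing import Protocol

# A trace is a list of states; each state maps TLA+ variable names to values.
Trace = list[dict[str, object]]


class TraceParser(Protocol):
    """Any user-supplied parser only needs to produce the in-memory trace type."""

    def parse(self, path: str) -> Trace: ...


class ItfParser:
    """The built-in ITF parser would be just one implementation of the protocol."""

    def parse(self, path: str) -> Trace:
        with open(path) as f:
            itf = json.load(f)
        # ITF keeps states under the "states" key; drop the "#meta" bookkeeping entry.
        return [
            {k: v for k, v in state.items() if k != "#meta"}
            for state in itf["states"]
        ]
```

Under this design, a fuzzing-based parser could implement the same protocol and feed Atomkraft directly, with no intermediate ITF file.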

shonfeder commented 1 year ago

If users already have a trace, they shouldn't have to convert it to intermediary ITF files.

Why not? Isn't it reasonable to require a well-specified and standardized input format, rather than try to support every possible ad hoc format users come up with?

The use case came from Djordje, who was producing non-ITF traces from fuzzing libraries

Could you perhaps share the issue or discussion that documents his use case? It would probably help ensure we can tackle the core problems and that all stakeholders can be on the same page w/r/t context.

When transformed to ITF JSON for Atomkraft, they become way too big

Could you clarify what they are too big for? Too big for manual inspection, too big to be transmitted via HTTP requests, too big to be parsed, or...? What format has he found that preserves all the needed data but is much more space efficient? That would be something we should probably study!

what's the harm in exposing a parser class if users want to use a customized format?

The harm I see has nothing to do with classes, but with introducing yet another customized trace format. This creates yet more fragmentation in our tooling and yet more duplicated work.

the canonical trace format for Atomkraft is a Python list of dictionaries that map strings (TLA+ variable names) to values

Why would we not agree on a single canonical format shared between Apalache and AtomKraft? Introducing a Python-specific canonical format that pertains to only one tool seems to invite needless conversion plumbing down the road, and goes counter to our aims of improving integration, iiuc.

Could I ask why you cannot adopt ITF as your canonical format? As per Apalache ADR015, the intended purpose of ITF was to provide:

a very simple format that does not require any knowledge of TLA+ and can be easily integrated into other tools.

If ITF is unsuitable for this purpose, could we not coordinate on improving it, rather than introducing yet another ad hoc format? That is, rather than AtomKraft fixing its own idiosyncratic format as canonical, could we not agree on a shared canonical format? To that end, can we not start with the trace format we have already invested considerable time in designing, specifying, and supporting, and evolve it to make it best suited to its purpose?
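For context, an ITF trace in the spirit of Apalache ADR015 is just a small JSON object. A rough example follows (shown as an inline Python dict; the variable names and values are made up, and ADR015 is the authoritative reference for the format's details):

```python
# Sketch of an ITF-style trace: "vars" lists the TLA+ variables,
# "states" holds one JSON object per state, and "#meta" carries
# bookkeeping data such as the state index.
itf_trace = {
    "#meta": {"source": "example"},
    "vars": ["balance", "step"],
    "states": [
        {"#meta": {"index": 0}, "balance": 100, "step": "Init"},
        {"#meta": {"index": 1}, "balance": 90, "step": "Transfer"},
    ],
}
```

Because it is plain JSON, any language with a JSON parser can consume it, which is the "easily integrated into other tools" property the ADR aims for.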

shonfeder commented 1 year ago

It might make sense to discuss briefly in a sync, as I may be missing too much context :D

rnbguy commented 1 year ago

Hmm. After reading your replies, I agree with you. I should have communicated with the Apalache team about this. Let's talk more in a sync about whether we need to improve ITF.

shonfeder commented 1 year ago

Sounds good! One thing to foreground: if we're turning up defects or shortcomings in ITF, I think we should 100% evolve and improve it (up to and including replacing it entirely if necessary :D ).

shonfeder commented 1 year ago

Iiuc, this is related to #150

konnov commented 1 year ago

I think the core of the discussion here is about the in-memory representation vs. the storage representation. For the storage representation, ITF is complete, though not compact. If compression is needed, you could simply zip the JSON file.
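The point about zipping can be illustrated with a quick sketch: an ITF-style trace is highly repetitive JSON, so standard compression shrinks it substantially. This is a hypothetical example with made-up data, not a measurement from Djordje's traces:

```python
import gzip
import json

# Build a long, repetitive ITF-style trace of 1000 states.
trace = {
    "vars": ["x"],
    "states": [{"#meta": {"index": i}, "x": i} for i in range(1000)],
}

raw = json.dumps(trace).encode("utf-8")
compressed = gzip.compress(raw)

# The repeated keys ("#meta", "index", "x") compress very well,
# so the storage-size concern can be addressed without a new format.
print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
```

Decompressing with `gzip.decompress` and `json.loads` recovers the exact original trace, so nothing is lost relative to plain ITF JSON.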

What you seem to need in #150 is an in-memory representation that is:

It makes sense to me to have internal- or external-facing data structures that are tuned for efficiency. However, that has nothing to do with another (file) format.