Syntax/Semantics for the system manifest

podhrmic commented 9 months ago

There was a brief discussion at the summit that maybe using XML for the system description is not the best (hjson anyone?:-) ). Either way, there should be proper grammar (EBNF?) and associated rules for the manifest language.

Why?

then switching between different manifest languages should be trivial
generating the manifest from other tools should be simpler (think SysMLv2/AADL -> microkit)
with proper rules we can ensure the manifest is well formed (I know you are already doing a number of checks, but we should be explicit about it)

Ivan-Velickovic commented 9 months ago

In my opinion I think all the system description formats are going to be equally as bad. I have not noticed significant friction in using the current format except for large systems. However, I think any format would struggle when dealing with large systems, which is why a format that can be auto-generated has been chosen.

Regarding auto-generating the system description XML, yes we will need some kind of description of the rules and expected format for each kind of element. It is something I have started to think about. I have not done something like this before though, so will have to look into it thoroughly first. It's a relatively low-priority for now, but is definitely something I want to do in the future.

Ivan-Velickovic commented 9 months ago

Something I forgot to mention at the Summit is that because of the system description to CapDL translation validation, in the future it will not matter whether we trust the XML parser or not, as we have a proof that the CapDL outputted is valid.

gernotheiser commented 9 months ago

Regarding auto-generating the system description XML, yes we will need some kind of description of the rules and expected format for each kind of element. It is something I have started to think about.

Given that we have an Isabelle import, doesn't this already exist?

Ivan-Velickovic commented 9 months ago

Regarding auto-generating the system description XML, yes we will need some kind of description of the rules and expected format for each kind of element. It is something I have started to think about.

Given that we have an Isabelle import, doesn't this already exist?

It may, but it needs to be in an accessible format which I am not sure if it is. As this is something that needs to be consumed by people that make tools to auto-generate/abstract over the XML. I am sure some kind of standard already exists for doing this for XML, whether this is already done for the translation validation proof work is something I need to double check.

lsf37 commented 9 months ago

Something I forgot to mention at the Summit is that because of the system description to CapDL translation validation, in the future it will not matter whether we trust the XML parser or not, as we have a proof that the CapDL outputted is valid.

This is not correct -- the XML is the top-level specification that the capDL output and system image is proved against. If the XML parser consistently produces an unintended interpretation of the input and it is used as frontend for all steps then all tools will generate that unintended system and prove that it is indeed that unintended system. That is, the XML parser does remain trusted.

What the proof will catch are parser crashes or malformed specs, but it can't make sure that the system is the one that you want.

The only way to do that using proofs is to prove properties about the XML spec using the same proof artefacts that are used in the rest (e.g. parse XML to Isabelle, use the Isabelle representation for the proof). If you then prove for instance that component A can never talk to component B without going through C, then that property will be true about the final system, regardless of XML parser bugs. If you parse XML into some other tool to do that proof, then it becomes trusted again, because the formal proof chain is broken (unless there is some other way to hook up that tool to the Isabelle proof artefacts).

This is not specific to XML of course, any top-level system description language has that problem, that's why it's important to specify syntax + semantics of it clearly.

Ivan-Velickovic commented 9 months ago

This is not correct -- the XML is the top-level specification that the capDL output and system image is proved against. If the XML parser consistently produces an unintended interpretation of the input and it is used as frontend for all steps then all tools will generate that unintended system and prove that it is indeed that unintended system. That is, the XML parser does remain trusted.

Apologies, I misunderstood exactly what the translation validation was doing, thank you for correcting me.

What the proof will catch are parser crashes or malformed specs, but it can't make sure that the system is the one that you want.

The only way to do that using proofs is to prove properties about the XML spec using the same proof artefacts that are used in the rest (e.g. parse XML to Isabelle, use the Isabelle representation for the proof). If you then prove for instance that component A can never talk to component B without going through C, then that property will be true about the final system, regardless of XML parser bugs. If you parse XML into some other tool to do that proof, then it becomes trusted again, because the formal proof chain is broken (unless there is some other way to hook up that tool to the Isabelle proof artefacts).

This is not specific to XML of course, any top-level system description language has that problem, that's why it's important to specify syntax + semantics of it clearly.

Understood. It seems we'll have to do something about this then, manually inspecting the capDL to ensure it describes the system you want is difficult and unrealistic for most.

podhrmic commented 9 months ago

Given that we have an Isabelle import, doesn't this already exist?

Does the Isabelle import let you export an XML schema? Based on the discussion here, it seems like this can be tackled in two steps:

XML (or JSON or some other) schema to validate each manifest before passing it to the microkit build tool
A list of well-formedness rules/judgements that describe the properties of the manifest (e.g. memory regions shall not overlap, unique IRQ assignment etc). I am not aware of a method that would allow you to generate implementation from these judgements, so you would still need to manually test those in the microkit build tool

I don't think there are too many rules, and you have probably most/all of them already implemented, so this should not be a big lift, but it would be good at least for documentation.

FYI we had pretty good experience parsing XML with https://crates.io/crates/serde-xml-rs - the downside is you cannot consume a XML schema directly by Rust, so there is some duplicate work. If we had a JSON schema, then generating Rust code from it is trivial. Not arguing for a particular solution, just adding context. I am not very familiar with Python so your mileage might vary there.

seL4 / microkit

Syntax/Semantics for the system manifest #70