Simple documentation support for making it easier to call into the Sail generated C model

martinberger commented 1 year ago

As I just suggested in the RISCV meeting: We often want to run simulations (typically performance models) for some micro-architecture, say implementing RISCV. Typically we don't want to implement the full ISA semantics in the micro-architectural simulation ourselves (for example we might be interested in the performance of a page-walk idea, but not in the cost of addition or shift-left). Instead we are lazy and use the Spike or QEMU or Sail model execute clauses to handle the state transitions that come from what are not interested in (like addition or shift-left). That way you not only reduce your workload, but are correct (up-to correctness of Spike or QEMU or Sail model) by construction.

With some restrictions on polymorphic functions and stability of types, Sail enables this already: you just compile your Sail model with the -c option switched on and you get C functions corresponding to the execute clauses in Sail (or, indeed all other functions in the Sail code). Of course the Sail-generated C model might represent processor state differently from the micro-architectural model. So before and after calling into the Sail/C execute clauses we need to translate between those different forms of processor state. But this is straightforward (from the POV of programming language semantics this is related to type-isomorphisms). So far so boring. The problem is that for the average simulation guy on the street calling into the Sail-generated C model is difficult because the exact nature of the Sail generated C model not documented in a way that is easy to get into. Can we simplify this so the average simulation guy on the street can do this without caring much about Sail?

Here is the question / suggestion: we can produce produce, ideally automatically, a simple document (maybe just a single grepable text / HTML or ASCIIdoc) that simply lists all such functions in the generated C model with C type signature. For example if the Sail model has the function

val giraffe : unit -> int
function giraffe ()  = { ... }

then the generated C model will have something like

void zgiraffe(sail_int *rop, unit);
void zgiraffe(sail_int *zcbz30, unit zgsz30) { ... }

And that should be in the generated document (maybe with a link to the actual C code, but even that is not really necessary). Immediately it would be much easier for others to interact with the generated C model. If this was a lot of work, then it would probably not be worth the effort, I think this should be easy for two reasons.

The in-progress Sail-> ASCIIdoc tool needs to do some (but not all) of this already.
The Sail-> JIB translation keeps source information.

This is not a 100% Sail problem, because the RISCV model makes some contingent choices. Once we had that, it would also be easy for others to build on this and e.g. generate stubs into other languages, like C++, Rust, Python etc.

Some variation of this might also be helpful for the ongoing pydrofoil https://github.com/pydrofoil/pydrofoil backend work.

Alasdair commented 1 year ago

Previously I tried something where you could mark a set of functions in the model as 'exported', which was done in an external JSON file so as to not require lots of annotations in the model itself, then these functions would be exported without any name mangling and with extra care to make sure types were preserved and consistent between runs. It would also generate a header file for those functions. The problem was it required incompatible changes to the C backend so nobody used it, as people were already depending on the undocumented existing output. I ended up dropping it as it wasn't worth maintaining for that reason (it took over a year before anyone noticed it gone).

At the moment we don't make any guarantees about whether a given Sail function will even appear in the C output (it can be inlined everywhere), what types it will have (and the types could in theory vary between the same model generated on different computers, if a constraint on a bitvector length is so complex we timeout on one but not the other), or even what it is called. Maintaining compatibility with the existing RISC-V emulator already places considerable constraints on what we can do, so I'm slightly worried about documenting behavior we don't provide any guarantees about.

martinberger commented 1 year ago

@Alasdair No need to guarantee that a Sail function will even occur in the generated C, or what types it has, or even what it is called. I still think it would be useful to have something really simple: for each C function that occurs in the generated C (by a run of the compiler), a map of some form (e.g. file and line number in Sail source to the Sail function the C implements). If you worried about guarantees, a suitable disclaimer couple also be generated.

This would make it easier for outsiders to think about how to (re)use the C-model in simulation.

abukharmeh commented 1 year ago

This maybe related to #141 ? On a side note, I wonder if there is a flag to not inline and not combine various clauses, simillar the behaviour of -O0 ?

rems-project / sail

Simple documentation support for making it easier to call into the Sail generated C model #215