llvm / circt

Circuit IR Compilers and Tools
https://circt.org
Other
1.64k stars 285 forks source link

[RFC][Sim] Add format string type and format specifier ops #7208

Closed fzi-hielscher closed 2 months ago

fzi-hielscher commented 3 months ago

The motivation of this PR is to get a frontend/backend agnostic representation of "printf-like" format strings into the core dialects. These should allow us to specify the emission of messages during simulation, which include formatted representations of runtime values.

It adds the !sim.fstring type to the IR, which represents a sequence of format tokens. These tokens can either be string literals or the combination of a (SSA) value and a format specifier. It also adds the usual suspects of format specifiers as individual operations: Binary, signed/unsigned decimal and hexadecimal. To make the transition of the current direct FIRRTL-to-SV lowering as simple (and compatible) as possible, the specifics of those heavily lean on the way formatting works in SystemVerilog. Currently formatting is limited to integer values. For the details I'd ask you to refer to the documentation in the TableGen source.

I deliberately decided against a representation where substitutions are embedded within a string, e.g., "Foo = %x". While this imposes some efforts on the frontend lowerings to tokenize their format strings into separate operations , I think this will help us to avoid confusion between different fronend and backend formats. And more importantly, it makes processing substitutions in an "MLIRish way" easier.

These additions are intended as a first step towards a FIRRTL->Sim->Arc/LLVM-IR lowering pipeline forprintf statements.
I've got a functional prototype of this integration over in my repository, which can hopefully give you an impression on how this could look like in the end.

I have to point out that this is effectively a competing implementation to the existing print operation in the Verif dialect, added in #5616. I hope this does not disqualify me from the title of friendly neighbourhood compiler engineer. I'm currently not aware of any (publicly available) lowering to FormatVerilogStringOp, so I cannot judge how hard it would be to transition this to my implementation. But I am somewhat confident that we'll find a way to unify these.

I'd be happy to hear your opinions and suggestions on this approach.

fabianschuiki commented 2 months ago

I love the idea of actually breaking the formatting string up into separate ops that then get concatenated. Especially since it makes the format string syntax itself a frontend language concern, which gets lowered to a bunch of ops, which can then get emitted appropriately by the backend. The FIRRTL printing ops basically accept "whatever SV format string you want". With this, we could actually specify formatting strings in FIRRTL that are parsed and lowered.

This feels a lot like what Rust did with its format!(...) macro, where the compiler actually splits the format string up into something like LLVM's Twine. Love it :smiley:

fzi-hielscher commented 2 months ago

Thanks a lot, @fabianschuiki.

I'm still hesitant to push (partially) redundant infrastructure without being able to sketch out a clear path to unification. Maybe @mortbopet or @teqdruid could briefly chime in on their use of the current verif print operations? Basically, how important is it to be able to pass format strings verbatim from the middle-end to the SV back-end? Can we reasonably limit ourselves to a subset of the Verilog format specifiers?

darthscsi commented 2 months ago

@prithayan is sim.printf used internally?

teqdruid commented 2 months ago

I don't think we use the verif dialect.

prithayan commented 2 months ago

@prithayan is sim.printf used internally?

@darthscsi , We don't use the verif.print op internally. The format string type looks good, would be useful for the sim.plusargs also.

fzi-hielscher commented 2 months ago

Thanks for the feedback. Since there are apparently no known users of the verif.print operation, I'd opt for the time-tested method of ripping them out and see if someone complains.

To be honest, using format strings for parsing (like sim.plusargs) was not on my radar. I'm not so optimistic that this design can be used for scanf-like operations. After all we cannot just reverse the dataflow of the formatting token operations. Then again, sim.plusargs looks to me like a very specific behavior that does not make much sense outside of a Verilog context. So, does it really make sense to try and abstract over this?

To make some progress here I would merge this PR in the coming hours, unless anyone has any objections. Fell free to leave a post merge review or voice your concerns in the inevitable follow-up PRs.

fabianschuiki commented 2 months ago

Really cool! 🥳

Interesting thought about scanf-like parsing of strings. This feels like you'd want a complementary set of parsing operations, and then either the frontend emits those directly, or maybe there's a pass that transposes the string concats into sequences of parses? Or treats them like a regex? Not entirely sure. The parsing/scanf is also harder because we don't have the liberty to assemble the string separately and pass it into a Verilog construct, like we have for printing. Although, the plusargs could theoretically accept any string and the parsing could then be done separately…