Open RReverser opened 3 years ago
IIUC, currently C2Rust is using a single Relooper algorithm for all control flow (?), but given how common this pattern is, perhaps it's worth special-casing it and solving separately?
We prefer to keep the translator as dumb as possible, and do cleanups and handle corner cases in c2rust-refactor, especially when those cleanups are non-trivial. It sounds like having a second algorithm in parallel with Relooper would be a massive change and maintenance burden, which is why I'd prefer if this were just a c2rust-refactor pass.
Alternatively, if this could be implemented as a small tweak to Relooper instead, that would be fine too. Relooper was originally designed for JavaScript and its output reflects that. It might be possible to extend it to emit more Rust-specific advanced control flow.
> especially when those cleanups are non-trivial
Hm, it feels like the opposite - because those cleanups are non-trivial, they're quite hard to model via a refactor command (unless I'm missing a command that could detect the pattern above)?
By the time this output is emitted, important information about the original control flow seems to be already lost, and it is rather hard to recover once the branches have been merged into a single intertwined loop.
> Alternatively, if this could be implemented as a small tweak to Relooper instead, that would be fine too. Relooper was originally designed for JavaScript and its output reflects that. It might be possible to extend it to emit more Rust-specific advanced control flow.
Yeah, I'm aware of Relooper, but it was mostly designed to emit asm.js, which used to be quite restricted. Even for regular JS, output like the above is in fact possible, as JS also allows labeled breaks.
UPD: Alon Zakai pointed me to a PR in Relooper back in 2016 that improved cases like this: https://github.com/WebAssembly/binaryen/pull/648
Quotes:
In particular, the old Relooper used to add "label" uses for far-off targets in some cases, unnecessarily, which added overhead. We optimized that away 6 years ago in Emscripten/Binaryen, and Cheerp effectively did the same around 2 years ago with their Stackifier.
(c) https://twitter.com/kripken/status/1371925262397362176
Interesting, yes, that "current_block" stuff does look like the sort of thing the old Relooper would emit. So maybe using the modern approach would help.
It's a small tweak to the Relooper, here is where it landed: https://github.com/WebAssembly/binaryen/pull/648
(c) https://twitter.com/kripken/status/1371927740547354625
Perhaps it's possible to port the same improvements to C2Rust's Relooper implementation?
Ping - any thoughts on the above? In particular, are there any plans to update the Relooper algorithm to match upstream improvements?
Oh yes sorry, I was planning to respond to this but it fell through the cracks.
Unfortunately we (Immunant) don't have the time or resources to work on this right now, but we'll gladly take PRs and review them if anyone else submits some. I'll keep the issue open in case our situation changes.
Ok, fair enough. I don't have capacity to work on this currently either, but I'll rename the issue and leave it open as well.
I'll just note here that imho Relooper is the wrong algorithm in this case, since its goal is far removed from human-readable code generation, and the control-flow operations of WebAssembly are more restricted than in Rust (even if formally equivalent). Also, a good translation of gotos into proper labeled blocks & breaks must be performed during the transpilation, since once you translate gotos into state variables and matches, the information of control flow is essentially lost in the code and unreasonably hard to reconstruct. There are also QoL features which are entirely lost in translation: label names (C code would usually use well-named labels) and the ordering of CFG blocks (ideally the translated code would mirror the block ordering of C code, unless it was non-reducible).
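To make the point about state variables concrete, here is a minimal hand-written sketch (not actual c2rust output) of how a C function with `goto error` inside a loop might look once the jump targets are encoded as a numeric state. The function name, the `BLOCK_*` constants, and the `log` parameter standing in for resource cleanup are all hypothetical and chosen only for illustration.

```rust
// Hand-written illustration; NOT actual c2rust output.
// Models C code of roughly this shape:
//     for (i = 0; i < n; i++) { if (data[i] < 0) goto error; sum += data[i]; }
//     result = sum; goto exit;
//   error: result = -1;
//   exit:  cleanup(); return result;
fn sum_state_machine(data: &[i32], log: &mut Vec<&'static str>) -> i32 {
    // Hypothetical block IDs; real output tends to use opaque numbers.
    const BLOCK_DONE: u64 = 1;
    const BLOCK_ERROR: u64 = 2;

    let mut sum = 0;
    let mut i = 0;
    let current_block: u64;
    loop {
        if i >= data.len() {
            current_block = BLOCK_DONE; // loop finished normally
            break;
        }
        if data[i] < 0 {
            current_block = BLOCK_ERROR; // was: goto error
            break;
        }
        sum += data[i];
        i += 1;
    }
    // The jump target is only recoverable by following the state variable.
    let result = match current_block {
        BLOCK_ERROR => -1,
        _ => sum,
    };
    log.push("cleanup"); // stands in for the code at the `exit` label
    result
}
```

The label names `error` and `exit` survive only in comments here, and a reader has to trace `current_block` through the `match` to see where each branch ends up.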
In my opinion, the proper algorithm for C2Rust would be the Stackifier. It produces much more structured and readable CFGs.
If PRs are welcome, I could make one in a few weeks
@afetisov Note that the post you link is outdated as it compares the Stackifier algorithm to a much older Relooper version. The modern versions of both algorithms are very similar (details in that thread, but the 2017 Relooper has already addressed the limitations mentioned in that post). Both can produce structured and readable CFGs, and both tend to match the original source structure at least in reducible code (the main difference between them is in irreducible code, for which I'm not aware of a good comparison actually).
(But regardless I think a PR to update to either would probably be useful! Though I'm not a dev here.)
There's also more recent work in this vein in "Beyond Relooper": https://dl.acm.org/doi/abs/10.1145/3547621
C libraries often use `goto exit`/`goto error` as a way to jump within a function to a place that can clean up resources before exiting when some condition is unmet.
Let me take this example (one of the simpler ones in the wild): https://github.com/kbarbary/sep/blob/d22bef88ded3c5b25f06584aa8a2bc931cad1826/ctest/test_image.c#L142-L212
After transpiling + reorganize definitions, the Rust code generated by c2rust looks like this:
As you can see, it tried to interleave loops and conditions with the goto, making the dataflow less recognisable as well as inserting `current_block` IDs.
Instead, a much more natural output could use `#![feature(label_break_value)]` and output a function like this:
That particular feature is unstable, but C2Rust already outputs nightly-specific code. However, a stable variant is also possible with a minor change:
Either way, the output becomes a lot more readable, compiles to something a lot closer to the original, and is easier to refactor by hand further into idiomatic Rust code as well.
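As a rough sketch of the two shapes described above (hand-written, not c2rust output and not the listings from the original post), the following models a C function that sums an array, jumps to a cleanup label on a negative element, and always runs cleanup before returning. The function names and the `log` parameter standing in for resource cleanup are hypothetical. The assumption that the "minor change" means a loop that runs exactly once is mine. As far as I know, labeled blocks with `break` values were stabilized in Rust 1.65, so on current toolchains the first form should not need the feature gate.

```rust
/// Labeled-block form: `break 'exit value` plays the role of `goto exit`.
fn sum_labeled_block(data: &[i32], log: &mut Vec<&'static str>) -> i32 {
    let result = 'exit: {
        let mut sum = 0;
        for &x in data {
            if x < 0 {
                break 'exit -1; // was: goto error
            }
            sum += x;
        }
        sum // normal fallthrough to the exit label
    };
    log.push("cleanup"); // code at the exit label runs on every path
    result
}

/// Variant without labeled blocks (assumed reading of the "minor change"):
/// a `loop` that executes once, left via `break` on every path.
fn sum_single_pass_loop(data: &[i32], log: &mut Vec<&'static str>) -> i32 {
    let result = 'exit: loop {
        let mut sum = 0;
        for &x in data {
            if x < 0 {
                break 'exit -1; // was: goto error
            }
            sum += x;
        }
        break sum; // leave the single-pass loop on the success path
    };
    log.push("cleanup");
    result
}
```

In both forms the label keeps a readable name and the cleanup code appears once, after the block, which mirrors the layout of the C source.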