This may be superior to clang-query

real-or-random commented 2 years ago

Writing a plugin is a very nice and interesting idea. There's an old open secp256k1 PR that does a very basic version of this here using clang-query: https://github.com/bitcoin-core/secp256k1/pull/833 I imagine this here is much more powerful and cleaner. On the other hand, I don't know if the LLVM API breaks often.

(Not an issue per se, sorry, I just wanted to express my interest. ;) Feel free to close.)

theuni commented 2 years ago

Cool, thanks, I'll read through the issue!

I'm glad you said something, I've been assuming c++ all along, but I guess there's no need to rule out C.

Also, check out https://github.com/theuni/bitcoin-tidy which is actually far more useful as it's able to do transforms, not just compile-time warnings.

I'm mostly imagining the clang plugins here will be useful for CI as they're used in the compilation process.

real-or-random commented 2 years ago

I'm glad you said something, I've been assuming c++ all along, but I guess there's no need to rule out C.

Yeah in particular C seems to be somewhat delicate (at least in clang-query, not sure about the code here). In the PR I mention this LLVM bug here: https://github.com/llvm/llvm-project/issues/47223

theuni commented 2 years ago

I'm glad you said something, I've been assuming c++ all along, but I guess there's no need to rule out C.

Yeah in particular C seems to be somewhat delicate (at least in clang-query, not sure about the code here). In the PR I mention this LLVM bug here: llvm/llvm-project#47223

Heh, apparently I wasn't alone in treating C as an after-thought :p

theuni commented 2 years ago

@real-or-random Btw, looking at clang plugins from libsecp's pov, you might be especially interested in the notion of embedded DSL's. See the slides here (starting around 9): https://llvm.org/devmtg/2020-09/slides/Finkel-Changing_Everything_With_Clang_Plugins.pdf

If anything strikes you as interesting, please poke me about it. For now I'm just hacking around on plugins to explore and better understand what they can do and how to fit them into Core's workflow, so I'd be happy to shift to something that even has a slight chance of being real-world useful.

real-or-random commented 2 years ago

If anything strikes you as interesting, please poke me about it. For now I'm just hacking around on plugins to explore and better understand what they can do and how to fit them into Core's workflow, so I'd be happy to shift to something that even has a slight chance of being real-world useful.

One thing that comes to my mind is https://github.com/bitcoin-core/secp256k1/issues/1001. Not sure if you're familiar with our finite field code. Here, a field element has a "magnitude", which essentially tracks how far the integer it is from being fully reduced (mod the field size), see https://github.com/bitcoin-core/secp256k1/blob/master/src/field.h#L10. The idea is that arithmetic operations increase magnitude (e.g., magnitude(fe_add(a,b)) = magnitude(a) + magnitude(b)), and thus sometimes we need to call normalize to reduce it again, in order to ensure that it doesn't hit the maximum and the integer overflows. Also, some operations have specific requirements, e.g, eq needs magnitude-1 inputs.. The idea of this is that magnitude should be statically implied, i.e., just implied by the AST without looking at the actual values of field elements (we're just fully there yet.) This is essentially something that can be tracked via a type system but C's types are too weak to express it, so maybe a plugin could do it. (No idea how much work this will be.)

This is just a random idea and nothing we really need. We currently verify the magnitude at run time in the tests if VERIFY is defined. This should be equivalent to static checking under the assumption that the tests hit all code paths.

theuni commented 2 years ago

Ok, I'm going to have to spend some time looking over and understanding all that, but my hot take based on a quick pass:

It sounds like some runtime code could be made static if some functions have already been provably called. If I've got that right, it's a lot like this example. There, each function is analyzed to see if it needs call_super(), and warns if required but missing. Here we could so something similar where normalize is a magic function that resets the warning state. Afaik that is trivial to accomplish.

As for the magnitude-1 thing, you lost me there :)

real-or-random commented 2 years ago

Ok, yeah, feel free to have a look but this was just the first idea that came to my mind.

It may be instructive to look at the function docs in field.h but even more so to look at our current "VERIFY" runtime code:

Here's an example where we track magnitude explicitly in the fe_add function (implementing what can be found in the docs: magnitude of result is sum of the magnitudes of the two inputs): https://github.com/bitcoin-core/secp256k1/blob/master/src/field_10x26_impl.h#L478 One line below we call fe_verify which then checks that the magnitude is below 32, which is our limit for the 10x26 code. All this tracking and verifying happens only in "test mode" (when VERIFY is defined) but it would certainly be cleaner to do it in the compiler (which would also make sure that our test code doesn't interfere with optimizations and hides bugs etc.)

If magnitude of a value would become too large by an arithmetic operation, we need reduce it before doing the arithmetic by calling fe_normalize or fe_normalize_weak, which resets magnitude to 1: https://github.com/bitcoin-core/secp256k1/blob/master/src/field_10x26_impl.h#L121

The code starting here is an example of this in action: https://github.com/bitcoin-core/secp256k1/blob/master/src/group_impl.h#L549 Here we try to the track magnitude in parenthesis in the code comments. Note that here we call fe_normalize_weak on u1 and s1 in the beginning because the corresponding values are given to the function as inputs, and don't assume anything about their magnitude. So in order to be able to do arithmetic, we'll need to call normalize first.

edit: Now that I think about this more carefully, this may be rather difficult or really require an entire type system (and lots of type annotations from the user). For example, how would one handle loops? All our loops are bounded but the compiler may not know a reasonable bound etc.

real-or-random commented 2 years ago

Here's another random thought: Have looked into similar projects? I only recently learned about CodeQL and semgrep which I think have similar goals than clang-query but seem more mature and sophisticated. But so far, I couldn't find time to play around with them.

theuni commented 2 years ago

I haven't!

I chose the clang machinery because we rely on its internals for building bitcoind, so we already "trust" it to be correct.

But you're right, that shouldn't preclude experimenting with other tools. I'll have a look at those, thanks!

theuni / bitcoin-core-clang-plugin

This may be superior to clang-query #1