rust-bitcoin / rust-secp256k1

Rust language bindings for Bitcoin secp256k1 library.
Creative Commons Zero v1.0 Universal
338 stars 255 forks source link

Wasm, Contexts, and Entropy #346

Open JeremyRubin opened 2 years ago

JeremyRubin commented 2 years ago

does signing context generate entropy on creation? testing now but this may be a big issue for using verification from a WASM module with no entropy because the get random number methods may panic.

For manually generated contexts, this happens only if you randomize them explicitly, see https://github.com/rust-bitcoin/rust-secp256k1/blob/48683d87c85ac28d5f2490c828979e7d8b9b874c/src/lib.rs#L354-L380.

For the global context, it depends on the enabled feature: The global-context feature randomizes the context automatically and depends on rand, and the global-context-less-secure feature gives you a context that is not randomized (and can't be randomized).

Originally posted by @real-or-random in https://github.com/rust-bitcoin/rust-secp256k1/pull/342#issuecomment-991615743

We should probably think through what it means to use any of these contexts in a WASM context! Especially in wasm32-unknown-unknown you have no syscalls or way of getting entropy, so this could be problematic.

Fully isolated WASM is really great to provide first-class support for for a myriad of reasons, but we'd need to think through carefully exactly what we do. Maybe we can detect if we're in wasm and disable some targets if we don't have symbols for getting entropy linked? Not sure :)

real-or-random commented 2 years ago

Hm, I don't think we should disable anything. This all depends on the threat model of the application and it's hard for a library to know this. (It's already hard for an application to make a guess about the threat model of the user...)

So randomization is defense in depth against timing attacks and it may provide some resistance against other side-channel attacks (electromagnetic -- but that's anyway hard). How much do you care about this in WASM? We can't tell as a library. Anyway, can WASM code even be made resistant against timing leaks? Apparently yes. So okay, without entropy, we lose defense in depth but there's no reason to believe that we lose security.

Especially in wasm32-unknown-unknown you have no syscalls or way of getting entropy, so this could be problematic.

You could try to hash your secrets at least. How do you get your secrets into WASM? (I mean you can't generate them in WASM without external entropy?)

edit: Please tell me if I seem to assume anything which is wrong. To be honest, I really have no idea how WASM works.

elichai commented 2 years ago

If you know your default implementation of getrandom either sucks or panics, then you should make your own global context (or use a local one) and randomize it using whatever randomness you do have (and you have to have some kind of randomness as you have a private key).

I'd rather we won't change anything in the context until https://github.com/bitcoin-core/secp256k1/pull/988 is done and then we can rehaul our API

apoelstra commented 2 years ago

It is really unfortunate that rand methods would panic. This seems like an abuse of panic and I don't see any reasonable way we can detect it and recover.

Are there replacements for rand out there which are widely supported?

TheBlueMatt commented 2 years ago

This is the exact reason for global-context-less-secure - supporting environments where you can't get global randomness, especially WASM. In these environments there's probably nothing we can do better around global context, we have to hope people use their own local contexts and randomize them manually, if they want. A "better rand" here would still be one that just doesn't get any randomness.

apoelstra commented 2 years ago

Gotcha. So, in light of #388 (not sure if you've been following this) where we are moving toward removing global-context-less-secure and just using rand whenever it was available, I guess we can't do that, because rand sometimes unpreventably terminates the program from inside library functions.

So I guess we should keep global-context-less-secure, indicate in the docs that this is needed if you expect rand to misbehave, and we should gate the global context randomization on #[cfg(all(feature = "rand", not(feature = "global-context-less-secure"))].

TheBlueMatt commented 2 years ago

where we are moving toward removing global-context-less-secure and just using rand whenever it was available, I guess we can't do that, because rand sometimes unpreventably terminates the program from inside library functions.

I don't see why this is a problem? We just need to make sure we do not enable the rand feature/dependency unless the user does.

JeremyRubin commented 2 years ago

you can also test rand via https://doc.rust-lang.org/std/panic/fn.catch_unwind.html at runtime btw.

TheBlueMatt commented 2 years ago

That is std-only, sadly (not to mention very poor practice, IMO)

apoelstra commented 2 years ago

@TheBlueMatt we can't prevent parallel dependencies from enabling the rand feature ... I don't think we can reasonably say "this may break unless feature X is off"

tcharding commented 2 years ago

I don't think we need #385. Since we feature gate on "rand-std", and not on "rand", we do fully control the feature gating. If we were to use "rand" we would be at the mercy of other dependencies enabling "rand" as you say @apoelstra.

no-std users will not be enabling "rand-std" so they will not get the global context randomization and will not hit the rand panic issue, right?

apoelstra commented 2 years ago

Yeah, I think you're right. Ok.

Kixunil commented 2 years ago

Skipping over the details, would it be reasonable to provide a method for consumers to dynamically set a global randomization Fn? People could set it when initializing and not lose any security.

apoelstra commented 2 years ago

That sounds reasonable to me. @elichai what do you think?

elichai commented 2 years ago

That sounds reasonable to me. @elichai what do you think?

It can work, but I doubt most rust users will understand how to use this correctly. I do think that the best solution is to keep using getrandom and that if a users uses wasm32-unknown-unkown they would have to depend on getrandom by themselves and enable wasm-bindgen / js-sys depending on what they use.

And if they use a weirder environment then it should probably be supported in getrandom.

But, if it is common to use various exotic environments and they can't all be supported in getrandom then we could do something like that. but this sounds a bit weird to me.

FWIW there was a discussion on providing a getrandom-like interface in libstd: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/OS.20secure.20randomness https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/possible.20getrandom.20interface

TheBlueMatt commented 2 years ago

I'm not really sure what the use-case of such a method is - if you don't have access to getrandom, you should use global-context without the rand crate and randomize yourself or live without randomization. If you do have access to getrandom, the rand crate should work fine for you.

Kixunil commented 2 years ago

I doubt most rust users will understand how to use this correctly

I doubt most users will not. :) I imagine this doc like this

The function is provided for platforms that don't have inherent access to randomness but require some additional bridging. Wasm is a popular example of such platform. It doesn't provide API to get randomness but in browsers you can still get randomness using JS bindings.

Warning: do not implement custom randomness function, only bridge with existing randomness APIs on your platform. Make sure that if the API fails the supplied function fails as well (no fallbacks)!

People knowledgeable enough to use exotic platform should be knowledgeable enough to write bindings.

@TheBlueMatt

use global-context without the rand crate and randomize yourself

This may be too annoying to work with. While I in general don't like global mutable state, this looks like a very reasonable exception.

Kixunil commented 2 years ago

Alternative idea: depend on some CSRNG (e.g. from rand crate) and only accept seed. Less bothering with creating trait objects or so. May tempt people to supply all zeroes.

TheBlueMatt commented 2 years ago

This may be too annoying to work with. While I in general don't like global mutable state, this looks like a very reasonable exception.

I don't really understand the push-back here: afaiu, you really shouldn't (ideally) be randomizing only once and then never re-randomizing again. If you're signing with the same key over and over and over again, afaiu, re-randomizing is something you should strive for, and our API should encourage that. I think we're way over-engineering this for the never-rereandomize case.

Kixunil commented 2 years ago

My understanding of your previous comment is that you suggest consumers write something like this:

let sig1 = key.sign(ctx, msg1);
ctx.rerandomize(rng);
let sig2 = key.sign(ctx, msg2);
ctx.rerandomize(rng);
let sig3 = key.sign(ctx, msg3);
ctx.rerandomize(rng);

While my suggestion is to globally set a custom RNG (presumably using JS bindings) so that sign method automatically retrieves entropy from it and re-randomizes, so the code would become:

set_rng(my_rng);
let sig1 = key.sign(ctx, msg1);
let sig2 = key.sign(ctx, msg2);
let sig3 = key.sign(ctx, msg3);

Which is shorter, cleaner and thus I believe better achieves your goal of encouraging re-randomization. (I agree with the goal BTW.)

TheBlueMatt commented 2 years ago

I don't believe we can re-randomize on sign unless we make the context mutable, at which point the whole discussion on a global context isn't really relevant. If we're talking about a mutable-context-sign-method there's all kinds of things we can do - we can also have a parallel sign_with_random_bytes that takes randomness to re-seed with, but the mutable context generally makes the API much harder to work with.

Kixunil commented 2 years ago

We could make it thread-local, internally using RefCell, or just make it non-Sync and use interior mutability.

TheBlueMatt commented 2 years ago

AFAIU, thread _local is a problem on many of the platform rand is a problem on, and making them non-Sync (Send shouldn't be required?) makes the API harder to use (though maybe non-Sync isn't so bad).

real-or-random commented 2 years ago

I don't really understand the push-back here: afaiu, you really shouldn't (ideally) be randomizing only once and then never re-randomizing again.

As far as I can speak for upstream, this is something we're currently unsure about. I think the initial idea was that users often call _context_randomize but if you start to think about it, it's pretty unclear what the benefits are. Also, even Core randomizes only once. So I think this deserves some discussion upstream. I had a conversation with @jonasnick about this today, and we should be able to open an issue upstream with some initial thoughts. To give you an idea why this is complicated:

If you're signing with the same key over and over and over again, afaiu, re-randomizing is something you should strive for,

I believe the better protection here is to use synthetic nonces k, i.e., basically k = hash(secret_key || message || randomness) (as done in BIP340). This is strictly better than deterministic nonces even you ignore sidechannels, and it gives you some pretty good protection against side-channel leakage. The problem with signing with the same key (and message) over and over is mostly the EC multiplication k*G with a fixed k. This is the only thing protected by the randomization in the context. But if you use synthetic nonces, the k is anyway different every time, and this makes it much much harder for the attacker.

BUT you can't use this trick when it comes to pubkey generation (computing the pubkey from the seckey). So maybe it would makes sense to randomize before every pubkey generation instead of after every signing.

If we're talking about a mutable-context-sign-method there's all kinds of things we can do

This is another thing that I think should be solved upstream, though I admit there's no clear timeline here. We want to change the context API, and we have a wishlist for doing so (https://github.com/bitcoin-core/secp256k1/issues/780), which includes having mutable contexts that can be rerandomized automatically when useful. I don't think that it'll be a good idea to spend time on this in the bindings now. These are improvements where everyone could benefit, not only the Rust users.

TL;DR: I suggest to reconsider this after upstream made up their minds.

Kixunil commented 2 years ago

@TheBlueMatt hmm, looks like those platforms are screwed already unless they turn off re-randomization. I take back my suggestion to make context non-Sync (yep, that's what I meant) - consumers can already do it themselves by using RefCell and forcing everyone to use it just hurts modularity. If RefCell becomes popular, we could just write additional wrapper. (And maybe we can use UnsafeCell instead?)

TheBlueMatt commented 2 years ago

For context, on the LDK end, we re-randomize some, though maybe less than we should, but we do it via our own internal randomness interfaces, which are not global, and provided via a trait from the user. We do not assume any global randomness API, nor do we really want to add that assumption. FWIW, randomizing isn't all that hard.

apoelstra commented 1 year ago

I recently learned that getrandom has a js feature that can be enabled if we know that we're compiling for WASM in a browser context where it has access to strong randomness. I believe that if we directly depend on getrandom rather than going through rand, and expose the js feature through this library, that might make everyone happy.

There remains the question of what to do when the target arch is wasm, getrandom is enabled but js is not enabled. I am tempted to issue a compilation error in this case.

JeremyRubin commented 1 year ago

what if we have rand stay an optional dep, and have js also an optional dep on getrandom and allow users to pick one or the other?

+1 on compiler error as mentioned!

apoelstra commented 1 year ago

I think we need rand to stay on as an optional dep, yeah, because we expose its traits in other parts of the API.

So what I'm proposing is:

JeremyRubin commented 1 year ago

concept ack.

one note though -- we should be sure to not assume that all WASM contexts can have entropy through the getrandom binding, as oftentimes wasm won't have entropy at all.

apoelstra commented 1 year ago

This assumption is exactly what the js feature does.

But anyway it doesn't matter for the purpose of context rerandomization.

TheBlueMatt commented 1 year ago

I recently learned that getrandom has a js feature that can be enabled if we know that we're compiling for WASM in a browser context where it has access to strong randomness

What @JeremyRubin said - this just assumes that the JS side exposes some specific (very, very, very poorly documented) function to fetch randomness. Thus far I haven't even managed to find a reasonable implementation of the JS side of these things to crib from. They basically just assume you're using wasm-pack.

Kixunil commented 1 year ago

There's still a problem that if we want contexts to cease to be arguments and have them randomized the only way to do that is using thread locals. And wasm doesn't have thread locals.

But it just occurred to me that we could have a special API:

/// Opaque storage
pub struct ThreadLocal {
    ctx: Secp256k1<Sign>,
};

impl ThreadLocal {
    /// Initializes the thread-local context.
    ///
    /// Should only be used by the callers of `set_thread_local_provider.
    pub fn new() -> Self {
        // ...
    }
}

pub fn set_thread_local_provider(provider: fn() -> &'static mut ThreadLocal);

Applications that are guaranteed to be single-threaded can just store it in static UnsafeCell<ThreadLocal> and return mutable reference.

It needs some Send/Sync polishing but I don't have time to think about it RN.

TheBlueMatt commented 1 year ago

And wasm doesn't have thread locals.

Wasm also doesn't have threads (so, basically, any global is basically a thread-local). Maybe I misunderstood your statement but I don't get why its an issue? There is a proposal to add threads, but presumably if that happens thread-locals will then be supported.

Kixunil commented 1 year ago

There is a proposal to add threads, but presumably if that happens thread-locals will then be supported.

Exactly and if we just use static assuming there is only one thread then the moment they are added the library becomes unsound because we can't predict which exact version it will be in to conditionally switch it off. Better leave this responsibility to users.

apoelstra commented 1 year ago

I think we should just use static_mutex (or maybe some NIH'd equivalent). I think that anything mutex-related should be a literal no-op in WASM for now, and when WASM supports threads it'll just naturally do the right thing.

For non-wasm targets we'll need to use some sort of static mutex anyway.

What I'd really like is if the re-randomization stuff (which is very small and fast to update) could be separate from the pre-comp tables. Then we could unlock the mutex, copy it to a local, drop the mutex (extremely fast) and then do the rest of our operation without needing any locking at all. But that will require cooperation with upstream and it might be too ugly an API for them.

JeremyRubin commented 1 year ago

Question: are Contexts fork-safe?

e.g., would the following kind of thing pose issues?

let secp = Secp256k1::new();
let sk = ...;
let pid = fork();
let sk2 = new_secret_key_fn(&secp);
if  pid == 0 {
    sign(&secp, sk, "Example A");
} else {
    sign(&secp, sk, "Example B");
}

Despite not being threaded, I think the primitives for forking in WASM are likely to show up (with a shared memory thing). One thing that might be good to do is to ensure that it's safe to directly copy a context, or figure out how to make re-randomizing live in the host entirely (e.g., we could make a secp context that lives in the host's memory and is only available per-instance, such as follows:

struct SpecialContext<T>;

impl<T> Deref for SpecialContext<T> {
  type Target = Secp256k1<T>;
  fn deref(&self) -> &Secp256k1<T> {
    binding_to_function_that_asks_host_for_rerandomization_or_our_copy()
  }
}

)

Kixunil commented 1 year ago

For non-wasm targets we'll need to use some sort of static mutex anyway.

Not really, thread local would be more performant without any obvious issue. But you have a point: we could default to mutex unless one activates the feature and the feature is documented to not work on wasm where atomic is used. I'd even consider not blocking in wasm: if the mutex is locked just panic.

I feel like there could be even better trick but not sure.

Also note we can't make a true mutex in no_std - there are no parking primitives.

@JeremyRubin if you're using fork() this way you probably have different problems. But yes, they are safe, just lose protection. We could register at_fork callback to re-randomize them though.

I'd be surprised if fork got into wasm. They specifically don't want dubious features like that one.

JeremyRubin commented 1 year ago

Fork like behavior already exists and is a standard thing a user might do.

https://developer.mozilla.org/en-US/docs/WebAssembly/JavaScript_interface/Memory

So it's not something in spec, just something a user can do.

On Wed, Sep 21, 2022, 12:47 PM Martin Habovštiak @.***> wrote:

For non-wasm targets we'll need to use some sort of static mutex anyway.

Not really, thread local would be more performant without any obvious issue. But you have a point: we could default to mutex unless one activates the feature and the feature is documented to not work on wasm where atomic is used. I'd even consider not blocking in wasm: if the mutex is locked just panic.

I feel like there could be even better trick but not sure.

Also note we can't make a true mutex in no_std - there are no parking primitives.

@JeremyRubin https://github.com/JeremyRubin if you're using fork() this way you probably have different problems. But yes, they are safe, just lose protection. We could register at_fork callback to re-randomize them though.

I'd be surprised if fork got into wasm. They specifically don't want dubious features like that one.

— Reply to this email directly, view it on GitHub https://github.com/rust-bitcoin/rust-secp256k1/issues/346#issuecomment-1254153278, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGYN67PZXWSS2JZJPPWANDV7NQ45ANCNFSM5J3I55HQ . You are receiving this because you were mentioned.Message ID: @.***>

TheBlueMatt commented 1 year ago

What I'd really like is if the re-randomization stuff (which is very small and fast to update) could be separate from the pre-comp tables. Then we could unlock the mutex, copy it to a local, drop the mutex (extremely fast)

Ehh, the speed of taking a mutex is somewhat platform-dependent, and the second there's any contention is suddenly becomes quite slow. I don't think a mutex on the contexts is really acceptable.

apoelstra commented 1 year ago

Fair enough -- we could make it a thread local then, but then we're back to the "maaybe wasm one day starts supporting threads without warning us" situation. Which honestly, I don't think is that big a deal -- it'll be a long time before browsers support it and a long time before people try using it, and we'll have warning.

TheBlueMatt commented 1 year ago

It is going to make upgrades very painful, though - we'll have some users using threads who want support and some users who want to support old browsers who can't have thread locals...I think for WASM we'll want to hand roll some spinlock with an atomic that will compile out to almost-nothing today.

Kixunil commented 1 year ago

Please, no spin locks. If we really have to lock something we should at least provide some interface to park the thread. Yes, I'm aware we have a spinning lazy initialization for no_std and that is also something I'd like to address eventually. I just didn't have the time for it yet.

TheBlueMatt commented 1 year ago

My point was that its not, actually, today, a spinlock. As long as you don't attempt to use threading in WASM it will remain not-a-spinlock, so that's fine. If you try to upgrade to threads in WASM it will become a spinlock, and we can at that point add a feature to enable real locks using std in wasm.

Implementing a trivial spinlock here would let us bridge the gap across the upgrade cycle without spending a lot of effort on it today, ensuring the API is safe whether users enable a std-locks feature or forget to.

Kixunil commented 1 year ago

Oh, yes, what you describe works for wasm (although I'd consider panicking instead of spinning) but the same problem occurs on embedded without OS (HW wallets) and we can't use that trick there so the interface to set some thread local provider is still required and once it's there we may as well enable it for wasm.

TheBlueMatt commented 1 year ago

Are there any embedded no-std platforms we care about that are multi-core? That would be very surprising to me. You could in theory still get screwed if there's a non-coop-multitasking system, sure, but I'd imagine that's also very very rare.

Kixunil commented 1 year ago

Firstly, one could wish to write a Rust library with C API linked to other languages and AFAIK it is (usually?) no_std to not require whole Rust std to be linked. Maybe you've heard of such project. ;)

Secondly, even if there are no no_std multicore platforms I don't think we can reliably detect them and refuse to compile. AFAICT having an API to provide appropriate handlers is the most practical approach and as a side effect it also supports all niche platforms without us having to do extra work.

TheBlueMatt commented 1 year ago

Well the initial discussion here was around WASM, for embedded platforms I think the same spinlock solution also trivially provides safety guarantees even if there were threading while having almost zero overhead in 99% of cases. For users wishing to build without pthread linkage I'm not sure what we can do aside from spinlocking - we have to provide safety, and panicking if we're running threaded seems kinda dumb - the user asked us to work without threading, why are we panicking? If we're really worried about users accidentally hitting a performance issue due to contention, the vast majority of platforms can be trivially detected and compile-error'd if std isn't enabled (with an override, I suppose). eg Linux/Windows/MacOS can be detected and we can error if neither really-no-std nor std is selected.

Kixunil commented 1 year ago

Well the initial discussion here was around WASM

Indeed and I point out that whatever solution works for embedded will likely work well for wasm as well. :)

Embedded platforms also have interrupts and if someone accidentally calls a secp256k1 function from an interrupt spinlock means a guaranteed deadlock if it's hit.

the user asked us to work without threading, why are we panicking

Because the user said the platform is single threaded but it's not.

detected and compile-error'd if std isn't enabled

I believe this would cause FPs in case of libraries but it's OK if the compile error is "missing symbols for synchronization functions".

error if neither really-no-std nor std is selected

As I mentioned in the issue at rust-bitcoin these almost-exclusive features are PITA, I'd rather not have them if possible.

If requiring embedded/WASM users to provide these synchronization primitives is too much boilerplate then maybe have cfgs (not feature - libraries shouldn't mess with it) secp256k1-fallback-spinlocks and secp256k1-assert-single-thread to set basic implementations provided by us but it'd still be useful to allow custom synchronization implementations.

Kixunil commented 1 year ago

FYI, thread_local! seems to work fine in WASM but it does require std which is not great. I remember there being some issue around it but it seems to be just something wrong with the rand crate. So maybe we could have it for those who don't mind std being linked to their modules.

TheBlueMatt commented 1 year ago

Embedded platforms also have interrupts and if someone accidentally calls a secp256k1 function from an interrupt spinlock means a guaranteed deadlock if it's hit.

Only if another function was running at the time. But, sure, yes, we cannot solve that unless we ~(a) panic, (b)~ make the user provide contexts. This isn't a spinlock issue but rather a ~lock issue~ multiple-access issue.

Because the user said the platform is single threaded but it's not.

No, the user said the platform didn't have pthread. They didn't say it was single-threaded. This is just true for WASM as it is for embedded-with-interrupts. We could have a different feature for "No threads, I promise", I suppose, but that's really not practical if secp is a transitive dependency.

I believe this would cause FPs in case of libraries but it's OK if the compile error is "missing symbols for synchronization functions".

Indeed, maybe that's not practical at all.