microsoft / SymCrypt

Cryptographic library
MIT License
660 stars 68 forks

Add alloc free override functions for Windows #38

Open mjsabby opened 4 months ago

mjsabby commented 4 months ago

This will allow users of the library to pool allocations

NielsFerguson commented 4 months ago

Thanks for looking at the code and thinking about how to improve it.

I don't see how this would work as there are often multiple users of the SymCrypt module in the same address space. If one caller overrides the memory allocations to get a benefit, then it will very often lead to problems when other callers in the same address space also use SymCrypt and their allocations also get re-routed. I've seen a library that had the allocation parameters on Thread-Local-Storage, but that fails if you have worker threads where the same thread does work for multiple high-level modules in the address space, and the low-level code that actually calls SymCrypt doesn't know about any of that.

SymCrypt already works hard to minimize the allocations needed. We only use allocations for asymmetric operations, which are already relatively expensive, so the allocation overhead is small. We also limit ourselves to one allocation per operation; an RSA signature will do a single allocation for the whole signature operation. I don't see any significant benefit possible with making the allocations configurable.

Note that the SymCrypt static-link library does use callback functions allowing the caller to implement the memory allocation.
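
As a rough illustration of what a caller-supplied allocator could look like in the static-link case, here is a minimal sketch of a fixed-slot pool with a heap fallback. The names `pooled_alloc` and `pooled_free`, the slot count, and the 18 KB slot size are assumptions made for this example; they are not SymCrypt's callback names or signatures. The sketch is not thread-safe, and it does not address the concern above about several callers sharing one module in the same address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sizes: four slots, each large enough for an 18 KB scratch area. */
enum { POOL_SLOTS = 4, SLOT_BYTES = 18 * 1024 };

/* The union forces each slot to be suitably aligned for common scalar types. */
typedef union {
    unsigned char bytes[SLOT_BYTES];
    long double   align_ld;
    long long     align_ll;
    void         *align_ptr;
} pool_slot;

static pool_slot pool[POOL_SLOTS];
static int pool_in_use[POOL_SLOTS];

/* Hypothetical allocation callback: hand out a free slot, else fall back to malloc. */
void *pooled_alloc(size_t n)
{
    if (n <= SLOT_BYTES) {
        for (int i = 0; i < POOL_SLOTS; i++) {
            if (!pool_in_use[i]) {
                pool_in_use[i] = 1;
                return pool[i].bytes;
            }
        }
    }
    return malloc(n); /* request too large, or pool exhausted */
}

/* Hypothetical free callback: return a slot to the pool, or free a heap block. */
void pooled_free(void *p)
{
    if (p == NULL) {
        return;
    }
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (p == (void *)pool[i].bytes) {
            assert(pool_in_use[i]); /* catches a double free of a slot */
            pool_in_use[i] = 0;
            return;
        }
    }
    free(p); /* not a pool slot, so it came from malloc */
}
```

Since each operation makes a single allocation, a small number of slots is enough for a caller that runs only a few operations at a time; a real implementation would also need locking and would likely wipe a slot before reuse.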

mjsabby commented 4 months ago

@NielsFerguson On my i9 machine, 10% of the time is spent allocating the 18 KB scratch space that is needed, so this is a reasonably large improvement. It goes from 10.5 microseconds to about 9.1 microseconds, with no heap fragmentation occurring.

I'd also like to propose we make SymCrypt faster for validating signatures that are found commonly on the internet, namely SHA256 signatures seen in things like AAD tokens.

For example, take a look at this code I put out there. It stack-allocates the 18 KB scratch space and does not need to allocate memory to fix up the padding; it can do this because we know 224 of the 256 bytes in advance.

https://gist.github.com/mjsabby/b87b629fa902cd641b955892b1ab9604#file-verifyrs256jwtsignature-cs-L179
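
To spell out where the 224 comes from: for a 2048-bit key with SHA-256, the PKCS #1 v1.5 encoded block is 256 bytes, of which only the last 32 (the hash) vary. A minimal sketch of comparing against a stack-built expected block follows. The function names are hypothetical, this is not the code from the gist, and the RSA public-key operation that produces `decrypted` is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    RSA2048_BYTES      = 256,
    SHA256_BYTES       = 32,
    DIGESTINFO_BYTES   = 19,
    FIXED_PREFIX_BYTES = RSA2048_BYTES - SHA256_BYTES /* 224 */
};

/* DER-encoded DigestInfo header for SHA-256 (PKCS #1 v1.5). */
static const uint8_t SHA256_DIGESTINFO[DIGESTINFO_BYTES] = {
    0x30, 0x31, 0x30, 0x0d, 0x06, 0x09, 0x60, 0x86, 0x48, 0x01,
    0x65, 0x03, 0x04, 0x02, 0x01, 0x05, 0x00, 0x04, 0x20
};

/* Fill em with 00 01 FF..FF 00 || DigestInfo || hash, using no heap memory. */
void build_expected_em(const uint8_t hash[SHA256_BYTES],
                       uint8_t em[RSA2048_BYTES])
{
    /* Index of the 0x00 separator that ends the 0xFF padding run. */
    const size_t sep = FIXED_PREFIX_BYTES - DIGESTINFO_BYTES - 1; /* 204 */
    assert(sep + 1 + DIGESTINFO_BYTES == FIXED_PREFIX_BYTES);

    em[0] = 0x00;
    em[1] = 0x01;
    memset(em + 2, 0xFF, sep - 2); /* 202 padding bytes */
    em[sep] = 0x00;
    memcpy(em + sep + 1, SHA256_DIGESTINFO, DIGESTINFO_BYTES);
    memcpy(em + FIXED_PREFIX_BYTES, hash, SHA256_BYTES);
}

/* Returns 1 if the RSA public-key result equals the expected block, else 0. */
int em_matches(const uint8_t decrypted[RSA2048_BYTES],
               const uint8_t hash[SHA256_BYTES])
{
    uint8_t expected[RSA2048_BYTES]; /* lives on the stack */
    uint8_t diff = 0;

    build_expected_em(hash, expected);
    /* Accumulate differences so the loop does not exit early on a mismatch. */
    for (size_t i = 0; i < RSA2048_BYTES; i++) {
        diff |= (uint8_t)(decrypted[i] ^ expected[i]);
    }
    return diff == 0;
}
```

Comparing the whole block this way avoids parsing or stripping the padding: the 224 fixed bytes and the 32 hash bytes are checked in one pass.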

What are your thoughts?

I'm trying to make SymCrypt stack allocation only and not have to pay for PKCS1 padding removal/addition.

I only care about token validation.