Issues with migration from 2.2.0 to 2.5.0

christopherhex commented 5 years ago

I'm trying to upgrade our SoftHSM installation from 2.2.0 - 2016-12-05 to the latest 2.5.0 - 2018-09-24

We use OpenSSL as a crypto backend and a file storage.

We have an existing token folder that was created and populated using version 2.2.0. However when we upgrade the binary to 2.5.0, we see very strange things:

Access becomes very slow
It seems that public keys get converted to another storage format because they disappear if we revert again to a 2.2.0 binary.

Should it be possible to use a token folder created in 2.2.0 with the latest version?

rijswijk commented 5 years ago

As far as I know, there have been no changes to the backend storage between version 2.2.0 and 2.5.0, so data on disk should be compatible. One thing that comes to mind, though, is that a known limitation is that if you move from a 32-bit build to a 64-bit build, the stored size of CK_ULONG changes, making tokens stored on one architecture incompatible with another architecture. You may want to check if this is the case.

christopherhex commented 5 years ago

Hi @rijswijk, thanks for the quick response.

For the two builds (2.2.0 and 2.5.0) I get the same output when doing file libsofthsm2.so:

libsofthsm2.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, with debug_info, not stripped

So it looks like both versions were already 64-builds.

The strange thing is that it only seems to be impacting public keys...

rijswijk commented 5 years ago

Are you building on the same platform for both versions as well? What type of public key are we talking about, RSA, EC? The only thing that changed is that we added support for newer EC algorithms AFAIK, but that shouldn't impact backing storage.

christopherhex commented 5 years ago

We use EC public keys and this is the javascript code (using pkcs11js module) we use to add them to the token:

  const res = HSM.pkcs11.C_CreateObject(HSM.session, [
    { type: pkcs11js.CKA_CLASS, value: pkcs11js.CKO_PUBLIC_KEY },
    { type: pkcs11js.CKA_KEY_TYPE, value: pkcs11js.CKK_EC },
    { type: pkcs11js.CKA_TOKEN, value: true },
    { type: pkcs11js.CKA_MODIFIABLE, value: true },
    { type: pkcs11js.CKA_EC_POINT, value: publicKey },
    { type: pkcs11js.CKA_LABEL, value: `PIN_${pinId}_PUBLIC` },
    { type: pkcs11js.CKA_EC_PARAMS, value: Buffer.from('06092B2403030208010109', 'hex'); }
  ]);

rijswijk commented 5 years ago

I've checked the commit history, there have been no changes that should affect storage in such a way that there are incompatibilities between versions. From your original request I get the impression that anything written/created with 2.2.0 is readable with 2.5.0, so that would confirm this. Going back from a newer to an older version is not something we would actively support.

W.r.t. performance: you say 2.5.0 is slower than 2.2.0, but did you also change OpenSSL versions (upgrade to a newer version)? Because this may also make a difference in performance. Anyway, in case you want raw performance for certain operations (in this case ECC public key ops), you should probably be using something other than SoftHSM, because performance has never been a primary design goal, we generally prefer security over performance. Have you tried profiling the code to check where the performance hit is?

christopherhex commented 5 years ago

I've done some additional testing over the last couple of days and this is my conclusion:

From release 2.4.0 on the crypto-ops become very slow in case of a token with a significant amount of objects stored in it. Performing the same test in 2.3.0 gives a better performance and is not hit as hard as 2.4.0 when more objects are stored.

Test Specifics

All tests are done in docker containers, the only difference between versions is commit hash used to checkout SoftHSM version.
Procedure
- start test, create 1 key pair and add 1 public key,
- calculate 10 signatures on random data using added key pair
- encrypt 10 times some random data using derived key from key pair and public key
- repeat 2 steps above 5 times to have average.

Test Results

SoftHSM 2.4.0 contains only 2 keypairs:

signing 10 times: +- 220 ms
derive AES key + encrypt 10 times: +- 450ms

SoftHSM 2.4.0 contains +- 600 objects:

signing 10 times: +- 6500 ms
derive AES key + encrypt 10 times: +- 12000 ms

SoftHSM 2.3.0 contains only 2 key pairs:

signing 10 times: +- 35 ms
derive AES key + encrypt 10 times: +- 82 ms

SoftHSM 2.3.0 contains +- 600 objects:

signing 10 times: +- 35 ms
derive AES key + encrypt 10 times: +- 105 ms

Should we consider this as a bug in the SoftHSM implementation or is the SoftHSM implementation not designed to cope with a large amount of keys in store?

Thanks for the help!

rijswijk commented 5 years ago

I believe this may be due to a fix that was introduced in SoftHSM v2.4.0 for a race condition, #359, which adds additional locking. Since each object is stored in a separate file this may impact object search times. There may be other implementation changes you can make in your code to improve performance. If, for example, you always search for an object prior to each signing operation using C_FindObjects, then the find operation counts towards the runtime, which will increase with the number of objects stored in the token. You may want to check that first.

We would not consider this a bug, although the figures you quote do indicate a performance bottleneck. If you can check how you perform your test and see what happens if you leave out C_FindObjects (if that is repeatedly called for each test), that would be helpful.

stewilko commented 4 years ago

This is something that we've noticed too. Each C_FindObjects operation generates loads of disk IO as it locks and scans the generation values across all the key material files. Our use case sounds similar to the one above - we're using SoftHSM to store hundreds (potentially thousands) of objects. Since incoming requests are assumed to be uniformly distributed across the objects I don't want to keep thousands of handles open. Hence the frequent use of C_FindObjects, I'm not sure of another way of driving the PKCS11 interface to avoid those calls.

In our case it's write infrequently and read often. As an experiment I built SoftHSM with SQLite support on Windows (our initial target platform) and performance was worse. Perhaps the C_FindObjects logic could be moved closer to the db layer, with indexing maybe, and since SQLite is ACID compliant it may mitigate some of the need for generation tagging / locking. Each individual operation may become slower, but my gut feeling is that it would scale better as the number of stored objects increases.

rijswijk commented 4 years ago

Introducing some form of indexing is definitely something we can consider; we would likely abstract away from the backing store when doing that, though, because it would also be useful for the file-backed object store. I'm a bit surprised, though, that your OS's disk caching doesn't partly solve the I/O bottleneck for you.

That taken aside, there are performance optimisations you could make in your application to achieve the same goal, e.g. by keeping sessions open and caching the results of calls to C_FindObjects. There is no real issue in keeping handles open (why do you prefer not to do this?). I know from experience that there are many other PKCS #11 implementations from e.g. HSM vendors that have similar performance bottlenecks.

stewilko commented 4 years ago

It feels a little odd keeping handles around when they aren't being used as it's difficult to predict which object will be required next. I suspect a physical HSM may get upset keeping many objects open? I've already implemented some reference counting to keep sessions open, I'll extend that to include the handles inside the session too, probably with a lifetime / flushing strategy to limit the maximum open at any one time. I'll need to double check if handles become invalid - when creating a new token or new object within a token for example.

rijswijk commented 4 years ago

I cannot comment on physical HSMs, but the handle space in SoftHSM is tied to the system architecture (i.e. 32 vs. 64 bits), which gives you plenty of handles. Handles should not become invalid if new tokens or objects are created, so that is not going to be an issue. Of course, depending on how many objects you are handling, at some point memory is going to become an issue.

opendnssec / SoftHSMv2

Issues with migration from 2.2.0 to 2.5.0 #467

Test Specifics

Test Results