watfordjc / LTO-Encryption-Manager

LTO tape AES key management using SLIP-0021 symmetric key derivation from a BIP-0039 wallet mnemonic.
MIT License
3 stars 0 forks source link

Implement BIP-0085 #5

Open watfordjc opened 3 months ago

watfordjc commented 3 months ago

Feature Branch

Current feature branch for this issue: bip0085.

Progress


Background

The use of SLIP-0021 for key derivation was chosen because (a) SLIP-0021 is for symmetric keys, LTO drives use AES256-GCM, and AES is symmetric; (b) SLIP-0021 doesn't care how you get the secret to create node m, so I just went with BIP-0039; (c) it just uses HMAC-SHA2-512 to derive nodes.

Point (c) means that no elliptic curve cryptography has been involved in LTO-Encryption-Manager at this point- wrapping AES keys before sending them to an LTO drive involves encrypting the key to the drive's RSA-2048 public key.

However, I do want to implement BIP-0085 for deterministic entropy generation (e.g. using SHAKE256 for RSA keys), and that requires the implementation of BIP-0032 and EC crypto. I would also like to be able to deterministically generate crypto seeds/wallets/addresses for various reasons - bitcoin/litecoin/ethereum are probably what a thief would first be looking for, so transferring 0.1 LTC (currently about 5 GBP) to the first public address of m (Bitcoin seed) could act as a canary to warn that a BIP-0039 seed phrase may have been compromised.

I would also like to create a derivative of BIP-0085 that uses SLIP-0021 ASCII labels for nodes rather than BIP-0032 indexes. BIP-0085 tries to use the decimal equivalent of ASCII for its defined applications and even itself: everything BIP-0085 is under m/83696968H (a hardened node) with 83696968 being the decimal equivalent of ASCII SEED.

ASCII labels would make the requirement to note down what each index has been used for, so there could (for example) be a derivation path for generating TLS certificate keypairs where one of the node labels is the Common Name (CN) of the certificate. There is also the issue of key/entropy destruction that is not possible with increasing an index number that starts with zero and is an issue with SLIP-0021.

One of the advantages of using SLIP-0021 for LTO tape AES keys is that with the parameters needed to recreate the key stored in the cartridge memory (MAM) the only thing I need to keep a record of is the BIP-0039 seed phrase. One of the disadvantages is that there is no way to destroy data by deleting all copies of the keys because they can easily be re-derived.

Some things, like keys for TLS server certificates (Web sites, mail servers, etc.) might not need to be able to be destroyed because the biggest concern with them is that the private keys are somehow compromised. TLS connections negotiate a session key, and Perfect Forward Secrecy (PFS) means that if a server key is compromised past session keys should not be compromised.

Basically, I want the ability to deterministically create entropy/keys that can either (a) be destroyed, or (b) be easily recreated.

watfordjc commented 3 months ago

Initial Work and Thoughts

It hasn't been pushed to GitHub yet, but my local bip0032 branch has implemented methods for Base58 encode/decode, Base58Check encode/decode, and BIP-0032 derivation. The tests that cover the BIP-0032 test vectors don't yet all pass because some things to do with error checking aren't yet implemented (e.g. the Base58Check checksum isn't validated). Once the tests are passing the code will need refactoring.

My local bip0032 branch has only focussed on classes/methods/tests as I haven't decided if I will change the GUI or create a separate application.

The main branch hasn't yet been fast-forwarded for the dependency updates and the update to .NET 8 because I need to compile the code on my computer that has a tape drive and test everything still works.

Derivation Schemes

I plan to implement BIP-0085 but not to use it myself for non-cryptocurrency purposes due to BIP-0032's limitation on 2^31 hardened indexes per level. SLIP-0021 derivation is easier to implement and doesn't have a concept of 'normal' and 'hardened' nodes. In other words, I might use BIP-0085 to derive BIP-0039 seed phrases, but for other uses of entropy I'll use my own derivative of BIP-0085 that works with SLIP-0021.

Having thought about whether or not m should use a different BIP-0032 'curve' to SLIP-0021's "Symmetric key seed", I have decided the answer is no because of how SLIP-0021 defines first-level children.

As for the first level children, I am thinking of something like uk.johncook.slip-0021.entropy and uk.johncook.slip-0021.entropy-external, with entropy-external indicating an external source of entropy was also involved in derivation and to deterministically recreate the entropy/key a backup of that external entropy will be required.

Given the application already uses the TPM, my initial thoughts are that the external source of entropy will be 256 bits (32 bytes) long and could be stored in a way that makes destruction possible. Given my use of SSDs, storing that entropy on disk will not make destruction 100% possible. As we're talking about things like RSA-4096 and secp384r1 keypairs, these are things that tend to last a moderate amount of time until there is a key rollover, so using BIP-0039 to store the external entropy by writing down 24 words on index cards with pencil might be one option (there'll need to some sort of fingerprint too to reduce the risk of shredding/burning the wrong index card).

TLS Certificates

Let's start with TLS certificates. I am thinking this might suffice:

m/"uk.johncook.slip-0021.entropy"/"0"/"TLS"/"example.com"/"2024-08-19"/"0"

The first "0" is the globalKeyRollovers count (base-10 integer as a string) as required by my SLIP-0021 node fingerprinting schema, "example.com" is the Common Name (CN) that will be used for the certificate, "2024-08-19" is the ISO 8601 date (UTC+0 timezone) the key was generated, and the second "0" is the base-10 index of the key (i.e. allows more than one to be created on the same day).

Note: the date a key is generated is not necessarily the date it was first used. It might have been generated as a future rollover key, or even an emergency mail server key that has never been on the mail server - i.e. deterministically generated, turned into a zonefile TLSA record and zonefile comment giving the derivation path parameters that are both copied and pasted into the zone file through an SSH window, and the key then being destroyed.

Now we need to differentiate between RSA and EC keys, which we don't do in the derivation path because the path is algorithm agnostic. We also don't include the key length in the path. If you want an RSA-4096, RSA-3072, secp384r1, and secp256r1 key, just indicate the date and index 0, 1, 2, and 3 in however you're noting down your derivation path reminders (e.g. within the filenames of the PEM-/DER-encoded public keys).

Borrowing from BIP-0085, we use HMAC-SHA512(key="bip-entropy-from-k", msg=k) where k is the right half of the node (the symmetric key, as SLIP-0021 calls it). This differs from BIP-0085 as BIP-0085 uses the left side, but that's because BIP-0085 uses BIP-0032 derivation where each node is (key || chain code) whereas SLIP-0021 it is reversed and is (derivation key || symmetric key). As non-hardened derivation doesn't exist in SLIP-0021, the reason BIP-0085 performs this step doesn't exactly exist.

As it gives us 512 bits of deterministic entropy without touching the bytes in the node's derivation key, performing this step allows child nodes rather than requiring the creation of a "no child node" rule. It also means that an implementation of BIP-0085 can reuse the BIP-0085 method/function of turning a byte[32] k into entropy.

We have now derived 512 bits of entropy. For (elliptic curve) keys that don't need more than 512 bits, do the BIP-0085 thing and take as many bytes as needed starting from the leftmost/first/most-significant byte (the byte[] should be in network byte order), truncating the rest. So, for a secp384r1 key, that'd be entropy[..48], and for a secp256r1 key entropy[..32].

If more than 512 bits are needed, whether for ECC or RSA, we do the BIP-0085 thing and implement BIP85-DRNG. We use those 512 bits of entropy to seed a SHAKE256 stream, and consume as many bits as needed to generate a key. So for a 521 bits EC key you feed SHAKE2 with the 512 bit output from HMAC-SHA512(key="bip-entropy-from-k", msg=k) and ask it for 521 bits of output.

For RSA, the SHAKE256 stream's output is fed into the RSA key generator's input. I haven't looked into how that would be done in code.

DNSSEC Keys

For DNSSEC, we just need to change "TLS" to "DNSSEC" and add a level for "ZSK" or "KSK":

m/"uk.johncook.slip-0021.entropy"/"0"/"DNSSEC"/"example.com"/"ZSK"/"2024-08-19"/"0"

m/"uk.johncook.slip-0021.entropy"/"0"/"DNSSEC"/"example.com"/"KSK"/"2024-08-19"/"0"

The first "0" is the globalKeyRollovers count (base-10 integer as a string) as required by my SLIP-0021 node fingerprinting schema, "example.com" is the domain name the DNSSEC key is for, "ZSK" and "KSK" are to separate zone-signing keys and key-signing keys, "2024-08-19" is the ISO 8601 date (UTC+0 timezone) the key was generated, and the second "0" is the base-10 index of the key (i.e. allows more than one to be created on the same day).

Note: the date a key is generated is not necessarily the date it was first used. As with TLS, the key type is not included in the derivation path and will need noting.

If you have a policy for when to roll keys over (a Google snippet claims ZSKs could be every year and KSKs every 5 years), you could standardise on the dates. For example, you might go with the 1st January (no matter the actual date) and the current year for a ZSK, and the year modulo 5 for a KSK. So for keys created on 2024-08-19, a ZSK derivation path containing "2024-01-02" and a KSK derivation path containing "2020-01-01". Doing this would make keeping track of the key indexes that have been used more important to avoid node reuse.

PGP/GPG Keys

For GPG keys, I note the issue with BIP-0085 deciding to use the timestamp of the Bitcoin genesis block, which is likely because either (a) BIP-0032 has a 2^31 limit on hardened nodes so can't fit a Unix timestamp, or (b) because the derivation path is so hardware that doesn't know the current date can use the path.

For GPG keys, my derivation path will (much like for TLS and DNSSEC) include an ISO 8601 node label but this will not be restricted to just the date, although it is advisable to restrict it to the UTC+0 timezone so that the timezone offset can be reduced to Z. So for a key generated at 2024-08-19 11:22 UTC+0 the node's label would be `"2024-08-19T1122Z".

The reasoning for this is because GPG public keys tend to be publicly published (e.g. to a keyserver), so the date/time a key was created can be extracted from a public key lookup.

I haven't decided on a derivation path yet because I need to know more about GPG keys, sub-keys, fingerprints, etc. RFC 9580 has also recently reached RFC status.

watfordjc commented 2 months ago

Progress

I have somewhat implemented the first two bits of BIP85 on my local bip0085 branch: the bip-entropy-from-k HMAC-SHA512 and BIP85-DRNG-SHAKE256.

BIP85-DRNG-SHAKE256 still needs some refactoring as I haven't quite decided how best to wrap a Shake256/ShakeDigest instance.

RSA Implementation

Implementing deterministic RSA has been a bit of a challenge:

  1. I needed to understand the correct generation of all the RSA parameters.
  2. I needed to implement the Extended Euclidian Algorithm in order to calculate some of those parameters.
  3. I needed to know how to properly encode them for an instance of RSAParameters so that .NET was happy with creating an RSA key from it.
  4. I needed to switch to BouncyCastle.Crypto so that I could specify endianness as a parameter when creating BigIntegers and getting the bytes from them
  5. I needed to look deeply into the libraries the Python (reference) implementation of BIP85 uses in order to work out why my implementation was creating different PEM files.
  6. I needed to discover that versions of Windows (including mine) will not create an identical RSAParameters when exported from an RSA even if it is two lines later in the code. If you give it a d (private exponent) that has been calculated using lambda(n), it will export it as a d calculated using phi(n).
  7. I needed to refresh myself on DER/PEM/ASN.1, so I could create my own PEM files so they'd contain a d created using lambda(n).

Python Library Research

When it comes to the Python libraries the reference implementation uses, I noted the following:

  1. q (prime2 or prime1) is always greater than p (prime1 or prime2). My implementation does the opposite by default (p is always greater than q) because I needed them ordered as I was using two Task to look for primes on separate threads.
  2. p and q must be searched for sequentially. This is because Crypto.PublicKey.RSA takes an RNG as an optional parameter, which it passes to Crypto.Math.Integer for generating numbers, which it passes to Crypto.Math.Primality to use as the source of witness numbers for Miller-Rabin testing.
  3. Crypto.PublicKey.RSA/Crypto.Math.Integer forces the least significant bit and the most significant bit to be set to 1 for all numbers that come out of the RNG. Crypto.Math.Integer's random number function also treats all numbers that come out of the RNG as big endian.
  4. BouncyCastle's BigInteger.IsProbablePrime() method doesn't take an optional RNG as a parameter. In order for my DRNG to output the same bytes as the reference implementation for each p/q candidate, I would either have to write code that eats an estimated amount of bytes before/after each primality test, or implement Miller-Rabin myself. I went with the latter (still needs more thorough testing).
  5. Maths on big numbers is difficult. The .NET (8) implementation of BigInteger doesn't even have a square root function. Python has similar issues. There's some maths that looks good in an encryption algorithm standard or a Wikipedia article about it, but at a certain point compromises have to be made (and discovered and emulated).
  6. Crypto.Math.Primality uses a number of rounds of Miller-Rabin based on the length of the prime in bits. The number is then put through one round of Lucas testing. I need to implement the Lucas primality test.

Miller-Rabin Implementation Testing

I am working on testing my Miller-Rabin implementation. Working out how to create a primality test testing test unit I Googled for lots of primes and found a downloadable zip file with all the primes (excluding 2) that fit within 32 bits contained in a single binary file (4-byte aligned unsigned little endian integers).

Streaming a file within a zip file in .NET is possible, seeking without unzipping not so much. The file contains 203,280,220 prime numbers, each taking up 4 bytes, which would be 813,120,880 bytes (813 MB/775 MiB) uncompressed, which isn't the sort of thing to be included in a git repository or a unit testing package.

I pondered on how best to split up and shrink the amount of bytes needed to store 775 MiB of prime numbers, and decided to split them into groups of 10,000,000 primes that are each split up into 23 groups of varying sizes in a determined order (5K, 10K, 10K, 25K, 50K, 50K, 50K, 100K, 100K, 100K, 250K, 250K, 500K, 500K, 500K, 500K, 1M, 1M, 1M, 1M, 1M, 1M, 1M).

Every prime in the file was then read and appended (as big endian) to a SHA256 instance, with a hash created and the SHA256 instance reinitialised between each group of primes. The first and last prime of each group, plus the SHA2-256 hash of all the primes in that group (4 bytes each, big endian) were then written to the log file which I then saved.

As a result, I have lossily compressed the 775 MiB of data down to a 39,591 byte (39.6 KB/38.7 KiB) CSV file that consists of 83 character (including newline character) lines - 477 lines of hashes of groups of prime numbers, with the hex for the first prime and last prime in that group.

00000003,0000BDEB,39956B67EC72A3594A3487ED8786A57E32EC90BAB7DBC5CA8811F630025A9C8D
0000BDEF,00028007,ABFFFA7C95072031155BD1FCE3D40891FFBE9D2AB470FC3B73EABFF7CA024059
...
FEF7E639,FFA110F5,593AF12C3BFFCA4F50622035E185B873FD908F16D13CA98E707B3C93A0BAC91F
FFA11127,FFFFFFFB,3967897C10CBE09B4FEAB2BC172D0E424BDF245FBF64F20602F4A1B5FFD9CD9C

The reason for the 23 different groups was because primes get further apart the higher the numbers go, so I wanted something that was determinate and repeating that also added up to a nice round number. So, if you wanted to start searching for a group of 5,000 primes you can start at byte 0, or at byte...

203,280,220 rounded down to the nearest 10,000,000 = 200,000,000
200,000,000 / 10,000,000 * 23 = 20 * 23 = line 460
460 * 83 = byte offset 38,180 (or 38,181 if the first byte is considered byte 1)
tail -c+38181 32bit-prime-number-group-hashes.csv | head -c83
FBAA30BB,FBABE49F,1A797613B64A8B488F1559957785EE462E793DE6E0B0E205591FA6EE4EE510F1

The SHA2-256 hash for all 5,000 primes (represented as 4-byte big endian) in the inclusive range of 0xFBAA30BB (prime number 4,222,234,811) and 0xFBABE49F (prime number 4,222,346,399) inclusive is 1A797613B64A8B488F1559957785EE462E793DE6E0B0E205591FA6EE4EE510F1. That's a little under 56,000 odd numbers to check if you're not going to do any filtering.

watfordjc commented 2 months ago

Progress Notes

This comment contains additional notes regarding progress.

Standard RSA Implementation

My implementation of BIP85's method of generating deterministic RSA keys is almost there, the code just needs to be reorganised.

The main problem I have had is that I have to write a lot of code to do what the reference implementation does in one line, because of the difference in language and dependencies.

The bip85 reference implementation's first RSA test vector:

bip85 = BIP85()
test = bip85.bip39_mnemonic_to_entropy("m/83696968'/0'/0'", MNEMONIC)
drng_reader = BIP85DRNG.new(test)
rsa = RSA.generate(bits=2048, randfunc=drng_reader.read, e=65537)
key = rsa.export_key(format='PEM', pkcs=1)
key_hash = hashlib.sha256(key).hexdigest()
expected = '64ff572798a6534c76eda9fd2d7e906a737a1bad893dec31ae3d0488e3f19ed9'
assert key_hash == expected

A brief mention that in order to get expected by exporting a key as done in the key = rsa.export_key(format='PEM', pkcs=1) line I had to write a function to create a DER/PEM for an ASN.1 sequence because my version of Windows exports RSA to RSAParameters or PEM using phi(n) to calculate d even if the RSAParameters used to create the RSA used lambda(n) for d.

I also wrote the code to do what test = bip85.bip39_mnemonic_to_entropy("m/83696968'/0'/0'", MNEMONIC) relies on a library for when I implemented BIP32.

drng_reader = BIP85DRNG.new(test) just has two checks and then does return SHAKE256.new(entropy). While I could have done something similar, in order for me to emulate PyCryptodome I had to take an RNG as a function input, so I needed to create a wrapper class around a Shake256/ShakeDigest that inherits RandomNumberGenerator. I also wrote a Stream wrapper class for wrapping that class.

The problem line: rsa = RSA.generate(bits=2048, randfunc=drng_reader.read, e=65537). To be completely compatible in C# with that single line, things have to be done the way the underlying library does them. In this case, PyCryptodome.

Miller-Rabin Implementation

My MillerRabin class is based on several sources of information.

First, Google Gemini explained Miller-Rabin to me in 4 steps. Code comments are also based on my understanding of how the algorithm functions based on Google Gemini's explanations.

Next, the Wikipedia article gave several pseudocode options to choose from:

  1. Miller–Rabin test
  2. Miller test
  3. Miller–Rabin test with GCD calculations

I started with option 1 and shifted to option 3.

The Wikipedia article also mentioned there are known "small groups" of bases for which all numbers within certain ranges will give deterministic results by just performing the Miller-Rabin test against those bases.

I have left a TODO comment in the code to implement that.

The third source of information was BouncyCastle and PyCryptodome's source code.

In order to implement BIP85 in a compatible way with the Python reference implementation, Miller-Rabin needed to function in the same way as the Python implementation does. That meant doing things the way its dependencies (i.e. PyCryptodome) does them when it comes to the order of operations and the use of the DRNG.

PyCryptodome uses the DRNG for witness numbers (since a bugfix commit made it to v3.10.0). I couldn't see an overload for supplying an RNG in the version of BouncyCastle I was using, and I also wanted to try to remove the requirement to convert the number from a byte array, to a .NET BigInteger, back to byte array after setting bits, and then to a BouncyCastle BigInteger to call IsProbablePrime().

PyCryptodome's round numbers give less than E-30 error rate, which is approximately 2^(-96). FIPS 186-4 (which PyCryptodome references) has three tables of suggested rounds, with table C.3 giving 2^(-100). For widely used RSA key sizes the number of rounds are equivalent. To ensure compatibility with the reference implementation, a compatibility flag was created to use the same number of rounds as PyCryptodome.

PyCryptodome does trial division against the first 100 primes before calling a callback to check if primality testing should be conducted on a candidate number. My initial small divisor test only went up to integer 13, so the first test vector in the reference implementation always failed because my implementation was consuming more numbers from the DRNG for witness numbers - I was using a prime as a candidate for RSA 'q', PyCryptodome was using it as a witness number.

As a result of wanting to be compatible with the reference implementation, and .NET 8 (and BouncyCastle) not offering a way to be compatible with how PyCryptodome does things, what the reference implementation does in 1 line simply was not possible. I had to implement prime number generation, as well as needing to implement extensions to .NET BigInteger for things like square root, perfect square, modular multiplicative inverse, Miller-Rabin testing, and Lucas testing.

I implemented the Newton-Rasphor method of approximating a square root for a BigInteger in my ApproximateSquareRoot method because PyCryptodome uses the FIPS 186-4 C.5 method for detecting a perfect square, but when I implemented it I ran into an infinite loop issue because BigInteger does not have a decimal point. A spreadsheet suggested I could get within 1.75 of the formula by using BigInteger.DivRem, but if I was going to be testing potential roots until overshooting I decided to find a more precise square root method and implement IsPerfectSquare my own way.

My Miller-Rabin implementation has not been well-tested - so far only the odd numbers up to the 1,000,001st prime number have been tested for primality once using the PyCryptodome number of rounds (zero false positives or negatives).

Due to the slow progress of testing so many numbers, I introduced a counter in the test so that numbers would be skipped if they are divisible by a small prime, so all numbers above 29 up to the 10,000,001st prime number have also been tested for primality against this implementation excluding those divisible by 2/3/5/7/11/13/17/19/23/29 (zero false positives or negatives).

Testing for Fermat and Mersenne numbers, which PyCryptodome doesn't do, seemed like a simple addition. Those tests can be enabled with flags.

watfordjc commented 2 months ago

BIP85 RSA Implemented

Commit c3470e359a3a25401fb5740d53125259c8c6a05a implements BIP85 RSA.

The BIP85/BIP32 node needs to be fully derived before being used as a parameter to the functions. The key length is extracted from the derivation path of the node.

watfordjc commented 2 months ago

BIP85 Partially Implemented

The following have been implemented:

Not Yet Implemented

The following have not yet been implemented in the bip0085 branch:

I have not implemented HD-Seed WIF (nor do I believe I've implemented WIF) anywhere else in the application, so implementing it would require research.

I have not implemented PWD BASE85 because the Radix-85 alphabet/encoding implemented in the application is Z85. As PWD BASE85 uses the Ascii85/Base85 alphabet, I have not yet decided whether to add a second Radix-85 alphabet/encoding.

I have not implemented RSA GPG because the application does not currently implement GPG anywhere.