tillitis / tillitis-key1

Board designs, FPGA verilog, firmware for TKey, the flexible and open USB security key 🔑
https://www.tillitis.se
382 stars 24 forks source link

xorwow based initial RAM randomization #201

Closed secworks closed 3 weeks ago

secworks commented 3 months ago

This PR improve initial RAM randomization by adding the xorwow variant of Marasaglia xorshift PRNG as data generator.

Closes #162

dehanj commented 3 months ago

Fixed format, CI fails on hashes - which it should at this point.

dehanj commented 3 months ago

This implementation yields:

 ➜  ~ ent ram_out_xorwow.bin
Entropy = 7.998632 bits per byte.

Optimum compression would reduce the size
of this 131072 byte file by 0 percent.

Chi square distribution for 131072 samples is 248.17, and randomly
would exceed this value 60.85 percent of the times.

Arithmetic mean value of data bytes is 127.2708 (127.5 = random).
Monte Carlo value for Pi is 3.135179675 (error 0.20 percent).
Serial correlation coefficient is 0.000223 (totally uncorrelated = 0.0).

Startup time is not noticebly slower than original. Needs to be measured somehow.

ent on our current implementation gives:

Entropy = 7.954691 bits per byte.

Optimum compression would reduce the size
of this 131072 byte file by 0 percent.

Chi square distribution for 131072 samples is 8146.58, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bytes is 127.5535 (127.5 = random).
Monte Carlo value for Pi is 3.132249943 (error 0.30 percent).
Serial correlation coefficient is 0.009923 (totally uncorrelated = 0.0).

NOTE: that the app is still present in this .bin-files and are obviously not random. The first 0x10C bytes should be cropped away.

dehanj commented 3 months ago

Updated where I cropped the first 272 bytes (which should be the app).

➜  ~ ent ram_out_xorwow_crop.bin
Entropy = 7.998641 bits per byte.

Optimum compression would reduce the size
of this 130800 byte file by 0 percent.

Chi square distribution for 130800 samples is 246.23, and randomly
would exceed this value 64.18 percent of the times.

Arithmetic mean value of data bytes is 127.3576 (127.5 = random).
Monte Carlo value for Pi is 3.133211009 (error 0.27 percent).
Serial correlation coefficient is -0.000621 (totally uncorrelated = 0.0).
➜  ~ ent ram_out_orig_crop.bin
Entropy = 7.954721 bits per byte.

Optimum compression would reduce the size
of this 130800 byte file by 0 percent.

Chi square distribution for 130800 samples is 8124.03, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bytes is 127.6415 (127.5 = random).
Monte Carlo value for Pi is 3.135779817 (error 0.19 percent).
Serial correlation coefficient is 0.009079 (totally uncorrelated = 0.0).

Marginally better than with the app.

dehanj commented 2 months ago

I propose we move forward on this and try to get it merged.

But what tests do we need to perform in order to be confident enough about the implementation? We have some initial results above, using ent where we could see an improvment in shi square distribution. However, maybe we should:

What do you think?

secworks commented 1 month ago

I've done som more tests to compare the C implementation with the Go model

First 10 values from C implementation:

addr: 0x0ab510c6, data: 0x10052812
addr: 0x00502e9f, data: 0x15e160a2
addr: 0x093237ad, data: 0x0752b749
addr: 0x0a75c99d, data: 0x66cc4d05
addr: 0x0ec0680d, data: 0x02ffccc4
addr: 0x0f36b45c, data: 0x97811f88
addr: 0x014c48d1, data: 0x3a78f3a1
addr: 0x05d5b3c9, data: 0xa594f538
addr: 0x096aa20e, data: 0x5d4efc72
addr: 0x0b86b00a, data: 0xdadf5f43

First 10 values from the Go model:

addr: 0x0ab510c6, data: 0x10052812
addr: 0x00502e9f, data: 0x15e160a2
addr: 0x093237ad, data: 0x0752b749
addr: 0x0a75c99d, data: 0x66cc4d05
addr: 0x0ec0680d, data: 0x02ffccc4
addr: 0x0f36b45c, data: 0x97811f88
addr: 0x014c48d1, data: 0x3a78f3a1
addr: 0x05d5b3c9, data: 0xa594f538
addr: 0x096aa20e, data: 0x5d4efc72
addr: 0x0b86b00a, data: 0xdadf5f43

I then generated 10M values from each implementation and compared the generated data files:

sha256sum big_c.txt
e3d4d2e5b7e64aca612e09ef831fcab91f1cad7cf66d1086ae931da900c2ef85  big_c.txt

shasum256 big_go.txt
e3d4d2e5b7e64aca612e09ef831fcab91f1cad7cf66d1086ae931da900c2ef85  big_go.txt
secworks commented 1 month ago

Generated 1 GW32 binary data from the C-implementation and tested with ENT:

Entropy = 8.000000 bits per byte.

Optimum compression would reduce the size
of this 4000000000 byte file by 0 percent.

Chi square distribution for 4000000000 samples is 215.37, and randomly
would exceed this value 96.60 percent of the times.

Arithmetic mean value of data bytes is 127.5001 (127.5 = random).
Monte Carlo value for Pi is 3.141636609 (error 0.00 percent).
Serial correlation coefficient is 0.000031 (totally uncorrelated = 0.0).
secworks commented 1 month ago

Results from testing with PractRand:

RNG_test using PractRand version 0.94
RNG = RNG_stdin8, seed = unknown
test set = core, folding = standard (8 bit)

rng=RNG_stdin8, seed=unknown
length= 512 megabytes (2^29 bytes), time= 3.9 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN(2+0,13-1,T)                  R=  +3.2  p = 0.097     normal
  BCFN(2+1,13-1,T)                  R= +18.4  p =  2.4e-9    VERY SUSPICIOUS
  BCFN(2+2,13-2,T)                  R=  +5.9  p = 0.011     normalish
  BCFN(2+3,13-2,T)                  R=  +3.1  p = 0.102     normal
  BCFN(2+4,13-3,T)                  R=  -0.5  p = 0.562     normal
  BCFN(2+5,13-3,T)                  R=  -1.8  p = 0.763     normal
  BCFN(2+6,13-4,T)                  R=  +1.3  p = 0.279     normal
  BCFN(2+7,13-5,T)                  R=  -2.4  p = 0.836     normal
  BCFN(2+8,13-5,T)                  R=  -0.7  p = 0.592     normal
  BCFN(2+9,13-6,T)                  R=  -0.7  p = 0.590     normal
  BCFN(2+10,13-6,T)                 R=  -0.0  p = 0.470     normal
  BCFN(2+11,13-7,T)                 R=  +0.3  p = 0.399     normal
  BCFN(2+12,13-8,T)                 R=  +1.5  p = 0.218     normal
  BCFN(2+13,13-8,T)                 R=  -0.4  p = 0.502     normal
  BCFN(2+14,13-9,T)                 R=  -2.4  p = 0.911     normal
  BCFN(2+15,13-9,T)                 R=  +1.6  p = 0.191     normal
  DC6-9x1Bytes-1                    R= +2679  p =  4e-1572    FAIL !!!!!!!!
  Gap-16:A                          R=  +3.3  p = 0.023     normal
  Gap-16:B                          R=  +1.3  p = 0.182     normal
  FPF-14+6/16:(0,14-0)              R=  -3.7  p =1-4.7e-3   normal
  FPF-14+6/16:(1,14-0)              R=  -2.0  p = 0.925     normal
  FPF-14+6/16:(2,14-0)              R=  -4.8  p =1-3.7e-4   normal
  FPF-14+6/16:(3,14-0)              R=  -3.8  p =1-3.3e-3   normal
  FPF-14+6/16:(4,14-0)              R=  -4.6  p =1-5.8e-4   normal
  FPF-14+6/16:(5,14-1)              R=  -1.8  p = 0.905     normal
  FPF-14+6/16:(6,14-2)              R=  -0.9  p = 0.737     normal
  FPF-14+6/16:(7,14-2)              R=  -3.7  p =1-4.0e-3   normal
  FPF-14+6/16:(8,14-3)              R=  -1.8  p = 0.898     normal
  FPF-14+6/16:(9,14-4)              R=  -1.5  p = 0.855     normal
  FPF-14+6/16:(10,14-5)             R=  +0.2  p = 0.445     normal
  FPF-14+6/16:(11,14-5)             R=  -1.9  p = 0.916     normal
  FPF-14+6/16:(12,14-6)             R=  +0.0  p = 0.491     normal
  FPF-14+6/16:(13,14-7)             R=  -0.6  p = 0.660     normal
  FPF-14+6/16:(14,14-8)             R=  +1.1  p = 0.217     normal
  FPF-14+6/16:(15,14-8)             R=  +0.2  p = 0.432     normal
  FPF-14+6/16:(16,14-9)             R=  -0.9  p = 0.720     normal
  FPF-14+6/16:(17,14-10)            R=  +0.6  p = 0.313     normal
  FPF-14+6/16:(18,14-11)            R=  -0.8  p = 0.679     normal
  FPF-14+6/16:(19,14-11)            R=  -2.1  p = 0.980     normal
  FPF-14+6/16:all                   R=  -9.7  p =1-4.6e-9    VERY SUSPICIOUS
  FPF-14+6/16:cross                 R=  -0.8  p = 0.785     normal
  BRank(12):128(4)                  R= +2684  p~=  1e-1428    FAIL !!!!!!!!
  BRank(12):256(4)                  R= +6301  p~=  5e-3352    FAIL !!!!!!!!
  BRank(12):384(1)                  R= +4889  p~=  1e-1472    FAIL !!!!!!!!
  BRank(12):512(2)                  R= +9364  p~=  5e-2820    FAIL !!!!!!!!
  BRank(12):768(1)                  R= +9883  p~=  3e-2976    FAIL !!!!!!!!
  BRank(12):1K(2)                   R=+18528  p~=  1e-5578    FAIL !!!!!!!!
  BRank(12):1536(1)                 R=+19226  p~=  1e-5788    FAIL !!!!!!!!
  BRank(12):2K(1)                   R=+25361  p~=  1e-7635    FAIL !!!!!!!!
  mod3n(5):(0,9-0)                  R=  +0.8  p = 0.349     normal
  mod3n(5):(1,9-1)                  R=  +3.9  p = 0.028     normal
  mod3n(5):(2,9-1)                  R=  +1.6  p = 0.216     normal
  mod3n(5):(3,9-2)                  R=  -1.2  p = 0.728     normal
  mod3n(5):(4,9-2)                  R=  -4.7  p =1-7.1e-3   normal
  mod3n(5):(5,9-3)                  R=  -3.1  p = 0.948     normal
  mod3n(5):(6,9-3)                  R=  +0.3  p = 0.434     normal
  mod3n(5):(7,9-4)                  R=  +1.4  p = 0.235     normal
  mod3n(5):(8,9-4)                  R=  -1.5  p = 0.777     normal
  mod3n(5):(9,9-5)                  R=  +3.7  p = 0.041     normal
  mod3n(5):(10,9-5)                 R=  +0.6  p = 0.355     normal
  mod3n(5):(11,9-6)                 R=  -1.5  p = 0.784     normal
  mod3n(5):(12,9-6)                 R=  +0.6  p = 0.316     normal
  mod3n(5):(13,9-6)                 R=  -1.6  p = 0.810     normal
  mod3n(5):(14,9-6)                 R=  -0.1  p = 0.466     normal
  mod3n(5):(15,9-6)                 R=  -1.7  p = 0.819     normal
  TMFn(2+0):wl                      R=  -3.0  p~= 0.8       normal
  TMFn(2+1):wl                      R=  +1.1  p~= 0.3       normal
  TMFn(2+2):wl                      R=  +1.1  p~= 0.3       normal
  TMFn(2+3):wl                      R=  -1.8  p~= 0.7       normal
  TMFn(2+4):wl                      R=  +0.5  p~= 0.4       normal
  [Low1/8]BCFN(2+0,13-3,T)          R=  +0.9  p = 0.339     normal
  [Low1/8]BCFN(2+1,13-3,T)          R=  -4.4  p = 0.970     normal
  [Low1/8]BCFN(2+2,13-4,T)          R=  +1.5  p = 0.261     normal
  [Low1/8]BCFN(2+3,13-4,T)          R=  -2.5  p = 0.851     normal
  [Low1/8]BCFN(2+4,13-5,T)          R=  -1.9  p = 0.779     normal
  [Low1/8]BCFN(2+5,13-5,T)          R=  -1.1  p = 0.657     normal
  [Low1/8]BCFN(2+6,13-6,T)          R=  -0.4  p = 0.527     normal
  [Low1/8]BCFN(2+7,13-6,T)          R=  -3.9  p = 0.968     normal
  [Low1/8]BCFN(2+8,13-7,T)          R=  -2.0  p = 0.799     normal
  [Low1/8]BCFN(2+9,13-8,T)          R=  +2.1  p = 0.162     normal
  [Low1/8]BCFN(2+10,13-8,T)         R=  +3.1  p = 0.097     normal
  [Low1/8]BCFN(2+11,13-9,T)         R=  +1.2  p = 0.238     normal
  [Low1/8]BCFN(2+12,13-9,T)         R=  -0.7  p = 0.553     normal
  [Low1/8]DC6-9x1Bytes-1            R=  +4.2  p = 0.039     normalish
  [Low1/8]Gap-16:A                  R=  +0.1  p = 0.636     normal
  [Low1/8]Gap-16:B                  R=  -1.6  p = 0.865     normal
  [Low1/8]FPF-14+6/16:(0,14-0)      R=  +3.0  p = 0.018     normal
  [Low1/8]BCFN(2+10,13-8,T)         R=  +3.1  p = 0.097     normal
  [Low1/8]BCFN(2+11,13-9,T)         R=  +1.2  p = 0.238     normal
  [Low1/8]BCFN(2+12,13-9,T)         R=  -0.7  p = 0.553     normal
  [Low1/8]DC6-9x1Bytes-1            R=  +4.2  p = 0.039     normalish
  [Low1/8]Gap-16:A                  R=  +0.1  p = 0.636     normal
  [Low1/8]Gap-16:B                  R=  -1.6  p = 0.865     normal
  [Low1/8]FPF-14+6/16:(0,14-0)      R=  +3.0  p = 0.018     normal
  [Low1/8]FPF-14+6/16:(1,14-0)      R=  -2.4  p = 0.958     normal
  [Low1/8]FPF-14+6/16:(2,14-1)      R=  +2.0  p = 0.087     normal
  [Low1/8]FPF-14+6/16:(3,14-2)      R=  -2.0  p = 0.926     normal
  [Low1/8]FPF-14+6/16:(4,14-2)      R=  -0.5  p = 0.642     normal
  [Low1/8]FPF-14+6/16:(5,14-3)      R=  +0.2  p = 0.431     normal
  [Low1/8]FPF-14+6/16:(6,14-4)      R=  +3.8  p =  5.2e-3   normal
  [Low1/8]FPF-14+6/16:(7,14-5)      R=  +3.0  p = 0.020     normal
  [Low1/8]FPF-14+6/16:(8,14-5)      R=  -0.3  p = 0.576     normal
  [Low1/8]FPF-14+6/16:(9,14-6)      R=  +0.7  p = 0.306     normal
  [Low1/8]FPF-14+6/16:(10,14-7)     R=  -0.6  p = 0.666     normal
  [Low1/8]FPF-14+6/16:(11,14-8)     R=  -0.1  p = 0.495     normal
  [Low1/8]FPF-14+6/16:(12,14-8)     R=  +2.8  p = 0.035     normal
  [Low1/8]FPF-14+6/16:(13,14-9)     R=  -0.0  p = 0.472     normal
  [Low1/8]FPF-14+6/16:(14,14-10)    R=  -0.5  p = 0.603     normal
  [Low1/8]FPF-14+6/16:(15,14-11)    R=  +0.2  p = 0.383     normal
  [Low1/8]FPF-14+6/16:(16,14-11)    R=  -1.9  p = 0.964     normal
  [Low1/8]FPF-14+6/16:all           R=  +1.4  p = 0.178     normal
  [Low1/8]FPF-14+6/16:cross         R=  +0.5  p = 0.291     normal
  [Low1/8]BRank(12):128(4)          R=  -1.4  p~= 0.890     normal
  [Low1/8]BRank(12):256(2)          R=  -0.2  p~= 0.554     normal
  [Low1/8]BRank(12):384(1)          R=  +0.4  p~= 0.366     normal
  [Low1/8]BRank(12):512(2)          R=  +0.8  p~= 0.293     normal
  [Low1/8]BRank(12):768(1)          R=  +0.4  p~= 0.366     normal
  [Low1/8]BRank(12):1K(1)           R=  +1.8  p~= 0.146     normal
  [Low1/8]mod3n(5):(0,9-2)          R=  -3.6  p = 0.967     normal
  [Low1/8]mod3n(5):(1,9-2)          R=  -1.6  p = 0.787     normal
  [Low1/8]mod3n(5):(2,9-3)          R=  +1.9  p = 0.169     normal
  [Low1/8]mod3n(5):(3,9-3)          R=  -0.6  p = 0.615     normal
  [Low1/8]mod3n(5):(4,9-4)          R=  -4.0  p = 0.987     normal
  [Low1/8]mod3n(5):(5,9-4)          R=  -1.2  p = 0.714     normal
  [Low1/8]mod3n(5):(6,9-5)          R=  +1.7  p = 0.185     normal
  [Low1/8]mod3n(5):(7,9-5)          R=  +0.1  p = 0.444     normal
  [Low1/8]mod3n(5):(8,9-6)          R=  -1.1  p = 0.701     normal
  [Low1/8]mod3n(5):(9,9-6)          R=  +1.3  p = 0.213     normal
  [Low1/8]mod3n(5):(10,9-6)         R=  +1.4  p = 0.205     normal
  [Low1/8]mod3n(5):(11,9-6)         R=  -2.4  p = 0.936     normal
  [Low1/8]mod3n(5):(12,9-6)         R=  -1.1  p = 0.695     normal
  [Low1/8]TMFn(2+0):wl              R=  +0.7  p~= 0.4       normal
  [Low1/8]TMFn(2+1):wl              R=  -1.9  p~= 0.7       normal

Actually a bit better than I expected from a simple PRNG.

dehanj commented 1 month ago

Great results! I'm not so familiar with the output from PractRand, so we can discuss that on Monday. Otherwise I'm happy and ready to approve.

The only remaining thing I can think of is to measure the affect on the start up time. I don't think there is a difference that one would notice, but it is good to have it quantified. I can do that task.

secworks commented 1 month ago

Some info on interpreting PractRand test result: https://pracrand.sourceforge.net/Tests_overview.txt

Failures on short-ranged tests, especially DC6 parameterizations, are 
typical of small chaotic PRNGs with insufficient mixing / avalanche.  
Failures on FPF parameterizations can have similar meaning.  However, 
both of those find bias in a wide variety of PRNGs, sometimes for 
non-obvious reasons.  

BCFN failures where the first parameter listed is high (say, 5 or 
more) typically reflect a cyclic buffer being traversed, where 
patterns in the content of that buffer show up at specific spacings.  
On the other hand, BCFN failures where the first parameter listed is 
low are typically similar in meaning to a failure on a DC6 
parameterization (see the previous paragraph).  Note that when I 
refer to the value of the first BCFN parameter, I'm refering to the 
evaluated value listed in the output, meaning that for a test result 
listed under "BCFN(2+3,13-1)" the first parameter would be 5 (2+3). 

Failures on BRank suggest that in some way the PRNG output, or at 
least part of it, was extremely linear, producable strictly by xoring 
bits of previous PRNG output.  This is classically seen on LFSRs / 
GFSRs, but sometimes it shows up on PRNGs that are *supposed* to be 
chaotic, but actually have some component that isn't really.  

Long discussion with the author of PractRand about interpretation of the BCFN results. https://sourceforge.net/p/pracrand/discussion/366935/thread/0dd1fc3e/

dehanj commented 1 month ago

Some info on interpreting PractRand test result: https://pracrand.sourceforge.net/Tests_overview.txt

...

Long discussion with the author of PractRand about interpretation of the BCFN results. https://sourceforge.net/p/pracrand/discussion/366935/thread/0dd1fc3e/

The fails we get seems to be explainable, which is comforting.

I have measured the difference between the new RAM fill, compared to the current.

Current 55 ms. xorwow: 133 ms.

So we have an increase of 78 ms on the start up. More than I expected, what do we think about that?

Also measured

// Set RAM address and data scrambling values
*ram_rand = rnd_word();
*ram_scramble = rnd_word();

which takes 30 ms.

Edit: Added the measurement for RAM address and data scrambling.

secworks commented 1 month ago

Some info on interpreting PractRand test result: https://pracrand.sourceforge.net/Tests_overview.txt ... Long discussion with the author of PractRand about interpretation of the BCFN results. https://sourceforge.net/p/pracrand/discussion/366935/thread/0dd1fc3e/

The fails we get seems to be explainable, which is comforting.

I have measured the difference between the new RAM fill, compared to the current.

Current 55 ms. xorwow: 133 ms.

So we have an increase of 78 ms on the start up. More than I expected, what do we think about that?

More than I expected too. But we also do additional RNG-reads that could be removed to offset this delay. My ESP sense tells me that we would save 30 ms.

dehanj commented 1 month ago

Yes. An approach would be to remove the first two RNG-reads to set the RAM and address scrambling, which would save us 30 ms.

The addition of xorwow would then land on an increase of 48 ms. Which is OK, I think.

dehanj commented 3 weeks ago

Completed measurements, in summary: Current implementation: 55ms. xorwow: 132 ms xorshift: 114 ms 1 word read from TRNG (with full wait from another TRNG read): 15 ms.

Which also accomodates most of the difference between xorwow and xorshift, since it xorwow makes two reads.