Open DeeFuse opened 2 years ago
I don't think that that function actually generates 8 bits of entropy, so it should probably do more than 32 calls. I tried 256 calls and it was still much much faster than the clock-based code, so a large number of calls definitely wouldn't be problematic. It might also be a good idea to skip the ad-hoc shifting and masking code used in that function and feed everything directly into the hashing function.
Unfortunately I have no AVR board to test the entropy on.
I can imagine that the result of the AD conversion will fluctuate in a range 0+X < n < 1023-Y, so the higher bits might not contribute much entropy. I think the masking and shifting should compensate for that.
Instead of calling hydro_random_rbyte 256 times (and masking the result to one bit) I'd rather increase the number of nConversions inside this function.
@jedisct1 What's your opinion on the entropy quality?
Some insight into the entropy quality can be seen at the bottom of this page
When I tried it, I didn't mask hydro_random_rbyte down. I just let it feed the output directly to hydro_hash_update 256 times by changing ebits += 8 to ebits += 1.
I had a look into the hydro_hash_update function; it seems that the input is XORed with the state and then fed into gimli_core_u8 every gimli_RATE (16) bytes.
I could see a benefit in creating a configuration macro like LIBHYDROGEN_RNG_ENTROPY_BITS instead of the hardcoded 256 bits in the hydro_random_init function.
That would allow selecting the "quality" of the RNG initialisation depending on your needs (better quality vs. faster initialisation)
I agree that we should feed everything to the hash function instead of shifting&masking.
How many unbiased bits can we safely assume here?
With ebits += 1, the initialization procedure completes in 400 ms on the Uno, which should be acceptable for most applications. If you feel like it's not, ebits += 2 would remain safe with a single unbiased bit.
That would allow selecting the "quality" of the RNG initialisation depending on your needs (better quality vs. faster initialisation)
Having an unsafe option is not a good idea.
I can't give a definite answer on the number of unbiased bits from the code alone. I'll see if I can find an AVR Arduino in my stash to generate a larger number of random numbers, so I have a dataset to work with.
The Uno's serial USB connection is unfortunately very slow, but I have a few hundred KiB of output data so far. Testing with nConversions=20, nConversions=100, nConversions=1, and in a bit I'll change the code to remove the shifting and looping completely and try that.
A basic test could be to simply take the average of all random uint8_t outputs of the generator function and observe whether it converges towards 127.5. Another option would be to count the occurrences of each random number. The conversion loop (nConversions) has to be >= 2; otherwise the ADC trickery with the high/low pulling of the reference, and the resulting random sample interval, would not work. IMHO the shift/mask can be omitted, but not the conversion loops.
That is true. Right now I'm testing it with a global instead, so that it still pulls up and down for each call, without looping internally. Also, it seems like the main slowdown was actually random_rbyte, not the serial interface, so I have a very large amount of data for N=1 and for what I described in the previous sentence.
Doing some more testing, any prescaler option other than 64 seems to produce fewer unique readings from the ADC. Also, regardless of settings, at least half of all readings are 0.
I'd say that the 0 readings are expected; every 2nd iteration the ADC gets grounded to be able to recharge it. We could optimize the code to ground and recharge in the same iteration.
I generated 17MiB with this code

```c
uint8_t
random_rbyte()
{
    uint8_t rnd = 0;

    ADMUX = 0b01001111; // Low (0V)
    ADMUX = 0b01001110; // High (1.1V)

    ADCSRA |= 1 << ADSC; // Start a conversion
    // ADSC is cleared when the conversion finishes
    while ((ADCSRA >> ADSC) & 1) {
        // wait for the conversion to complete
    }

    rnd = ADCL;  // do not swap this sequence. Low has to be fetched first.
    rnd ^= ADCH; // the value is always between 0-3
    return rnd;
}
```

and got the following byte value distribution
2 01000101
4 01001001
428 01001010
295 01001011
8 01010100
40 01010101
5 01010110
328 01010111
239 01011000
1 01011101
2 01011110
11 01011111
58 01100000
4 01100001
15 01100011
1684 01100100
2217 01100101
9 01100110
7 01101001
9 01101010
5 01101100
68 01101101
317 01101110
68 01101111
129 01110000
3413 01110001
2605 01110010
10 01110011
4 10010011
5 10010100
13 10011001
9 10011010
11 10011011
10 10011100
2 10100111
5 10101000
87 10101001
647 10101010
28555 10101011
358204 10101100
1229554 10101101
1625192 10101110
634598 10101111
959201 10110000
1987364 10110001
1220261 10110010
933451 10110011
1656613 10110100
2056008 10110101
1719886 10110110
1588002 10110111
986552 10111000
669690 10111001
120899 10111010
9067 10111011
12170 10111100
7023 10111101
13495 10111110
10785 10111111
16736 11000000
7771 11000001
8332 11000010
1009 11000011
282 11000100
198 11000101
64 11000110
59 11000111
52 11001000
44 11001001
7 11001010
2 11001011
How frequently do you get the same value twice in a row? What distribution do you get with only the 2 or 4 last bits? And with a Von Neumann extractor?
What kind of tools are good for this sort of analysis? I've just been stringing commands together in bash, but that's starting to feel quite inadequate. Looking at every individual pair of bytes, 158261 have the same byte twice in a row. Here's the actual file I generated https://0x0.st/-tyR.txt (it's raw binary data, not sure why 0x0 added the .txt) if you want to do your own analysis on it.
Last 4 bits:
976124 0000
1998552 0001
1231198 0010
934489 0011
1658592 0100
2058465 0101
1719964 0110
1588391 0111
986848 1000
669845 1001
121999 1010
37930 1011
370389 1100
1236646 1101
1639006 1110
645462 1111
Last 2 bits:
3991953 00
5963508 01
4712167 10
3206272 11
I doubt that 1 instruction is enough to discharge the internal capacitor to get a sufficiently long charge time and range for a good distribution:

ADMUX = 0b01001111; // Low (0V)
ADMUX = 0b01001110; // High (1.1V)
I changed it so that it sets it to low right at the end of the function, and sets it to high right before sampling. This seems to increase the number of unique readings.
I generated 1.6GiB. It has 100 different distinct byte values. I'm currently writing a tool in Zig to extract and analyze it.
I ran a Von Neumann extractor on just the lowest bit of each byte, and the output was pretty even according to ent:
Entropy = 7.984414 bits per byte.
Optimum compression would reduce the size
of this 31211520 byte file by 0 percent.
Chi square distribution for 31211520 samples is 673722.13, and randomly
would exceed this value less than 0.01 percent of the times.
Arithmetic mean value of data bytes is 127.5149 (127.5 = random).
Monte Carlo value for Pi is 3.187330063 (error 1.46 percent).
Serial correlation coefficient is 0.001175 (totally uncorrelated = 0.0).
It also passes a few dieharder tests, but the size (30M) is not big enough for most of the tests. Doing an extraction on every bit of every byte gave quite a crappy result. Next I'm going to separately test each bit position in the same manner.
I'm currently running another Arduino sketch that generates 4096 samples and then resets, to see if there are any right-after-boot correlations between runs.
I have now done Von Neumann extraction on each bit position. Here are the sizes of each output file:
```
   Bytes  Size  File              Output size / Input size
31211520   30M  von_neumann_bit0  0.14922
23101440   23M  von_neumann_bit1  0.11045
18657280   18M  von_neumann_bit2  0.08920
18440192   18M  von_neumann_bit3  0.08816
 1937408  1.9M  von_neumann_bit4  0.00926
  962560  940K  von_neumann_bit5  0.00460
  372736  364K  von_neumann_bit6  0.00178
  323584  316K  von_neumann_bit7  0.00155
```
According to ent [1] each of these files is high entropy. Combined with the decrease in file sizes, I interpret this as meaning that while the high bits are very biased, they are still quite decorrelated. This doesn't mean that they're independent, but I think it's a good sign. I think an iterated Von Neumann extractor might be useful to get more examinable data out of the raw data, but I'm not sure.
[1] https://paste.sr.ht/~lonjil/b6814c2139c2f55c9d37abece27aab44bc568823
This is great!
How long does it take to extract 256 bits from the lower bit?
On my phone so my numbers are inexact, but I believe it took around 1707 input bits for Von Neumann to produce 256 bits.
As a point of reference regarding performance, this hydro_random_init [1] implementation takes ~517 milliseconds to run hydro_init() with 2048 samples, which I believe means 131 (?) or so invocations of the Gimli function. With 1024 samples, it takes ~268 milliseconds; 4096 samples ends up at ~1014 milliseconds.
[1] https://paste.sr.ht/~lonjil/4a2cd787141af107fb5ac3b7eb3914b9d2034eef
I see that you removed the ADC conversion-time "randomisation". With that in place, the randomness of the readings could be increased.
I'm not sure it would. Changing the prescaler to 2 makes it not random at all, and changing it to 128 makes it much slower while also reducing the entropy per reading (at a glance anyway; I didn't do a rigorous test). I think the possible selections other than 64 all produced less entropy as well, so any extra randomness from an unpredictable ADC clock might just be lost to a more predictable sample output in some of the cases.
Also, the clock prescaler needs to be set to a different value depending on the chip clock rate. An ATmega328P at 8 MHz should probably use 32, one at 4 MHz should probably use 16, and so on, and one at 20 MHz might do best at
I ran an implementation of the NIST SP 800-90B non-IID min-entropy estimator on 2 million samples, and it gave 0.315568 bits per sample.
I've tested the code with an ATmega2561 in a commercial appliance I had at hand. The following problems emerged:
Since different AVR micros differ so strongly on so many points, it may be necessary to have bespoke routines for every model :/
This is sad :(
And we won't be able to test on every AVR model.
I guess the code will have to have preprocessor directives for the target chip, and an error instructing people to open an issue for un-implemented models.
Also, I wonder if there is a published in-depth description of the ADC hardware anywhere, so that someone with the appropriate physics knowledge could figure out a lower bound on the min-entropy, rather than the upper bound we get from sample testing.
I think we won't be able to determine a definite min-entropy bound for that type of source. After running the tests on my boards, I think this approach doesn't fit the library's goal of being "...hard-to-misuse...", since it is possible to accidentally "disable" the random data source through a common hardware design. TL;DR: I think we should not rely on external circumstances / hardware designs as a source of random data
I'm currently testing another (WDT + Timer based) RNG which would be independent of external / user-influenceable circumstances. The randomness here will come from the low accuracy of the internal 128 kHz RC oscillator for the WDT.
Entropy = 7.957624 bits per byte.
Optimum compression would reduce the size
of this 4096 byte file by 0 percent.
Chi square distribution for 4096 samples is 239.62, and randomly
would exceed this value 74.71 percent of the times.
Arithmetic mean value of data bytes is 127.8352 (127.5 = random).
Monte Carlo value for Pi is 3.131964809 (error 0.31 percent).
Serial correlation coefficient is 0.031658 (totally uncorrelated = 0.0).
@DeeFuse would you mind posting the code for that?
@lonjil You can find the code I've used here (cleaned up) My test hardware was a ATmega2561 with F_CPU = 14745600UL
Old PR and discussion, but still relevant :)
What should we do?
I could clean up my gist with the updated code and push it to my PR. I'm no security / crypto expert. How do you feel about the implementation of randomness used in my code?
The implementation looks fine, but I don't have the hardware to test it on.
If you want to have a look into the randomness of this implementation, I can send you a bin file for analysis.
The implementation looks fine, but I don't have the hardware to test it on.
I have a lot of different boards available and can help out if needed.
I created a 10k number sequence as well as 10 shorter dumps after booting the MCU: random.zip. I've done some reading about the AVR hardware differences which need to be accounted for to make it foolproof to use. After implementing some #ifdefs to catch those hardware differences, I'll feel comfortable pushing / merging the code into libhydrogen. @justind000 I've updated my gist with the latest code. Feel free to run it on your AVRs. The random numbers are printed to USART1 @ 115200 baud
Shall we back up the states of the used peripherals (WDT / TIM1) and restore them after the random number generation?
Shall we back up the states of the used peripherals (WDT / TIM1) and restore them after the random number generation?
That sounds like a good idea. Applications may not expect them to be changed.
I have a question about the coding style: I can calculate the required timer parameters at compile time or at runtime. The main difference is that the code, when compiled with -O1 or higher, will be equal, but at -O0 there is a huge difference. The drawback of the preprocessor calculation is the readability. Which solution shall I include? test1 - preprocessor / test2 - runtime (or optimized) https://godbolt.org/z/z37E6ox6s
Runtime.
I'm thinking about disabling the function hydro_random_reseed when building on an AVR unless the user explicitly enables it. If the user calls this function while running their own code, it could cause a lot of problems due to the reconfiguration of TIM1 and the ~4 second execution time.
Remove "Arduino.h" dependency Fixes #115