usnistgov / SP800-90B_EntropyAssessment

The SP800-90B_EntropyAssessment C++ package implements the min-entropy assessment methods included in Special Publication 800-90B.

how to run ea_restart? #207

Closed xiaoyaowana closed 1 year ago

xiaoyaowana commented 1 year ago

I read the description of the "row dataset" format in SP 800-90B Section 3.1.4.1. [image]

I have a 0-1 binary bit stream of 10M in total. Does the document mean that this bit stream should be arranged into a 1000 × 1000 matrix? In Python, that would be M.reshape(1000, 1000), where M is the bitstream data. But when I use this 1000 × 1000 matrix as input to ea_restart, it still fails. I'm sorry, I don't know the right value for the "bits_per_symbol" parameter, but I tried every number from 1 to 8 and they all failed. I hope to get a reply about what bitstream format the ea_restart program accepts. Thank you! (My bitstream already passes the ea_iid program.)

joshuaehill commented 1 year ago

You can't form a restart data set using a single large sample. Instead, you need to follow the instructions in the paragraph that you have highlighted, and take the first 1000 (for your noise source, single-bit) raw samples from each of 1000 distinct entropy source restarts. For hardware, a "restart" generally involves power cycling the hardware between taking each 1000-symbol sample. The first bullet under your highlighted text tells you how to represent this in a file.

For the NIST tool, you only need to provide the row dataset, as the column dataset is generated by the tool internally using the row dataset.

The particulars of formatting each sample from the noise source are the same as for the large raw data sample: you need one raw sample per byte. For your source (as it produces a bitstring), this will be a sequence of single-byte integers (uint8_t), where each value is either 0x00 or 0x01, each representing one bit of output from your noise source.
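As an illustration of the one-sample-per-byte format, here is a minimal sketch that expands packed bits into bytes that are each 0x00 or 0x01. It assumes the captured bitstream is stored 8 bits per byte; the function name `bits_to_samples` and the `msb_first` bit-order parameter are hypothetical and not part of the NIST tool. The correct bit order depends on how your capture software packs the bits.

```python
def bits_to_samples(packed: bytes, msb_first: bool = True) -> bytes:
    """Expand packed bits into one byte per bit (each 0x00 or 0x01).

    Assumption: `packed` holds 8 noise-source bits per byte. Whether the
    first bit produced is the most or least significant bit of each byte
    depends on the capture software, so it is a parameter here.
    """
    out = bytearray()
    for byte in packed:
        shifts = range(7, -1, -1) if msb_first else range(8)
        for shift in shifts:
            out.append((byte >> shift) & 1)
    return bytes(out)
```

If the data is already stored as one 0x00 or 0x01 value per byte, no conversion is needed.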

The data used to form the restart data is described in the first bullet: the data file should contain the first 1000 symbols output from the noise source in the first restart in the order they were produced, followed by the first 1000 symbols output from the noise source in the second restart in the order they were produced, followed by the first 1000 symbols output from the noise source in the third restart in the order they were produced, and so on, until you finish the file with the first 1000 symbols output from the noise source in the 1000th restart in the order they were produced. The result will be a single file with 1 million bytes which you use with ea_restart.
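The concatenation described above can be sketched as follows. This assumes the output of each restart has already been captured separately as a sequence of integer samples (0 or 1 for a single-bit source). The name `build_row_dataset` and its parameters are hypothetical; this is not code from the NIST tool.

```python
from typing import Sequence


def build_row_dataset(
    restarts: Sequence[Sequence[int]],
    restart_count: int = 1000,
    per_restart: int = 1000,
) -> bytes:
    """Concatenate the first `per_restart` samples of each restart, in order."""
    if len(restarts) != restart_count:
        raise ValueError(f"expected {restart_count} restarts, got {len(restarts)}")
    out = bytearray()
    for index, samples in enumerate(restarts, start=1):
        if len(samples) < per_restart:
            raise ValueError(f"restart {index} has only {len(samples)} samples")
        # bytearray.extend rejects values outside 0..255; for a single-bit
        # source every value should be 0 or 1.
        out.extend(samples[:per_restart])
    return bytes(out)
```

With the default parameters the result is exactly 1,000,000 bytes. Write it to disk in binary mode (for example, `open(path, "wb")`) so that no newline translation or trailing byte is introduced.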

Implementing this as a matrix is unlikely to be helpful, unless you are trying to implement the tool yourself (in which case it may be useful, as the column dataset is just the transpose of the row dataset).
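For readers who do want to see the transpose relationship, here is a minimal sketch that derives a column dataset from a row dataset. It is only an illustration of the definition, not the tool's internal implementation, and the function name `column_dataset` is hypothetical.

```python
def column_dataset(row_data: bytes, rows: int = 1000, cols: int = 1000) -> bytes:
    """Return the column dataset: the row dataset read column by column.

    Treats `row_data` as a rows x cols matrix stored row after row, and
    emits M[1][1], M[2][1], ..., M[r][1], M[1][2], ... (1-based, as in the
    SP 800-90B notation).
    """
    if len(row_data) != rows * cols:
        raise ValueError(f"expected {rows * cols} samples, got {len(row_data)}")
    return bytes(row_data[r * cols + c] for c in range(cols) for r in range(rows))
```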

xiaoyaowana commented 1 year ago

Thank you for your prompt reply. Could you please give an example of the row dataset M, or a file that can be run by the ea_restart program? I listed the candidate input files in the following figure. Of course, (a) has failed. [image] I have a little difficulty understanding this sentence: "the row dataset is M[1][1] ||...|| M[1][c] || M[2][1] ||...|| M[2][c] || ... || M[r][1] ||...|| M[r][c]". What do M[1][1] ... M[r][c] (the elements of matrix M) consist of? As I noted in the figure, I'm confused about the elements of the matrix. I may have misunderstood the format description of the input file and output file, but unfortunately, I have not yet obtained the output file.

joshuaehill commented 1 year ago

The symbol || means concatenation in this document. This (and the other notation used within the document) is described in SP 800-90B Section 1.3.

For (a) the file (represented as a string of uint8_t values) would look like

0x01 0x00 ... 0x01 0x00 0x01 ... 0x00 ... ... ... ... 0x00 0x01 ... 0x01

I've produced an example restart row dataset file for a source that emits single bits, and put it here.

xiaoyaowana commented 1 year ago

Thank you very much! I opened the file that you gave me, but when I used it to run the restart program, an error occurred. Is there something wrong with my command? On Linux, I entered "./ea_restart -i -v example-restart-data.bin 1". I tried every value of [bits_per_symbol] from 1 to 4, and the error is:

Opening file: 'example-restart-data.bin'
Loaded 1000001 samples made up of 3 distinct 4-bit-wide symbols.
Error: data does not contain 1000000 samples

I'm sorry to bother you again with what is probably a naive question. What is the right command to run the restart program using the file you gave me?

joshuaehill commented 1 year ago

The first thing to check is that the file unzipped correctly; it should have a file size of exactly 1,000,000 bytes. Does it?

Your pasted results suggest both that the file isn't the right size (the output indicates 1,000,001 bytes) and that it was somehow corrupted (there should be only 2 distinct symbols, not 3). Did you perhaps append a single byte to the end (e.g., by opening the file in a text editor and then saving it, which would have added a newline character)?

I would suggest unzipping the file and trying again.
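The two checks above (exact file size and number of distinct symbols) can be sketched in a few lines. This is a hypothetical helper, not part of the NIST tool; the names `summarize_samples` and `summarize_file` are made up for illustration.

```python
def summarize_samples(data: bytes, expected_size: int = 1_000_000) -> dict:
    """Report the sample count and the distinct byte values in `data`."""
    return {
        "size": len(data),
        "size_ok": len(data) == expected_size,
        "distinct": sorted(set(data)),
    }


def summarize_file(path: str, expected_size: int = 1_000_000) -> dict:
    """Read `path` in binary mode and summarize its contents."""
    with open(path, "rb") as handle:
        return summarize_samples(handle.read(), expected_size)
```

For a single-bit source, a well-formed restart file should report a size of 1,000,000 and distinct values of 0 and 1 only. A stray trailing newline would appear as an extra byte with value 10.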

xiaoyaowana commented 1 year ago

Yeah, I got it. Thank you for your reply. I guess there is a difference between Windows and Linux when unzipping. Yesterday, I unzipped the file on Windows and transferred it to my Linux server through WinSCP, and ea_restart failed. Today, I unzipped the file on Linux, and ea_restart passed.

joshuaehill commented 1 year ago

If this issue has been addressed, could you please close it?

joshuaehill commented 1 year ago

Thanks!