usnistgov / SP800-90B_EntropyAssessment

The SP800-90B_EntropyAssessment C++ package implements the min-entropy assessment methods included in Special Publication 800-90B.

How to generate binary type input file #196

Open atri7887 opened 2 years ago

atri7887 commented 2 years ago

I have a text file containing multiple 64-bit binary values (example given below) extracted from PUF primitives. How do I convert it to a suitable input file? This input file creation method should ideally be present in the readme file.

File: binary_signature.txt

1001001101010100111010001000011010000111010100100101011100011011
1011001100011001100110110111001000111001111110000101110111000001
0101100010010010111110010011110110101111011010100100000100011100
.
.
.
1110011011000000100100100011100011000101110001110111010110110001

joshuaehill commented 2 years ago

Short answer: The tool automatically translates values, so if you limit the file to ASCII '0' and ASCII '1' characters (with no newlines or any other characters in the file), that will do the right thing.

The "correct" answer is to present this binary data as a string of '0x00' bytes (for '0') and '0x01' bytes (for '1').
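For a text file like binary_signature.txt, one way to produce such a byte string is sketched below in C. This is only an illustration, not part of the tool: the function name bits_text_to_bytes is hypothetical, and it assumes every '0' or '1' character is one 1-bit sample and that whitespace between the 64-character rows can be skipped.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read ASCII '0'/'1' characters from `in` and write
 * one 0x00 or 0x01 byte per character to `out`. Whitespace (spaces,
 * newlines) is skipped. Returns the number of samples written, or -1 on
 * an unexpected character or a write error. */
long bits_text_to_bytes(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);

    long count = 0;
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (isspace(c)) {
            continue;                 /* row separators carry no data */
        }
        if (c != '0' && c != '1') {
            return -1;                /* anything else is not a sample */
        }
        uint8_t sample = (uint8_t)(c - '0');
        if (fwrite(&sample, sizeof sample, 1, out) != 1) {
            return -1;
        }
        count++;
    }
    return count;
}
```

To use it, open the text file with fopen(..., "r") and the output file with fopen(..., "wb"), pass both to the function, and close them afterwards. The resulting file then holds one byte per bit.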

atri7887 commented 2 years ago

Can you please provide a sample of this? You wrote: "The "correct" answer is to present this binary data as a string of '0x00' bytes (for '0') and '0x01' bytes (for '1')."

Also, is it absolutely necessary to present 1 million data samples to run the tool? Will it fail the test in its absence?

joshuaehill commented 2 years ago

I'm not really sure what language you are using, so it is difficult to produce a comprehensive answer that would be useful to you. This isn't a task that is specific to this program, so you should probably consult a tutorial site for the language you are using, and take a look at how that language deals with binary file I/O.

As an example, if you are interested in writing binary files in C, you could use calls like fwrite to accomplish this. In this case, you will want to make sure that the types that you are using are sized appropriately (e.g., uint8_t). Such a tutorial site for C is here. Adapting a program example from that site:

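#include <stdint.h>
#include <stdio.h>

/* Entropy-source specific; must return a 1-bit value (0 or 1). */
uint8_t get_noise_sample(void);

int main(void) {
   FILE *fp;
   uint8_t sample;

   fp = fopen("data.bin", "wb");
   if (fp == NULL) {
      perror("data.bin");
      return 1;
   }
   for (int j = 0; j < 1000000; j++) {
      sample = get_noise_sample(); // get a 1-bit value from the noise source
      fwrite(&sample, sizeof(uint8_t), 1, fp); // write one byte per sample
   }

   fclose(fp);

   return 0;
}
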
atri7887 commented 2 years ago

Thanks a lot. This is really helpful. Also, can you kindly add this comment to the readme/ user guide for completeness?

thanoojarao commented 2 years ago

I have a test file named sample.txt containing 1024 samples as a single string with no newlines. Is this the correct file format, or do I need to make any changes?

Command used to run the test: ./ea_non_iid -i-v sample.txt 8

sample.txt: 0x4d0xb20x850x7a0x850x7d0xc20x3d0x820x7d0x820x7d0x820x3c0xc3..................0x7d0x980xe30xbc0xc30xbc0xe0x710x8e0x610xde0x00xf70x80x3e0x810.........................................................0xac0x130xee0x310x8e0x75

Chaosequals commented 11 months ago

Hello, I'm trying to understand NIST-SP800-90b, specifically the use of the parameter [bits_per_symbol] in ea_non_iid.

For the given binary_signature.txt, shall the [bits_per_symbol] be set as 64?

However, I understand that [bits_per_symbol] should be small enough to fit within a single byte. Does this mean I should divide the original data into 8 segments, with each segment being 8 bits (therefore, bits_per_symbol = 8)? I'm concerned that doing so might alter the original data's physical meaning.

Could you provide some guidance on this? Thanks.

The file binary_signature.txt provided by atri7887:

1001001101010100111010001000011010000111010100100101011100011011
1011001100011001100110110111001000111001111110000101110111000001
0101100010010010111110010011110110101111011010100100000100011100
.
.
.
1110011011000000100100100011100011000101110001110111010110110001

fogking commented 11 months ago

@joshuaehill Can we know the implementation of get_noise_sample()?

I can't see the values in the sample as binary, could you post the source code or the process of creating the sample in the README?

fogking commented 11 months ago

Resolved the issue. I created a sample file with []bytes that I wanted to test and it passed.

joshuaehill commented 11 months ago

> I have a test file named sample.txt containing 1024 samples as a single string with no newlines. Is this the correct file format, or do I need to make any changes?
>
> Command used to run the test: ./ea_non_iid -i-v sample.txt 8
>
> sample.txt: 0x4d0xb20x850x7a0x850x7d0xc20x3d0x820x7d0x820x7d0x820x3c0xc3..................0x7d0x980xe30xbc0xc30xbc0xe0x710x8e0x610xde0x00xf70x80x3e0x810.........................................................0xac0x130xee0x310x8e0x75

A text file containing samples in the format of text strings like "0x4d" is absolutely not the correct format.

Please understand, this is not an issue with the tool, but is instead a quite general issue regarding how binary files work on your platform. Please take a look at some binary file I/O introductions for whatever computer language you are most comfortable with. You want to produce and use "binary" files containing the stated octets, not "text" files.
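As a rough illustration of the difference, the C sketch below turns text of the kind shown in sample.txt into the octets it names. The function name hex_text_to_octets is hypothetical, and the parsing rule is an assumption based only on the excerpt above: each value is "0x" followed by one or two hex digits, with no separators, so a '0' that is immediately followed by 'x' is treated as the start of the next value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit; the caller guarantees isxdigit(c). */
static uint8_t hex_value(char c)
{
    if (c >= '0' && c <= '9') {
        return (uint8_t)(c - '0');
    }
    return (uint8_t)(tolower((unsigned char)c) - 'a' + 10);
}

/* Hypothetical helper: parse text such as "0x4d0xb20xe" into the octets
 * {0x4d, 0xb2, 0x0e}. Returns the number of octets stored in `out`, or
 * -1 if the text is malformed or `out` is too small. */
long hex_text_to_octets(const char *text, uint8_t *out, size_t out_cap)
{
    assert(text != NULL && out != NULL);

    size_t n = 0;
    const char *p = text;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        /* Every value must start with "0x" and at least one hex digit. */
        if (p[0] != '0' || p[1] != 'x' || !isxdigit((unsigned char)p[2])) {
            return -1;
        }
        p += 2;
        uint8_t value = hex_value(*p++);
        /* Take a second digit unless it is the '0' of the next "0x". */
        if (isxdigit((unsigned char)*p) && !(p[0] == '0' && p[1] == 'x')) {
            value = (uint8_t)(value * 16 + hex_value(*p++));
        }
        if (n == out_cap) {
            return -1;
        }
        out[n++] = value;
    }
    return (long)n;
}
```

The resulting array can then be written with fwrite to a file opened in "wb" mode, which yields a binary file with one octet per sample rather than four or so text characters per sample.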

joshuaehill commented 11 months ago

> @joshuaehill Can we know the implementation of get_noise_sample()?

Please see the SP 800-90B document for context. This is the function used to abstract the noise source interface. The interface is entropy-source specific.

joshuaehill commented 11 months ago

> Hello, I'm trying to understand NIST-SP800-90b, specifically the use of the parameter [bits_per_symbol] in ea_non_iid.
>
> For the given binary_signature.txt, shall the [bits_per_symbol] be set as 64?

It is surely possible that your noise source produces 64-bit output, but if this is the case, you are going to need to map these outputs down to at most 8-bit-wide symbols.

Be advised that this mapping essentially establishes the probability of each mapped symbol by summing the probabilities of all the symbols that map to it. This may mask problems with the underlying noise source, so it should be done carefully.
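Written out, with X the raw noise-source output, f the chosen mapping, and Y = f(X) the mapped symbol (these names are introduced here only for illustration), the relationship is:

```latex
% X: raw noise-source output, f: the chosen mapping, Y = f(X): the mapped symbol
P_Y(y) = \sum_{x \,:\, f(x) = y} P_X(x)
\qquad\Longrightarrow\qquad
\max_y P_Y(y) \;\ge\; \max_x P_X(x)
\qquad\Longrightarrow\qquad
H_\infty(Y) = -\log_2 \max_y P_Y(y) \;\le\; H_\infty(X)
```

The assessment only ever sees the distribution of Y, so any structure in X that f collapses is invisible to it.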

> However, I understand that [bits_per_symbol] should be small enough to fit within a single byte. Does this mean I should divide the original data into 8 segments, with each segment being 8 bits (therefore, bits_per_symbol = 8)?

No, this would cause the tool to produce nonsense.

> I'm concerned that doing so might alter the original data's physical meaning.

Indeed.

There is some discussion of how to do this "mapping down" (i.e., "reducing the symbol space") in SP 800-90B Section 6.4, though this is only one possible approach. An alternative that applies in some physical systems is the approach that I outlined in Comment #20. In other physical systems, it makes more sense to map various value ranges to different abstract symbols (e.g., discretizing by mapping a sampled input voltage level to one of several distinguished symbols). The appropriate mapping approach is very specific to the noise source.
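The value-range idea can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a recommended mapping: the function name discretize and the threshold values are hypothetical, and real thresholds would have to come from an analysis of the specific noise source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map a sampled value (e.g., a voltage) to a symbol
 * index using thresholds sorted in ascending order. With n thresholds
 * there are n + 1 symbols: 0 for v < t[0], 1 for t[0] <= v < t[1], ...,
 * and n for v >= t[n - 1]. */
uint8_t discretize(double v, const double *thresholds, size_t n)
{
    assert(thresholds != NULL && n > 0 && n < 256);

    uint8_t symbol = 0;
    for (size_t i = 0; i < n; i++) {
        if (v < thresholds[i]) {
            break;                    /* v falls below this boundary */
        }
        symbol = (uint8_t)(i + 1);
    }
    return symbol;
}
```

For example, with the made-up thresholds {1.0, 2.0, 3.0}, a reading of 2.5 maps to symbol 2. Three thresholds give four symbols, which fit in two bits, and each symbol would be written to the input file as one byte.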