usnistgov / SP800-90B_EntropyAssessment

The SP800-90B_EntropyAssessment C++ package implements the min-entropy assessment methods included in Special Publication 800-90B.

What content of the input file? #138

Closed: laogaolao closed this issue 4 years ago

laogaolao commented 4 years ago

Hi,

What should the input file contain? A string of ASCII '0' and '1' characters, or a binary file with data collected directly from the noise source? Thanks

joshuaehill commented 4 years ago

The file format is (somewhat) described in the README file.

The data fed to this tool should be encoded as 1 symbol per byte. For binary data, the input file would normally be a string of bytes, where each byte is either 0x00 or 0x01. Noise sources that produce 256 or fewer distinct symbols can also have their raw noise samples encoded as bytes (one symbol per byte). If your raw data source produces more than 256 distinct symbols, then you will need to map each raw data symbol down (reduce the symbol space) so that the mapped data fits into one byte per mapped symbol. (See SP800-90B Section 6.4.)
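To make the one-symbol-per-byte format concrete, here is a minimal Python sketch, assuming the raw samples are already available as integers. The helper names are hypothetical and not part of this tool. Note that the ASCII characters '0' and '1' are the bytes 0x30 and 0x31, not 0x00 and 0x01, so a text file of '0'/'1' characters is not the expected format. The low-order-bit mapping shown for wide samples is only one illustrative way to reduce the symbol space; check SP800-90B Section 6.4 for what reduction is appropriate for your source.

```python
from typing import Iterable


def encode_binary_samples(bits: Iterable[int]) -> bytes:
    """Encode binary noise samples one symbol per byte (0x00 or 0x01)."""
    out = bytearray()
    for b in bits:
        if b not in (0, 1):
            raise ValueError(f"not a binary sample: {b!r}")
        out.append(b)
    return bytes(out)


def map_wide_samples(samples: Iterable[int], bits_kept: int = 8) -> bytes:
    """Hypothetical reduction: keep only the low `bits_kept` bits of each sample.

    This is an illustrative assumption, not the mapping SP800-90B mandates.
    """
    if not 1 <= bits_kept <= 8:
        raise ValueError("bits_kept must be between 1 and 8")
    mask = (1 << bits_kept) - 1
    return bytes(s & mask for s in samples)
```

For example, `open("samples.bin", "wb").write(encode_binary_samples(raw_bits))` would produce a file with one 0x00 or 0x01 byte per sample, where `raw_bits` stands in for your collected noise samples.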

As a reminder, this tool's estimates can be treated as lower bounds on the entropy only when testing raw data. Testing conditioned data will provide only an upper bound for the entropy.

laogaolao commented 4 years ago

Thank you for your reply; let me make sure I understand. For example, I used the openssl tool to generate a lot of random numbers, like this: 2R5kRvn71KCJVHiowEfVS+r5pNJR1cAxZ8Lj3/od9wTwXmLdHReZvG0PgvbR8q8b....

If I want to use this tool to run the IID or non-IID tests, I need to convert the random numbers into a string of '0' and '1' characters as input, like this: 00110010010100100011010101101011010100100111011001101110001101110011000101001011010000110100

Is that right?

joshuaehill commented 4 years ago

I'm not aware that openssl has an interface that exposes data that would be appropriate for analysis using this tool, either as raw data or as conditioned data.

If we assumed that you had altered the existing library so that its output could count as the output of a conditioning function, then you should probably serialize that output ordered first by block, and then using some consistent bit-ordering convention within each block.
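As an illustration of "blockwise, then a consistent bit order within a block", here is a minimal Python sketch that flattens a sequence of conditioned output blocks into one bit per byte. The function name is hypothetical, and most-significant-bit-first is just one choice of convention; the point is that blocks stay in the order produced and the same bit order is used for every byte.

```python
from typing import Iterable


def blocks_to_bit_symbols(blocks: Iterable[bytes], msb_first: bool = True) -> bytes:
    """Flatten conditioned output blocks into one bit per byte (0x00/0x01).

    Blocks are emitted in the order given; within each block, bytes are taken
    in order and each byte is expanded MSB-first (or LSB-first if requested).
    """
    out = bytearray()
    bit_positions = range(7, -1, -1) if msb_first else range(8)
    for block in blocks:
        for byte in block:
            for pos in bit_positions:
                out.append((byte >> pos) & 1)
    return bytes(out)
```

Whichever bit order you pick, using it consistently matters more than the specific choice, since switching conventions partway through would scramble any structure the tests are meant to detect.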

Most of the openssl inputs are themselves conditioned, so nothing that the stock openssl has access to would be reasonable to think of as "raw data". If we assumed that you changed this, then the appropriate block size would entirely depend on the particular source in use.

But again, the default openssl doesn't produce output that would be appropriate for analysis, either as a raw noise source, or as a non-vetted conditioning function.

This tool is an implementation of the SP800-90B tests, so you should refer to that document for a description of the sort of testing that this tool is intended to accomplish and the sort of data that should be provided to this tool.

laogaolao commented 4 years ago

I understand, thank you very much.

joshuaehill commented 4 years ago

If, in your view, this doesn't require a change to the code / documentation set, please close this issue. Thanks!