usnistgov / SP800-90B_EntropyAssessment

The SP800-90B_EntropyAssessment C++ package implements the min-entropy assessment methods included in Special Publication 800-90B.

How much data should be fed into entropy assessment tools for an accurate min-entropy estimate? #223

Open maxwell-pung-bsi opened 1 year ago

maxwell-pung-bsi commented 1 year ago

I have been experimenting with using the NIST 800-90B entropy assessment tools to quantify the min-entropy provided by black-box HRNGs.

The repository's documentation (as far as I can tell) does not provide guidance on how much data should be fed into the tools to get an accurate min-entropy estimate. The vendor documentation for one of the HRNGs claims that the data files passed to the entropy assessment tools should be at least 10MB in size to get an accurate result, but I have no idea where they got this number from.

Are there guidelines for how much data should be passed into the entropy assessment tools to get an accurate min-entropy estimate?

joshuaehill commented 1 year ago

This tool is an implementation of the estimators in NIST SP 800-90B. It is difficult to interpret the tool's output without first reading that document to get some idea of what the tool is intended to accomplish.

SP 800-90B (in Section 3.1.1) specifies that the sample size ($L$) should be at least 1 million samples. It also requires that this data sample be the "raw" output of the noise source (roughly equivalent in AIS-31 terms to somewhere in the range between "das random numbers" and "raw random numbers").

Most of the estimators include some sort of confidence interval calculation, whose width is roughly proportional to $1/\sqrt{L}$, so (barring some observed defect) using larger samples is likely to yield estimates that are both numerically larger and more stable across independent tests.
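To make the $1/\sqrt{L}$ behavior concrete, here is a minimal sketch of the confidence bound used by one estimator, the Most Common Value estimate (SP 800-90B, Section 6.3.1). The function names and the observed proportion `p_hat = 0.5` are illustrative assumptions, not taken from the tool; the other estimators use different statistics, so this shows only the general trend.

```python
import math

Z = 2.576  # normal quantile SP 800-90B uses for its 99% upper bound


def mcv_upper_bound(p_hat: float, L: int) -> float:
    """Upper confidence bound on the most common symbol's probability."""
    return min(1.0, p_hat + Z * math.sqrt(p_hat * (1.0 - p_hat) / (L - 1)))


def mcv_min_entropy(p_hat: float, L: int) -> float:
    """Min-entropy per sample implied by that upper bound."""
    return -math.log2(mcv_upper_bound(p_hat, L))


# Hypothetical case: the same observed proportion at growing sample sizes.
p_hat = 0.5
for L in (10_000, 1_000_000, 100_000_000):
    margin = mcv_upper_bound(p_hat, L) - p_hat
    print(f"L={L:>11,}  margin={margin:.6f}  H={mcv_min_entropy(p_hat, L):.6f}")
```

With the observed proportion held fixed, each 100-fold increase in $L$ shrinks the margin by about a factor of 10, so the reported min-entropy moves up toward the value implied by `p_hat` alone.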

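This tool cannot (indeed, it is not theoretically possible for a tool to) reliably estimate the min-entropy for all noise sources. For example, imagine statistically assessing almost any reasonable PRNG: its output looks statistically ideal, so a purely statistical test would report close to full entropy, even though the output contains no more entropy than the seed.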
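The PRNG thought experiment can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: it applies only a re-implementation of the Most Common Value estimate (SP 800-90B, Section 6.3.1) to bytes from Python's built-in `random` module, and the helper names and the seed are hypothetical.

```python
import math
import random
from collections import Counter


def mcv_min_entropy(data: bytes) -> float:
    """Most Common Value estimate in bits per byte (one estimator only)."""
    L = len(data)
    p_hat = Counter(data).most_common(1)[0][1] / L
    p_u = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (L - 1)))
    return -math.log2(p_u)


def prng_bytes(seed: int, n: int) -> bytes:
    """n bytes from a deterministic PRNG; fully determined by the seed."""
    rng = random.Random(seed)
    return rng.getrandbits(8 * n).to_bytes(n, "little")


sample = prng_bytes(seed=12345, n=1_000_000)
print(f"MCV estimate: {mcv_min_entropy(sample):.3f} bits per byte")
```

The estimate comes out close to 8 bits per byte, yet anyone who knows the seed can reproduce every byte, so the statistical result says nothing about how unpredictable the source actually is.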

In SP 800-90B, any estimate for min-entropy must be based on an understanding of the system producing the numbers (i.e., black-box entropy estimation isn't, in general, possible). This design-based assessment is integrated as the $H_\text{submitter}$ value.