usnistgov / SP800-90B_EntropyAssessment

The SP800-90B_EntropyAssessment C++ package implements the min-entropy assessment methods included in Special Publication 800-90B.

Add support for large files. (> 2GB) #226

Closed joshuaehill closed 10 months ago

joshuaehill commented 1 year ago

This is an update of PR #217 that I had to recreate because git automatically closed the prior PR when I forced my testing repository to be synchronized with the NIST repository. (I'm sure that I messed it up somehow.)

These are the same "large file support" changes as in PR #217.

This is mainly useful because it allows for more samples of wide data; this comes up most frequently with the bitstring tests. For 8-bit data, the prior code would fail when the dataset was larger than about 256MB. This is commonly relevant when doing the statistical assessment for non-vetted conditioning functions.
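The "about 256MB" figure can be checked with a little arithmetic. The sketch below assumes (the thread does not state this explicitly) that the limit comes from addressing the expanded bitstring with a signed 32-bit index; the helper names are hypothetical and are not taken from the repository.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: number of bits produced when every sample of
// `bits_per_sample` bits is expanded into a bitstring.
constexpr std::uint64_t bitstring_length(std::uint64_t samples,
                                         unsigned bits_per_sample) {
    return samples * bits_per_sample;
}

// Assumption: the prior code addressed the data with a signed 32-bit index,
// so any length above INT32_MAX (2^31 - 1) could not be represented.
constexpr bool fits_signed_32bit_index(std::uint64_t length) {
    return length <= static_cast<std::uint64_t>(
                         std::numeric_limits<std::int32_t>::max());
}
```

Under that assumption, 2^28 one-byte samples (256 MiB) expand to 2^31 bits, which is one more than a signed 32-bit index can hold.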

Both the -ldivsufsort and -ldivsufsort64 libraries are required when linking. The tool opportunistically uses the 32-bit-index version of the library when it can, because that version (which requires 13n bytes of memory for an n-symbol string) uses approximately half the memory of the 64-bit-index version (which uses 25n bytes for the same task).
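A minimal sketch of the opportunistic selection described above. The 13n and 25n figures are the ones quoted in this comment; the threshold (the largest signed 32-bit value) and all names are assumptions for illustration, not the PR's actual code.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical enum naming the two index widths.
enum class IndexWidth { Bits32, Bits64 };

// Assumption: the 32-bit-index path is usable whenever the symbol count
// fits in a signed 32-bit integer; otherwise fall back to 64-bit indices.
constexpr IndexWidth choose_index_width(std::uint64_t n_symbols) {
    return n_symbols <= static_cast<std::uint64_t>(
                            std::numeric_limits<std::int32_t>::max())
               ? IndexWidth::Bits32
               : IndexWidth::Bits64;
}

// Memory estimate using the per-symbol costs quoted in the comment:
// 13n bytes with 32-bit indices, 25n bytes with 64-bit indices.
constexpr std::uint64_t estimated_bytes(std::uint64_t n_symbols) {
    return choose_index_width(n_symbols) == IndexWidth::Bits32
               ? 13 * n_symbols
               : 25 * n_symbols;
}
```

With these figures, a string of 2^31 symbols would need roughly 50 GiB on the 64-bit path, which is why preferring the 32-bit path for smaller inputs is worthwhile.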

celic commented 10 months ago

The plan is to merge this in after we can do some testing locally on the changes. This changes the build process slightly. We use the GitHub code for our testing on ESVTS, though we won't be using the large file support on that platform.

joshuaehill commented 10 months ago

The changes look large, but the bulk of them are exactly the same logic as the 32-bit-index code with different types. (I suppose using C++ templates would be logically cleaner, but C++ templates make my skin crawl.) In any case, let me know if you have any questions.
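For readers curious about the template alternative mentioned above, here is a rough sketch of one routine parameterized on the index type. It is a deliberately naive suffix-array construction, not the libdivsufsort algorithm and not code from this PR; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Naive suffix array: sort suffix start positions by comparing the suffixes
// directly. `Index` would be std::int32_t or std::int64_t; the logic is
// identical for both, only the stored index type changes.
template <typename Index>
std::vector<Index> naive_suffix_array(const std::vector<std::uint8_t>& text) {
    std::vector<Index> sa(text.size());
    std::iota(sa.begin(), sa.end(), Index{0});  // 0, 1, 2, ...
    std::sort(sa.begin(), sa.end(), [&text](Index a, Index b) {
        return std::lexicographical_compare(text.begin() + a, text.end(),
                                            text.begin() + b, text.end());
    });
    return sa;
}
```

Instantiating it as `naive_suffix_array<std::int32_t>(data)` or `naive_suffix_array<std::int64_t>(data)` yields the same ordering; the PR instead keeps two explicit copies of the logic.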