usnistgov / SP800-90B_EntropyAssessment

The SP800-90B_EntropyAssessment C++ package implements the min-entropy assessment methods included in Special Publication 800-90B.

Read bits instead of bytes #193

Open woodbe opened 2 years ago

woodbe commented 2 years ago

I have often run into situations where the output file I am given is a packed bitstream straight from the source. Instead of each byte carrying one sample in its least significant bit, each byte contains 8 individual bits from the source. Currently, to have everything read properly, I first need to run the output through a conversion that expands each bit into its own byte.

Since the file is binary anyway, is there a reason I can't have the program consume the bit output directly? I would expect this to just be another switch (or actually letting "1" be valid). This has been an issue for some time, and I am wondering if this could be resolved relatively simply here instead of having to rely on an external program to massage the data into an acceptable format.
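The external conversion step described above could look roughly like the following. This is only a minimal sketch, not part of the tool: the names `unpack_bits` and `convert_file` and the `msb_first` parameter are hypothetical, and the default bit order is an assumption that has to be checked against how the source actually packs its output.

```python
def unpack_bits(packed: bytes, msb_first: bool = True) -> bytes:
    """Expand each packed byte into 8 bytes, each holding one sample (0 or 1)."""
    out = bytearray()
    # Bit positions to read, in the order the samples are ASSUMED to have
    # been produced. This must match the convention used by the source.
    shifts = range(7, -1, -1) if msb_first else range(8)
    for byte in packed:
        for shift in shifts:
            out.append((byte >> shift) & 1)
    return bytes(out)


def convert_file(src_path: str, dst_path: str, msb_first: bool = True) -> None:
    """Read a packed bitstream file and write a one-sample-per-byte file."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Process in chunks so large captures need not fit in memory.
        while chunk := src.read(1 << 16):
            dst.write(unpack_bits(chunk, msb_first))
```

The output file is 8 times the size of the input, with every byte equal to 0 or 1.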

joshuaehill commented 2 years ago

The current approach is inefficient, but very explicit with respect to ordering. If packed binary were accepted by the tool, then there would be a need to specify the ordering convention for the input bytes (that is, was the highest-order bit produced first, or the lowest-order bit?).
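To illustrate the ordering question, here is a small sketch showing that a single packed byte expands into two different sample sequences depending on the convention. The helper name `samples_from_byte` is made up for this example.

```python
def samples_from_byte(byte: int, msb_first: bool) -> list:
    """Return the 8 one-bit samples in a byte under the chosen bit order."""
    shifts = range(7, -1, -1) if msb_first else range(8)
    return [(byte >> s) & 1 for s in shifts]


packed = 0b11010000
print(samples_from_byte(packed, msb_first=True))   # [1, 1, 0, 1, 0, 0, 0, 0]
print(samples_from_byte(packed, msb_first=False))  # [0, 0, 0, 0, 1, 0, 1, 1]
```

The two results contain the same bits in a different order, so any estimator that depends on sample order would be looking at different data.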

woodbe commented 2 years ago

I would agree, though in my case the output I am looking at doesn't have any ordering: each 1 or 0 is literally one output from the source, so 1 byte is 8 actual output bits (as opposed to, say, 4 hex or something else). I completely agree on needing explicit selections or requirements for the input format, but my use case is probably about as simple as it can be. If I ask for 1 million samples, I get a 125 kB file (with 1 million bits). So while I agree ordering can be very important, in my scenario it isn't directly relevant, since each sample is only 1 bit and each byte is 8 sample bits.
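The size arithmetic above can be checked with a short sketch; the function name is hypothetical. A check like this is also a quick way to confirm that a delivered file really is packed 8 samples per byte.

```python
def packed_size_bytes(n_samples: int) -> int:
    """Bytes needed for n one-bit samples packed 8 per byte, rounded up."""
    return (n_samples + 7) // 8


n = 1_000_000
print(packed_size_bytes(n))  # 125000 bytes when packed
print(n)                     # 1000000 bytes at one sample per byte
```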

joshuaehill commented 2 years ago

But in what ordering were the bits in each byte produced by the noise source?

woodbe commented 2 years ago

From my understanding, in this case it is continually appending each new bit. I can see what you mean about different options (prepending vs. appending, building the byte from the least significant bit and then appending the whole byte, etc.), but in my particular use case it is a simple append of the output to the file until the requested size has been met.
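Even a "simple append" has to fill each byte in some direction before it is written. The following sketch shows two plausible ways a capture program might do it; both function names are hypothetical, and neither is claimed to be what any particular source does.

```python
def pack_msb_first(bits) -> bytes:
    """Shift left and OR in each new bit: the first bit ends up highest."""
    out = bytearray()
    acc, count = 0, 0
    for bit in bits:
        acc = (acc << 1) | (bit & 1)
        count += 1
        if count == 8:
            out.append(acc)
            acc, count = 0, 0
    return bytes(out)  # a trailing partial byte is dropped in this sketch


def pack_lsb_first(bits) -> bytes:
    """Place each new bit at the next higher position: the first bit ends up lowest."""
    out = bytearray()
    acc, count = 0, 0
    for bit in bits:
        acc |= (bit & 1) << count
        count += 1
        if count == 8:
            out.append(acc)
            acc, count = 0, 0
    return bytes(out)  # a trailing partial byte is dropped in this sketch


bits = [1, 1, 0, 1, 0, 0, 0, 0]
print(pack_msb_first(bits).hex())  # d0
print(pack_lsb_first(bits).hex())  # 0b
```

Both writers append the same bit sequence, yet they produce different files, which is why a reader needs to know which convention was used.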