simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases
Other
1.08k stars 185 forks source link

move byte histogram from scan_aes to sbuf_t #293

Closed simsong closed 2 years ago

simsong commented 2 years ago

Since scan_aes is computing a byte histogram, we might as well compute it when scanner_params is computed make it available to all of the scanners. However, now that we are calling each scanner in its own thread, and we are constructing scanner_params each time through, we may need to rethink the way that the work pool is working.

jonstewart commented 2 years ago

I don’t entirely follow why std::shared_ptr would be a problem. Would any of the threads change the state of the object they’re managing? If the object is constant, then all the reference counting machinery is thread safe.

Can things be structured where the object (the byte histogram?) isn’t changed in a multithreaded context? That would be a simpler, safer design and reduce locking.

On Nov 24, 2021, at 3:49 AM, Simson L. Garfinkel @.***> wrote:

 Since scan_aes is computing a byte histogram, we might as well compute it when scanner_params is computed make it available to all of the scanners. However, now that we are calling each scanner in its own thread, and we are constructing scanner_params each time through, we may need to rethink the way that the work pool is working.

Change the work pool so that instead of list of sbufs, it is a list of work units, with shared sbufs and shared scanner params and sbuf_infos. Gosh. Can we use C++'s shared_ptr? It turns out that shared_ptr sare not thread safe. https://www.modernescpp.com/index.php/atomic-smart-pointers. Perhaps we should just put this inside the sbuf and make it lazy evaluation like we do the other sbuf lazy things. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android.

simsong commented 2 years ago

std::shared_ptr is not threadsafe, so it would require a lock. But there is no place to put the lock.

sbuf's are threadsafe and has a lock for the ngram function. So I'm just going to use that same mechanism to keep track of the histogram. Lazy evaluation protected by a mutex. It is easy to implement and I have a working implementation.

Previously I used std::shared_ptr and std::unique_ptr and AddressSanitizer periodically complained. I couldn't figure out why. Now I know that it was a threading issue. (AddressSanitizer and ThreadSanitizer conflict. I'm going to set up the self-tests to run both in order.)

simsong commented 2 years ago

This code was already taken out of scan_aes. See #296.