Closed by pwinckles 3 years ago
Most of the time it doesn't matter: you are limited by disk I/O, so multi-threading won't give you a performance increase. Are you reading files from multiple disks?
It does matter: an application cannot construct multiple different bags concurrently, because all of the bytes go through a single MessageDigest instance.
To be clear, this is not a performance consideration at all. As it stands, the library is not safe to use to create or validate bags in a multi-threaded application, short of putting a synchronized wrapper around it.
I would be happy to submit a PR to address the problem, if you'd like.
Pull requests are welcome, but they will not be merged in until they meet my quality requirements.
In my opinion, the following changes should be made:
- A new interface, HasherFactory, should be introduced that has a single method, createHasher()
- BagitChecksumNameMapping should be changed to hold HasherFactories
- A DefaultHasher class should be introduced that implements Hasher and includes the hashing logic from StandardHasher
- The StandardHasher enum should be changed to implement HasherFactory and return new DefaultHasher instances
- When BagitChecksumNameMapping.get() is called, the HasherFactory is retrieved from the map, createHasher() is called on it, and the resulting Hasher is returned

This will allow you to reuse the hasher as you are currently doing, but every caller of BagitChecksumNameMapping.get() will have their own instance, so there will be no concurrency issues.
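
The proposal above could be sketched roughly as follows. This is only an illustration under assumptions: the type names come from the proposal, but the Hasher methods (update, getHash), the lookup keys, and the static get(String) signature are simplified stand-ins, not the library's actual API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the library's Hasher interface (method names are assumptions).
interface Hasher {
    void update(byte[] bytes, int length);
    String getHash();
}

// Proposed factory: a single method that returns a fresh Hasher on every call.
interface HasherFactory {
    Hasher createHasher();
}

// Owns the mutable MessageDigest, so one instance must never be shared between bags.
final class DefaultHasher implements Hasher {
    private final MessageDigest digest;

    DefaultHasher(String algorithm) {
        try {
            this.digest = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unsupported algorithm: " + algorithm, e);
        }
    }

    @Override
    public void update(byte[] bytes, int length) {
        digest.update(bytes, 0, length);
    }

    @Override
    public String getHash() {
        // digest() finalizes the hash and resets the MessageDigest.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}

// The enum constants stay immutable singletons; only the hashers they create carry state.
enum StandardHasher implements HasherFactory {
    MD5("MD5"), SHA1("SHA-1"), SHA256("SHA-256");

    private final String algorithm;

    StandardHasher(String algorithm) {
        this.algorithm = algorithm;
    }

    @Override
    public Hasher createHasher() {
        return new DefaultHasher(algorithm);
    }
}

// Simplified mapping: stores factories, hands out a new Hasher per lookup.
final class BagitChecksumNameMapping {
    private static final Map<String, HasherFactory> FACTORIES = new ConcurrentHashMap<>();

    static {
        FACTORIES.put("md5", StandardHasher.MD5);
        FACTORIES.put("sha1", StandardHasher.SHA1);
        FACTORIES.put("sha256", StandardHasher.SHA256);
    }

    private BagitChecksumNameMapping() { }

    static Hasher get(String name) {
        HasherFactory factory = FACTORIES.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No hasher registered for: " + name);
        }
        return factory.createHasher();
    }
}
```

With this shape, two threads that each call BagitChecksumNameMapping.get("sha256") receive distinct DefaultHasher objects, so neither can disturb the other's digest state.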
I would also look at whether it makes more sense to stop having the hashers be singletons. The main idea was to have them immutable, but it looks like that is causing more trouble than it is worth.
Should be fixed in version 4.4
Bagging uses an enum to represent the hashers that compute the digests of files added to bags. The problem is that, since it's an enum, the same hasher instance is used globally, which means that it will not compute the correct digest if multiple bags are created concurrently.
The following code demonstrates the problem:
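
A minimal sketch of the failure mode, using only java.security.MessageDigest rather than the library's own classes. The class and method names are hypothetical, and the interleaving of two writers is simulated deterministically on one thread instead of relying on real thread timing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class SharedDigestDemo {
    private static final byte[] A_PART1 = "hello ".getBytes(StandardCharsets.UTF_8);
    private static final byte[] A_PART2 = "world".getBytes(StandardCharsets.UTF_8);
    private static final byte[] B_PART1 = "other file".getBytes(StandardCharsets.UTF_8);

    static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hashes file A in two chunks while a second writer starts hashing file B in between.
    static byte[] digestOfA(MessageDigest forA, MessageDigest forB) {
        forA.update(A_PART1);
        forB.update(B_PART1); // the "other bag" pushes its bytes mid-stream
        forA.update(A_PART2);
        return forA.digest();
    }

    static byte[] expectedDigestOfA() {
        return sha256().digest("hello world".getBytes(StandardCharsets.UTF_8));
    }

    // One global digest, as with an enum-held hasher: B's bytes leak into A's hash.
    static boolean sharedDigestIsCorrupted() {
        MessageDigest shared = sha256();
        return !Arrays.equals(expectedDigestOfA(), digestOfA(shared, shared));
    }

    // One digest per bag: A's hash is unaffected by B.
    static boolean separateDigestsAreCorrect() {
        return Arrays.equals(expectedDigestOfA(), digestOfA(sha256(), sha256()));
    }

    public static void main(String[] args) {
        System.out.println("shared digest corrupted: " + sharedDigestIsCorrupted());
        System.out.println("separate digests correct: " + separateDigestsAreCorrect());
    }
}
```

With real threads the interleaving is nondeterministic, so the wrong digest may only show up intermittently, which makes the bug harder to spot than in this sketch.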