Why create new Hasher per invocation

OughtToPrevail commented 5 years ago

I am reading your code and for every public method invocation of HasherImpl you create a new HasherImpl which I have no clue why. Do you mind explaining?

k3mist commented 5 years ago

I don't have any relationship to this project except for using it. Looking at source to the impl it stores very sensitive data within the object.

So given the nature of what is contained within the object creating a new object for each invocation the usage of the previous object can be discarded via GC much more quickly leaving less footprint/room for successful attack vector.

OughtToPrevail commented 5 years ago

Exactly the opposite actually, when you create a new object matching the current one with new values and discard the current one it will wait for GC to delete the current one leaving now 2 instances of the sensitive data which grows every new instance.

k3mist commented 5 years ago

No matter, you have to wait for GC to discard objects unless you manually call GC (just the nature of the jvm). If you keep a reference to the HasherImpl it will never be discarded and possibly previous references may also never be discarded unless all previous references are set null.

Outside of that fact making a single reference to the impl would also make it not thread safe if the internal state is changing from external references (which is the only way you could update those internal values if there was only one reference)

The current impl is the most secure and is also thread safe.

To summarize, the current impl makes sure previous references are discarded after each operation is performed

kosprov commented 5 years ago

Hi all. First of all thanks for the comments and the interest.

HasherImpl and VerifierImpl hold references to sensitive data (e.g. the reference to byte[] password) and do not copy its content. Client code (i.e. your application) is responsible for clearing the byte[] when hashing/verification is completed.

Jargon2 even provides a convenience abstraction in the form of the ByteArray API to help client code deal with conversions from char[] to byte[] or from a String to byte[] etc, and having headaches with keeping track of sensitive data copies. It goes one step further by allowing you to have the source array cleared when ByteArray lifecycle ends.

In this snippet from the documentation:

char[] password = somehowGetPassword();
try (ByteArray passwordByteArray = toByteArray(password).clearSource()) {
    // use passwordByteArray with Hasher or Verifier
}
// Internal byte[] copy AND the source char[] (password variable) are zeroed-out here

any leftover HasherImpl and VerifierImpl objects that were created inside the try block simply point to empty arrays.

I remember it was one of my design goals to handle sensitive data responsibly and I was being extra careful in not making copies or wiping them internally after use. The only place I could not achieve this is with normalization where I could not find a library that does the same. This was clearly documented and normalization is optional, anyway.

So, with correct use, all these intermediate HasherImpl and VerifierImpl which you loose reference to (but are still in the heap until GC) point to byte[]s that are already wiped out.

Of course, bugs are possible and if you believe that the above is not always the case, write a client code snippet that you believe leaves unwiped sensitive data in the heap. I'll examine it as soon as possible.

kosprov commented 5 years ago

For the initial question "Why create new Hasher per invocation?":

I wanted to design and idiomatic fluent API that is simple and flat. It must be used in a certain way, though. Creating a new HasherImpl object on every API call is a result of this decision.

You can also check my long answer to https://github.com/kosprov/jargon2-api/issues/2

In any case, creating a new HasherImpl object and copying references is extremely cheap and to put into some perspective: You're about to allocate a huge amount of RAM to do CPU intensive calculations. The cost of 2 or 3 additional objects is negligible in terms of both RAM and CPU.

OughtToPrevail commented 5 years ago

@k3mist I am well aware of how the Garbage Collector works, also no it doesn't after a hash it does not set the password to null (tho I think you mean if they don't create a variable for it, it would discard it when the object is discarded). Also thread-safety can probably be better achieved with synchronized. @kosprov Thank you for your detailed answer and for making such a well made project, sadly I still don't understand why you need it instead of using return this; but until you change it I'll continue and use: this code I made to use the backend instead of the hasher which sadly requires me to also add a dependency to the jargon2-api for Type and Version. This also has the advantage that a new Jargon2Backend isn't created every hash.

kosprov commented 5 years ago

If you make proper use of the API and you write a component like the PasswordHasher CDI bean described in the README file, when you call its public methods PasswordHasher#encodedHash(char[]) or PasswordHasher#verifyEncoded(String, char[]) the password char[] is wiped out. The char[] will be automatically filled with zeroes before those methods end and there will be no sign of the user password any more. Period.

PasswordHasher#encodedHash(char[]) creates 1 HasherImpl copy and PasswordHasher#verifyEncoded(String, char[]) creates 2 VerifierImpl copies. That's less than 1KB of garbage. Argon2 will allocate 5 or 6 orders of magnitude more RAM than this "unecessary" 1KB.

... sadly I still don't understand why you need it instead of using return this;

If you want to understand why a new copy of the HasherImpl or VerifierImpl must be created, read my analysis on https://github.com/kosprov/jargon2-api/issues/2. Just trust me that return this is gonna give wrong results.

but until you change it I'll continue and use...

Instead of using NativeRiJargon2Backend directly, I suggest you use the low-level API. You still get the benefit of a decoupled backend.

This also has the advantage that a new Jargon2Backend isn't created every hash.

There is always 1 and only 1 instance of an automatically discovered backend. No new backend instances are created when HasherImpl or VerifierImpl copies are created.

kosprov / jargon2-api

Why create new Hasher per invocation #3