Closed OughtToPrevail closed 5 years ago
I don't have any relationship to this project other than using it. Looking at the source, the impl stores very sensitive data within the object.
Given the nature of what the object contains, creating a new object for each invocation means the previous object can be discarded by GC much more quickly, leaving a smaller footprint and less room for a successful attack vector.
Exactly the opposite, actually: when you create a new object matching the current one with new values and discard the current one, you have to wait for GC to delete the old one. That leaves two instances of the sensitive data, and the count grows with every new instance.
No matter what, you have to wait for GC to discard objects unless you trigger GC manually (that's just the nature of the JVM). If you keep a reference to the `HasherImpl`, it will never be discarded, and previous references may also never be discarded unless they are all set to null.
Beyond that, sharing a single reference to the impl would also make it non-thread-safe, since its internal state would be mutated through external references (which is the only way those internal values could be updated if there were only one reference).
The current impl is the most secure and is also thread safe.
To summarize, the current impl makes sure previous references are discarded after each operation is performed.
Hi all. First of all thanks for the comments and the interest.
`HasherImpl` and `VerifierImpl` hold references to sensitive data (e.g. the reference to the `byte[]` password) and do not copy its content. Client code (i.e. your application) is responsible for clearing the `byte[]` when hashing/verification is completed.
Jargon2 even provides a convenience abstraction, the `ByteArray` API, to help client code deal with conversions from `char[]` to `byte[]`, from `String` to `byte[]`, etc., without the headache of keeping track of copies of sensitive data. It goes one step further by allowing you to have the source array cleared when the `ByteArray` lifecycle ends.
In this snippet from the documentation:

```java
char[] password = somehowGetPassword();
try (ByteArray passwordByteArray = toByteArray(password).clearSource()) {
    // use passwordByteArray with Hasher or Verifier
}
// Internal byte[] copy AND the source char[] (password variable) are zeroed-out here
```
any leftover `HasherImpl` and `VerifierImpl` objects that were created inside the `try` block simply point to empty arrays.
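The reason this works is that Java arrays are passed by reference: wiping the one underlying `byte[]` zeroes it for every object that still points to it. A minimal standalone illustration (not Jargon2 code):

```java
import java.util.Arrays;

public class WipeDemo {
    public static void main(String[] args) {
        byte[] secret = {1, 2, 3};
        byte[] alias = secret;         // a second reference to the SAME array, not a copy
        Arrays.fill(secret, (byte) 0); // wipe once...
        // ...and every holder of the reference now sees only zeros
        System.out.println(Arrays.toString(alias)); // prints [0, 0, 0]
    }
}
```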
I remember it was one of my design goals to handle sensitive data responsibly, and I was extra careful not to make copies, or to wipe them internally after use. The only place I could not achieve this is normalization, where I could not find a library that does the same. This is clearly documented, and normalization is optional anyway.
So, with correct use, all these intermediate `HasherImpl` and `VerifierImpl` objects which you lose references to (but which stay in the heap until GC) point to `byte[]`s that have already been wiped.
Of course, bugs are possible, and if you believe the above is not always the case, write a client code snippet that you believe leaves unwiped sensitive data in the heap. I'll examine it as soon as possible.
For the initial question, "Why create a new Hasher per invocation?":
I wanted to design an idiomatic fluent API that is simple and flat. It must be used in a certain way, though. Creating a new `HasherImpl` object on every API call is a result of this decision.
You can also check my long answer to https://github.com/kosprov/jargon2-api/issues/2
In any case, creating a new `HasherImpl` object and copying references is extremely cheap. To put it into perspective: you're about to allocate a huge amount of RAM to do CPU-intensive calculations. The cost of 2 or 3 additional objects is negligible in terms of both RAM and CPU.
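As back-of-envelope arithmetic, assuming an Argon2 memory cost of 64 MB (a common setting; the actual value is whatever you configure) against roughly 1 KB of short-lived builder copies:

```java
public class CostPerspective {
    public static void main(String[] args) {
        long argon2Memory = 64L * 1024 * 1024; // assumed memoryCost of 64 MB
        long builderGarbage = 1024;            // ~1 KB for 2-3 short-lived impl copies
        // the hash itself dwarfs the builder garbage by a factor of tens of thousands
        System.out.println(argon2Memory / builderGarbage); // prints 65536
    }
}
```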
@k3mist I am well aware of how the garbage collector works. Also, no, it does not set the password to `null` after a hash (though I think you mean that if they don't keep a variable for it, it would be discarded when the object is discarded). Also, thread-safety could probably be better achieved with `synchronized`.
@kosprov Thank you for your detailed answer and for making such a well-made project. Sadly, I still don't understand why you need it instead of using `return this;`, but until you change it I'll continue to use this code I made, which uses the backend instead of the hasher. Unfortunately, that also requires me to add a dependency on jargon2-api for `Type` and `Version`.
This also has the advantage that a new `Jargon2Backend` isn't created for every hash.
If you make proper use of the API and write a component like the `PasswordHasher` CDI bean described in the README file, when you call its public methods `PasswordHasher#encodedHash(char[])` or `PasswordHasher#verifyEncoded(String, char[])`, the password `char[]` is wiped out. The `char[]` will be automatically filled with zeroes before those methods return, and there will be no trace of the user password any more. Period.
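The wipe-on-exit contract described above can be sketched as follows. This is a hypothetical illustration, not the README's actual bean: SHA-256 stands in for the real Argon2 fluent API purely to keep the example self-contained, and the `char[]`-to-`byte[]` conversion is a naive ASCII-only stand-in for what the `ByteArray` API does properly.

```java
import java.security.MessageDigest;
import java.util.Arrays;

public class PasswordHasherSketch {
    public static byte[] encodedHash(char[] password) throws Exception {
        // Naive ASCII-only char[] -> byte[] conversion; avoids an unwipeable
        // String intermediate (the real ByteArray API handles charsets properly).
        byte[] pwdBytes = new byte[password.length];
        for (int i = 0; i < password.length; i++) {
            pwdBytes[i] = (byte) password[i];
        }
        try {
            // Stand-in for the Argon2 hash computation
            return MessageDigest.getInstance("SHA-256").digest(pwdBytes);
        } finally {
            Arrays.fill(pwdBytes, (byte) 0); // wipe the internal byte[] copy
            Arrays.fill(password, '\0');     // wipe the caller's char[] before returning
        }
    }
}
```

The point is the `finally` block: the sensitive arrays are zeroed on every exit path, so the caller never has to remember to do it.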
`PasswordHasher#encodedHash(char[])` creates 1 `HasherImpl` copy and `PasswordHasher#verifyEncoded(String, char[])` creates 2 `VerifierImpl` copies. That's less than 1 KB of garbage. Argon2 will allocate 5 or 6 orders of magnitude more RAM than this "unnecessary" 1 KB.
> ... sadly I still don't understand why you need it instead of using `return this;`
If you want to understand why a new copy of `HasherImpl` or `VerifierImpl` must be created, read my analysis in https://github.com/kosprov/jargon2-api/issues/2. Just trust me that `return this` would give wrong results.
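To see why `return this` would break things, consider reusing a pre-configured hasher as a template. A minimal sketch with hypothetical classes (not Jargon2's actual code): with a mutable `return this` builder, the second caller silently overwrites the first caller's state.

```java
// Mutable "return this" builder: every call mutates the shared instance.
final class MutableHasher {
    private byte[] salt;
    private byte[] password;

    MutableHasher salt(byte[] s)     { this.salt = s; return this; }
    MutableHasher password(byte[] p) { this.password = p; return this; }
    byte[] currentPassword()         { return password; }
}

public class ReturnThisDemo {
    public static void main(String[] args) {
        MutableHasher template = new MutableHasher().salt(new byte[]{1});
        MutableHasher forAlice = template.password(new byte[]{'a'});
        MutableHasher forBob   = template.password(new byte[]{'b'});
        // forAlice and forBob are the SAME object; Alice's password was replaced.
        System.out.println(forAlice == forBob);                          // prints true
        System.out.println(forAlice.currentPassword()[0] == (byte) 'b'); // prints true
    }
}
```

Returning a fresh copy on every call avoids exactly this: each chained call produces an independent object, so a shared template can never be corrupted by concurrent or sequential reuse.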
> but until you change it I'll continue and use...
Instead of using `NativeRiJargon2Backend` directly, I suggest you use the low-level API. You still get the benefit of a decoupled backend.
> This also has the advantage that a new Jargon2Backend isn't created every hash.
There is always one and only one instance of an automatically discovered backend. No new backend instances are created when `HasherImpl` or `VerifierImpl` copies are created.
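The relationship can be pictured like this (hypothetical names, not the library's actual classes): the copies are thin front-ends that all hold a reference to the same backend object, so copying a hasher never duplicates the backend.

```java
// One shared backend instance, many cheap hasher copies.
final class Backend { /* heavyweight, discovered once */ }

final class Hasher {
    private final Backend backend; // shared reference, never duplicated
    private final byte[] password;

    Hasher(Backend backend, byte[] password) {
        this.backend = backend;
        this.password = password;
    }

    Hasher password(byte[] p) { return new Hasher(backend, p); } // new copy, same backend

    Backend backend() { return backend; }
}

public class BackendSharingDemo {
    public static void main(String[] args) {
        Backend only = new Backend();
        Hasher a = new Hasher(only, null);
        Hasher b = a.password(new byte[]{1});
        // distinct hasher copies, identical backend instance
        System.out.println(a != b && a.backend() == b.backend()); // prints true
    }
}
```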
I am reading your code, and for every public method invocation on `HasherImpl` you create a new `HasherImpl`, which I have no clue why. Do you mind explaining?