terl / lazysodium-java

A Java implementation of the Libsodium crypto library. For the lazy dev.
https://github.com/terl/lazysodium-java/wiki
Mozilla Public License 2.0
134 stars 46 forks source link

Support passing byte arrays to lazy methods instead of strings #109

Open Traderjoe95 opened 3 years ago

Traderjoe95 commented 3 years ago

Using plain strings as input to hash, encryption, etc. functions doesn't work very well for binary data. I have experienced several issues where decoding the data to a string before encrypting it would lose some bytes because they weren't valid in the respective encoding.

I would suggest to

  1. supply another batch of lazy methods for GenericHash, SecretBox, AEAD and others that take byte arrays as their respective messages directly. They may return either a hex string or a byte array, both are good options, really.
  2. clarify the API regarding what type of string is expected as an input. At the moment, the usage of String as the parameter type is ambiguous as sometimes it means "hex string" and sometimes it means "plain string". Maybe this can be clarified through the parameter name, e.g. plainMessage vs hexCipher
timmc commented 2 years ago

This has been one of my main issues with the library as well. There are a number of places where I have to use the Native interfaces and deal with output parameters and array math just because I want to use byte-array inputs and outputs.

I would advocate for having any new methods on the Lazy side always deal in bytes in and out, since hex encoding/decoding can be done by the caller as needed. (Not everyone deals in hex strings for encrypted data, and the hex-vs-plain API confusion you mention is a serious one.)

enote-kane commented 2 years ago

The fact that there are no "lazy" methods taking byte[] is not the worst here, apparently.

Let's take a quick look into String cryptoGenericHash(String in, Key key):

public String cryptoGenericHash(String in, Key key) throws SodiumException {
    byte[] message = bytes(in); // (1)
    byte[] keyBytes = key.getAsBytes();

    byte[] hash = randomBytesBuf(GenericHash.BYTES); // (2)
    boolean res = cryptoGenericHash(hash, hash.length, message, message.length, keyBytes, keyBytes.length);

    if (!res) {
        throw new SodiumException("Could not hash the message.");
    }

    return messageEncoder.encode(hash); // (3)
}
  1. create a copy of the input Stringbyte[]
    • data copy not cleaned up/zero-filled before leaving the function (left for GC)
    • gets bytes via charset (which defaults to UTF-8)
  2. initialize a byte[] for the output hash
    • unnecessarily filled with random - entropy pressure (Java byte[] is always zero-filled, demanded by the spec)
  3. create a String from the internal byte[]
    • encodes result into a Hex string by default
    • does not clean up/zero-fill internal byte[] copy of the hash before leaving the function

So, at least we have:

I really don't get the point of returning a hash as a string.

Nevertheless, a function like this should be added to make the app developers happy:

public byte[] cryptoGenericHash(byte[] message, Key key) throws SodiumException {
    byte[] keyBytes = key.getAsBytes();

    byte[] hash = new byte[GenericHash.BYTES];
    boolean res = cryptoGenericHash(hash, hash.length, message, message.length, keyBytes, keyBytes.length);

    if (!res) {
        throw new SodiumException("Could not hash the message.");
    }

    return hash;
}

...which:

Obviously, this project wants to avoid a dependency like Apache Commons Codecs at all costs. Otherwise, encoding/decoding of data could have been provided in a configurable way without having to maintain re-implementations.

Here, I would have expected to also make use of the libsodium helpers (which even provides nice options for Base64, since different project have different needs here).

Also, creation of the output/return String is inconsistent, at least String cryptoShortHash(String in, Key key) does not use the configured MessageEncoder and always returns hex, no matter what.

After some code review, I've decided to ditch "lazy" and only allow .Native interfaces in our code.