Interface for KMAC - Githubissues

falko-strenzke commented 1 year ago

For a project we currently plan to contribute KMAC256 to Botan.

KMAC's interface doesn't exactly fit to Botan's MAC interface, so I would like to discuss the best approach.

KMAC has the following parameters:

K is a key bit string of any length7, including zero.
X is the main input bit string. It may be of any length, including zero.
L is an integer representing the requested output length8 in bits.
S is an optional customization bit string of any length, including zero. If no customization is desired, S is set to the empty string.

Only L and S need discussion. For S, I think the optional nonce in Botan's MAC interface fits well. But for the output length L I don't see how to fit it into Botan's interface. Theoretically, it could be given in the algorithm string as a parameter. But I was never a fan of Botan's approach to use strings for the identification of algorithms, and using this mechanism to encode numerical parameters goes even one step further. I would prefer to avoid that here.

I would instead propose to add an optional parameter to start_msg:

virtual void start_msg(const uint8_t nonce[], size_t nonce_len, output_length = 0);

Or would the default value in a virtual function be undesired? (It would have to be set again in the same way by each derived class, then things should be fine in my understanding).

Otherwise we could add a further function to the MAC interface to set the output length. The output length needs to be known to KMAC only before finalizing the computation. Thus this function could be called at any point before finalizing the computation.

reneme commented 1 year ago

Interesting! I agree that we should figure out some consistent API for that.

For what its worth: The approach via the algorithm identifier string is already used for SHAKE. E.g. Hash::create("SHAKE-128(256)") gives you a SHAKE hash with 32 bytes of output. So there's already precedence for doing it like that. (Not saying that I like it very much either, though.) But maybe the Hash_Function class is another dimension in this discussion.

Slightly off-topic, but to explore the design space of such an API addition: Maybe a configurable output length might be useful for conventional hashes and MACs for output truncation. I'm currently implementing the additional XMSS parameter sets proposed by NIST that rely on truncated hashes and such a feature would be helpful there. Just thinking out loud here.

randombit commented 1 year ago

There is plenty of precedent for encoding output lengths in the param string, eg SHA-3, Skein-512, BLAKE2b all use this approach.

I'm open to alternatives but there are a number of somewhat tricky design questions to resolve here.

auto kmac = ...;
kmac->start_msg(nullptr, 0); // no length - what length does KMAC produce?
size_t ol = kmac->output_length(); // what value is this??

kmac->start_msg(nullptr, 0, 32); // explicit length
assert(kmac->output_length(), 32); // this at least makes sense

kmac->start_msg(nullptr, 0); // no length given again - do we have to carry the output length as state??

An application would additionally need some means of querying which output lengths were valid for any particular object. Along with, I guess, a default value.

Also TBH I have to question the utility of being able to vary the output length from message to message - does anyone need this? It seems like it adds significant complexity to the API for a very marginal use case. I can see this paramaterization being more useful in an XOF-like approach, and I'd be fine with us adding some explicit XOF interface that could accommodate KMAC as well. (Currently we fake XOFs as a StreamCipher which is sufficient but awkward)

@reneme if I may suggest an approach here - create a new hash Truncated which is defined in terms of some other hash function, and simply truncates the result. Eg Truncated(SHA-256, 192) == SHA-256/192. Analogous to the existing Parallel hash.

falko-strenzke commented 1 year ago

I agree that setting the output length during instantiation is the natural approach. And then probably we should stick to the established practice of encoding it into the algorithm specification string.

Maybe at some point it could be considered to refactor the interface to build the algorithm and parameter specification also using C++ types. Currently, in our case here, if one needs to determine the output length at run-time, one would need to write code to encode the numeric value into a string, which is not very convincing to me. Even then, the string-based instantiation could be kept for backwards compatibility. Just an idea.

reneme commented 1 year ago

if I may suggest an approach here - create a new hash Truncated [...]

Nice! I didn't know about those "virtual" hash implementations. Sounds like a shoo-in to me. Thanks.

randombit commented 1 year ago

@falko-strenzke I do not disagree that the string based approach was probably a mistake! But it is imo better to have a consistent, if non-ideal, API, than it is to have a multitude of different approaches to solving an identical problem. And the API is complicated enough :sob: let us not add to it unless we really need to.

falko-strenzke commented 1 year ago

@randombit I understand, but just to make clear what I imagine and that I don't think it adds that much complexity.

Create a framework for types that allow composing algorithm specs in a nice C++ operator syntax, e.g. auto alg_spec = HMAC256 / Truncate_Output(Bits(128));
Let the existing functions for algorithm instantiations use that new framework
Provide mechanism that maps the existing string definitions to the new framework for each module.

I wouldn't even say that the strings are necessarily a clear mistake since they ease the forwarding of algorithm specifications coming from human readable sources. For instance in the Botan test framework. I just think that considering them one of two ways of expressing algorithm specifications would be an improvement. But in any case, I am not saying I have a plan to do this, I just want to outline this approach as I think it would be a feasible addition at some point in the future.

reneme commented 1 year ago

I also find @falko-strenzke's idea worth exploring, actually. Though, I'm personally not a huge fan of "misusing" C++ operators for building such a DSL (even though it seems to be in fashion). IMHO it forces library users to learn the semantics of the operators before being able to make sense of the code. Which is particularly bad for auditors and code reviewers.

Already with the existing API, you could write something like that:

std::make_unique<HMAC>(std::make_unique<SHAKE_128>(128));

For truncation, one could have adapters (like the Parallel hash function @randombit already mentioned):

std::make_unique<Truncation<Message_Authentication_Code>>(32, 
   std::make_unique<HMAC>(
      std::make_unique<SHAKE_128>(128));

All of this is of course much more boiler-plate (particularly the std::make_unique<> is unfortunate). But it's usable without adding any mechanics to the library (except the Truncation adapter, of course).

This would mean a departure from the abundant ::create(std::string) factory methods, of course.

falko-strenzke commented 1 year ago

I documented my idea regarding a type-based interface for algorithm instantiation now in #3275.

randombit / botan

Interface for KMAC #3262