Closed falko-strenzke closed 1 year ago
Interesting! I agree that we should figure out some consistent API for that.
For what its worth: The approach via the algorithm identifier string is already used for SHAKE
. E.g. Hash::create("SHAKE-128(256)")
gives you a SHAKE hash with 32 bytes of output. So there's already precedence for doing it like that. (Not saying that I like it very much either, though.) But maybe the Hash_Function
class is another dimension in this discussion.
Slightly off-topic, but to explore the design space of such an API addition: Maybe a configurable output length might be useful for conventional hashes and MACs for output truncation. I'm currently implementing the additional XMSS parameter sets proposed by NIST that rely on truncated hashes and such a feature would be helpful there. Just thinking out loud here.
There is plenty of precedent for encoding output lengths in the param string, eg SHA-3, Skein-512, BLAKE2b all use this approach.
I'm open to alternatives but there are a number of somewhat tricky design questions to resolve here.
auto kmac = ...;
kmac->start_msg(nullptr, 0); // no length - what length does KMAC produce?
size_t ol = kmac->output_length(); // what value is this??
kmac->start_msg(nullptr, 0, 32); // explicit length
assert(kmac->output_length(), 32); // this at least makes sense
kmac->start_msg(nullptr, 0); // no length given again - do we have to carry the output length as state??
An application would additionally need some means of querying which output lengths were valid for any particular object. Along with, I guess, a default value.
Also TBH I have to question the utility of being able to vary the output length from message to message - does anyone need this? It seems like it adds significant complexity to the API for a very marginal use case. I can see this paramaterization being more useful in an XOF-like approach, and I'd be fine with us adding some explicit XOF interface that could accommodate KMAC as well. (Currently we fake XOFs as a StreamCipher
which is sufficient but awkward)
@reneme if I may suggest an approach here - create a new hash Truncated
which is defined in terms of some other hash function, and simply truncates the result. Eg Truncated(SHA-256, 192)
== SHA-256/192
. Analogous to the existing Parallel
hash.
I agree that setting the output length during instantiation is the natural approach. And then probably we should stick to the established practice of encoding it into the algorithm specification string.
Maybe at some point it could be considered to refactor the interface to build the algorithm and parameter specification also using C++ types. Currently, in our case here, if one needs to determine the output length at run-time, one would need to write code to encode the numeric value into a string, which is not very convincing to me. Even then, the string-based instantiation could be kept for backwards compatibility. Just an idea.
if I may suggest an approach here - create a new hash
Truncated
[...]
Nice! I didn't know about those "virtual" hash implementations. Sounds like a shoo-in to me. Thanks.
@falko-strenzke I do not disagree that the string based approach was probably a mistake! But it is imo better to have a consistent, if non-ideal, API, than it is to have a multitude of different approaches to solving an identical problem. And the API is complicated enough :sob: let us not add to it unless we really need to.
@randombit I understand, but just to make clear what I imagine and that I don't think it adds that much complexity.
auto alg_spec = HMAC256 / Truncate_Output(Bits(128));
I wouldn't even say that the strings are necessarily a clear mistake since they ease the forwarding of algorithm specifications coming from human readable sources. For instance in the Botan test framework. I just think that considering them one of two ways of expressing algorithm specifications would be an improvement. But in any case, I am not saying I have a plan to do this, I just want to outline this approach as I think it would be a feasible addition at some point in the future.
I also find @falko-strenzke's idea worth exploring, actually. Though, I'm personally not a huge fan of "misusing" C++ operators for building such a DSL (even though it seems to be in fashion). IMHO it forces library users to learn the semantics of the operators before being able to make sense of the code. Which is particularly bad for auditors and code reviewers.
Already with the existing API, you could write something like that:
std::make_unique<HMAC>(std::make_unique<SHAKE_128>(128));
For truncation, one could have adapters (like the Parallel
hash function @randombit already mentioned):
std::make_unique<Truncation<Message_Authentication_Code>>(32,
std::make_unique<HMAC>(
std::make_unique<SHAKE_128>(128));
All of this is of course much more boiler-plate (particularly the std::make_unique<>
is unfortunate). But it's usable without adding any mechanics to the library (except the Truncation
adapter, of course).
This would mean a departure from the abundant ::create(std::string)
factory methods, of course.
I documented my idea regarding a type-based interface for algorithm instantiation now in #3275.
For a project we currently plan to contribute KMAC256 to Botan.
KMAC's interface doesn't exactly fit to Botan's MAC interface, so I would like to discuss the best approach.
KMAC has the following parameters:
Only L and S need discussion. For S, I think the optional nonce in Botan's MAC interface fits well. But for the output length L I don't see how to fit it into Botan's interface. Theoretically, it could be given in the algorithm string as a parameter. But I was never a fan of Botan's approach to use strings for the identification of algorithms, and using this mechanism to encode numerical parameters goes even one step further. I would prefer to avoid that here.
I would instead propose to add an optional parameter to start_msg:
Or would the default value in a virtual function be undesired? (It would have to be set again in the same way by each derived class, then things should be fine in my understanding).
Otherwise we could add a further function to the MAC interface to set the output length. The output length needs to be known to KMAC only before finalizing the computation. Thus this function could be called at any point before finalizing the computation.