Determine any cross-implementation API requirements

mkannwischer commented 1 month ago

I propose that we use the following common KEM API (following what's currently in the ML-KEM reference implementation):

int pqcp_mlkem512_ref_keypair_derand(uint8_t *pk, uint8_t *sk, const uint8_t *coins);
int pqcp_mlkem512_ref_keypair(uint8_t *pk, uint8_t *sk);
int pqcp_mlkem512_ref_enc_derand(uint8_t *ct, uint8_t *ss, const uint8_t *pk, const uint8_t *coins);
int pqcp_mlkem512_ref_enc(uint8_t *ct, uint8_t *ss, const uint8_t *pk);
int pqcp_mlkem512_ref_dec(uint8_t *ss, const uint8_t *ct, const uint8_t *sk);

The namespacing can be dropped for languages where it's done differently. mlkem512 is to be replaced by the corresponding parameter set and ref can be replaced by an identifier of the implementation. This follows what NIST wants to see: https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/Mf2kemwwreY/m/KArjoIhxAQAJ. The keypair and enc wrappers basically just fill the coins with randombytes() (64 bytes for keypair, 32 bytes for enc) and then call the derandomized versions.

(I'm assuming here that NIST is not going to change their mind about having those de-randomized APIs. I personally don't think there is a technical need for these because replacing the randombytes() implementation with a deterministic one for testing works just as well as Daniel J. Bernstein argues in the thread. We've had good experience with that approach in the PQClean and also pqm4.)

This API is already being used by https://github.com/pq-code-package/mlkem-c-embedded and https://github.com/pq-code-package/mlkem-c-aarch64 as those are based on the reference implementations.

For the saving code size, it may be useful for some target platforms to have dynamic parameter selection. In that case, I would additionally allow an API with an additional parameter selecting the parameter set. Something like this would work:

enum {
    MLKEM512  = 0,
    MLKEM768  = 1,
    MLKEM1024 = 2
};

int pqcp_mlkem_ref_keypair_derand(uint8_t *pk, uint8_t *sk, const uint8_t *coins, int param);
int pqcp_mlkem_ref_keypair(uint8_t *pk, uint8_t *sk, int param);
int pqcp_mlkem_ref_enc_derand(uint8_t *ct, uint8_t *ss, const uint8_t *pk, const uint8_t *coins, int param);
int pqcp_mlkem_ref_enc(uint8_t *ct, uint8_t *ss, const uint8_t *pk, int param);
int pqcp_mlkem_ref_dec(uint8_t *ss, const uint8_t *ct, const uint8_t *sk, int param);

tfaoliveira commented 1 month ago

I agree with Matthias. In addition to that, I propose to use extended names for the function arguments. I think it improves usability and reduces the chances of misusage.

int pqcp_mlkem768_keypair_derand(
  uint8_t *public_key,
  uint8_t *secret_key,
  const uint8_t *keypair_coins
);

int pqcp_mlkem768_keypair(
  uint8_t *public_key,
  uint8_t *secret_key
);

int pqcp_mlkem768_enc_derand(
  uint8_t *ciphertext,
  uint8_t *shared_secret,
  const uint8_t *public_key,
  const uint8_t *enc_coins
);

int pqcp_mlkem768_enc(
  uint8_t *ciphertext,
  uint8_t *shared_secret,
  const uint8_t *public_key
);

int pqcp_mlkem768_dec(
  uint8_t *shared_secret,
  const uint8_t *ciphertext,
  const uint8_t *secret_key
);

Regarding corresponding header files here is what is currently in use for libjade: api.h

It is similar to other projects as well.

Regarding namespacing and code organization: should we include, for instance, ref, avx2, amd64/x86_64/x64, or armv7m in the function names to identify, in more detail, the function and avoid misusages?

Currently, in libjade, the name of each exposed function is the full implementation path — an example here.

In the context of amd64, with which I'm most familiar, I think it is important that each implementation is distributed with an extra file stating which CPU extensions the implementation requires (if applicable: Jasmin/assembly implementations). For example, Keccak permutation using the andn instruction (defined in BMI1 extension) can be significantly faster (up to 20% IIRC). I think some CPU with support for AVX2 do not have BMI1. In any case, it might be excessive to include the full set of CPU extensions that an implementation requires in the corresponding function names (note: current mlkem768 requires BMI1 as it uses the non-AVX2 keccak permutation as well).

Regarding derandomized API's, they have the advantage of greatly simplifying known answer tests:

Some of these tests: https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/aCAX-2QrUFw/m/hy5gwcESAAAJ
Were integrated here, using derand functions: https://github.com/formosa-crypto/libjade/blob/main/test/crypto_kem/test_vectors.c

planetf1 commented 1 month ago

I'm a newbie to MK-KEM so forgive any nonsensical input

Staying close to the standard/reference implementation make sense..

internal vs derand

NIST uses _internal for the derand functions since they are geared to testing rather than consumer use. This seems to make the scope/usage clearer.

In the proposal here they are 'first class'. Is this because we expect that in practice those will be used by consumers? is this because we'd seen the need to do this in NSS?
- which implementation/namespace
The current functions are prefixed with pqcp_mlkem512ref

I can see this was done for
- our project (pqcp)
- what algorithm (mlkem512)
- which implementation (ref)
Given this

We will have a number of 'ref' implementations varying (I assume) by the nature of their optimizations (and in pqcp we have embedded and aarch64)

How much of this should be exposed at the API. Any change between implementation will require code changes from the consumer, so are no longer 'drop in' replacements.

I presume why the proposal here with this prefix is that it remains the same for all of the implementations within pqcp that are implemented in a particular language therefore allowing swapping between them.

(Accept that with other languages such as go, java etc the namespacing at function level is not needed due to package definitions)
- More info
As a followup to the point above about what explicit instruction support is needed for a particular implementation - is there a need to be able to query this at runtime - it to expose a discovery api that amongst other things would report some specifics about the implementation actually being called?

planetf1 commented 1 month ago

I've proposed an update on this for the TSC this week.

pq-code-package / tsc

Determine any cross-implementation API requirements #4