tuupola / branca-spec

Authenticated and encrypted API tokens using modern crypto
https://www.branca.io/
219 stars 7 forks source link

Add reference test vectors #33

Closed brycx closed 3 years ago

brycx commented 3 years ago

18

These test vectors combine what I've found in the PHP implementation, with some extra tests. As mentioned, these include the nonces, but can be removed depending on your conclusion @tuupola.

The extra tests add more edge-cases, related to empty input, non-UTF8, etc.

TODO:

Test 18, when run with pybranca, fails. This is because it allows the timestamp to overflow the spec 0..4294967295 range, when checking for expiration with a TTL. I assume this is not intended and I would like to restrict the TTL not to overflow the timestamp in the spec, @tuupola? I'd be happy to add it to the spec myself.

from branca import Branca
import json
from binascii import unhexlify, hexlify
import calendar
from datetime import datetime

with open('test_vectors.json', "r") as test_vectors:
    data = json.load(test_vectors)

for test_vector in data:
    print("Testing:", test_vector['test_vector_id'])
    print("Comment:", test_vector['comment'])

    try:
        ctx = Branca(test_vector['key'])
        ctx._nonce = unhexlify(test_vector['nonce'])
        token_actual = ctx.encode(unhexlify(test_vector['msg']), timestamp=test_vector['timestamp'])

        try:
            message_actual = ctx.decode(test_vector['token'], ttl=test_vector['ttl'])
        except:
            if test_vector['is_valid'] is False:
                print("Pass: TRUE")
            else:
                message_actual = ctx.decode(test_vector['token'])

        # Define passing test
        if test_vector['is_valid'] is True:
            if token_actual == test_vector['token'] and message_actual == unhexlify(test_vector['msg']):
                print("Pass: TRUE")
            else:
                print("Pass: FALSE")
        else:
            if token_actual != test_vector['token']:
                print("Pass: TRUE")
            elif message_actual != unhexlify(test_vector['msg']):
                print("Pass: TRUE")
            else:
                print("Pass: FALSE")
    except:
        if test_vector['is_valid'] is False:
            print("Pass: TRUE")
        else:
            print("Pass: FALSE")
tuupola commented 3 years ago

Yeah not intended, timestamp should not overflow. Good catch!

tuupola commented 3 years ago

Just a quick note. I agree that if the implementation has possibility of setting the nonce for unit testing that can be used. If it is not possible then implementation could use the workardound you described.

To make things prettier I would use something like beefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeef as nonce in the test vectors.

I will update either the Python or PHP version unit tests to use the new vector this weekend to see how it looks.

tuupola commented 3 years ago

What does invalid nonce test? Do I understand correctly the format (lenght) of user provided nonce?

Also what does wrong nonce test?

brycx commented 3 years ago

I agree that if the implementation has possibility of setting the nonce for unit testing that can be used. If it is not possible then implementation could use the workardound you described.

Do I understand correctly, that you've decided to keep the nonces in the test vectors then?

To make things prettier I would use something like beefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeef as nonce in the test vectors.

Will add it as a TODO.

What does invalid nonce test? Do I understand correctly the format (lenght) of user provided nonce?

Yes, this test, along with "Invalid key.", tests input that is one byte less than the allowed. I'll update the comments for those to clarify.

Also what does wrong nonce test?

It tests that using a different nonce, than was used during token creation, will result in an error. Will also update the comment to clarify this.

tuupola commented 3 years ago

It tests that using a different nonce, than was used during token creation, will result in an error. Will also update the comment to clarify this.

Isn't this basically the same test for modified nonce?

brycx commented 3 years ago

Ah yes, that's a duplicate. I'll remove it.

Sorry, it's not strictly speaking a duplicate. TC 12 has the invalid nonce in the nonce field, where TC 14 has the nonce modified in the actual token. In that sense it's different, but perhaps the end result during testing is exactly the same. I think the idea was that, if you were unable to set a nonce during unit-testing, you could still test using invalid nonces by using TC 14 which only requires a decrypt to do so. Not entirely sure atm.

tuupola commented 3 years ago

I understood TC 10 has invalid nonce ie wrong number of bytes. To me TC 12 and TC 14 seem the same since they both test the same thing. Decoding token with different nonce it was created with ie tampered token.

tuupola commented 3 years ago

I think the TTL test does not belong here because it is not really part of the spec. Or at least it should be mentioned that this is not mandatory test since TTL is not part to the token, instead it is something the implementation may provides.

brycx commented 3 years ago

To me TC 12 and TC 14 seem the same since they both test the same thing. Decoding token with different nonce it was created with ie tampered token.

They are in most cases at least. I'll remove TC12 and keep TC14.

I think the TTL test does not belong here because it is not really part of the spec. Or at least it should be mentioned that this is not mandatory test since TTL is not part to the token, instead it is something the implementation may provides.

True. I don't know if it's good to have an "optional" test. Normally, reference test vectors would be a minimum requirement for a passing implementation. Probably best to exclude it.

tuupola commented 3 years ago

One more thing. While implementing new test vectors here:

https://github.com/tuupola/branca-php/pull/14

I noticed it might be bit confusing for developer that some test vectors are only about decoding and some are about both encoding and decoding.So I though maybe there should be separate test vectors for them. They could be even separate json files.

I implemented the tests in such way here: https://github.com/tuupola/branca-php/blob/c9da4c2aef74dc9e3ab539ba60cf8302c814d477/tests/BrancaTest.php

Decoding vectors test that implementation can retrieve timestamp and payload from the token, or throws an exception in case of invalid token.

  {
      "id": 2,
      "comment": "November 27th timestamp.",
      "key": "supersecretkeyyoushouldnotcommit",
      "timestamp": 123206400,
      "token": "875GH234UdXU6PkYq8g7tIM80XapDQOH72bU48YJ7SK1iHiLkrqT8Mly7P59TebOxCyQeqpMJ0a7a",
      "payload": "48656c6c6f20776f726c6421",
      "is_valid": true
    },

While encoding vectors also to provide the nonce and test must make sure that with given nonce, timestamp and token the implementation creates the given token.

  {
      "id": 2,
      "comment": "November 27th timestamp.",
      "key": "supersecretkeyyoushouldnotcommit",
      "nonce": "beefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeef",
      "timestamp": 123206400,
      "token": "89i7YCwu5tWAJNHUDdmIqhzOi5hVHOd4afjZcGMcVmM4enl4yeLiDyYv41eMkNmTX6IwYEFErCSqr",
      "payload": "48656c6c6f20776f726c6421",
      "is_valid": true
    },

What do you think?

tuupola commented 3 years ago

True. I don't know if it's good to have an "optional" test. Normally, reference test vectors would be a minimum requirement for a passing implementation. Probably best to exclude it.

I agree.

brycx commented 3 years ago

I think you're right, that it might become confusing. I don't think a separate file is needed though, since we can just have two separate groups in the current file. But making two different structures for tests will also require two different objects to serialize them to. So it also does add some extra work for the user.

Based on your example, I'm not sure what you're suggesting in terms of how to practically approach this? Simply have two groups of test vectors?

tuupola commented 3 years ago

Yeah, two groups of test vectors. One for encoding and one for decoding. Personally I think having two separate files would be the clearest but two separate groups is ok too.

tuupola commented 3 years ago

For reference I have currently separated the tests into two groups in PHP tests. Also used a prettier 0xbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeef everywhere.

https://github.com/tuupola/branca-php/blob/new-test-vectors/tests/BrancaTest.php

brycx commented 3 years ago

For reference I have currently separated the tests into two groups in PHP tests.

Thanks. I've noted all the suggestions in the TODO in the PR description. Will try to get them fixed soon.

brycx commented 3 years ago

I've updated the test vectors to match what you defined in the PHP test, as you asked. Do you want the nonce field removed entirely, for test only meant for decoding? While using the current test vectors, the missing nonce field only on some tests makes it a much bigger pain to deserialize the test vectors (instead of having one common type for a test vector to ser/der against).

Also, the overflowing timestamp test is still there. Couldn't remember if you wanted that one left out.

tuupola commented 3 years ago

Yeah could use null nonce in the json instead of removing it entirely. I agree deserialising is probably more pleasant with unified data structure.

I think overflowing timestamp does not belong there since it TTL check implementation specific and not part of spec itself.

One more thing, could you rename test_vector_id to id and then I think we are done.

brycx commented 3 years ago

Yeah could use null nonce in the json instead of removing it entirely. I agree deserialising is probably more pleasant with unified data structure.

Great, I've fixed this.

I think overflowing timestamp does not belong there since it TTL check implementation specific and not part of spec itself. One more thing, could you rename test_vector_id to id and then I think we are done.

Done.

I noticed the PHP impl has the prettified nonce repeated twice, so 48 bytes I believe. I couldn't get the correct nonce from the tokens you provided therein, they seemed to be random. I suspect the PHP impl has automatically replaced the nonces with random data, since they were too large. Either way, I've corrected the nonces and the expected tokens.

tuupola commented 3 years ago

The nonce in tests is not a string, it is hex 0xbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeef which is 24 bytes. It comes from old trick in embedded world where memory is filled with 0xdeadbeeffor debugging. For example taking token from test vector 14.

$token = "875GH233SUysT7fQ711EWd9BXpwOjB72ng3ZLnjWFrmOqVy49Bv93b78JU5331LbcY0EEzhLfpmSx";
$binary = (new Base62())->decode($token);
var_dump(bin2hex($binary));

string(114) "ba0757fb0000efbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefd8fdbaf35dc37a98b523e6fef3faf98dd385c68046fb7ed63c94995b"
brycx commented 3 years ago

The nonce in tests is not a string, it is hex 0xbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeef which is 24 bytes.

After I left my previous comment, I was thinking it probably was hex instead. Sorry about that, something I misunderstood. After looking at the PHP again, I see hex2bin being used as well. Should've been pretty obvious haha. Anyway, I've reverted the tokens and nonces. Should match the PHP impl now.

tuupola commented 3 years ago

In json there is mixed camelCase and snake_case. I think should use only one style.

tuupola commented 3 years ago

I don't understand the invalid nonce test. The encoding part tests whether invalid nonce is passed but what should the decoding part test?

I actually originally meant the testing groups should be encoding and decoding. Now there are encoding and decoding and decoding. I find this slightly confusing with some tests, like the one above.

What we could do is that I merge the PR like this and I then do some cosmetic changes. I would like to reorder some tests to go from "easy" to "obscure edge case". Is that ok with you?

brycx commented 3 years ago

The encoding part tests whether invalid nonce is passed but what should the decoding part test?

The intent was to test that both encoding and decoding reject invalid nonce-sizes.

encoding and decoding

I agree. Calling it encoding is probably more clear. I wrote it because, it's important to test both encoding and decoding when possible. Though, this probably shouldn't be the job of the test vectors to imply. This usually is the job of the tester themself.

What we could do is that I merge the PR like this and I then do some cosmetic changes. I would like to reorder some tests to go from "easy" to "obscure edge case". Is that ok with you?

This is fine. I've re-named the group and enforced the camelCase. Go ahead and merge. Alternatively, I've enabled edits by maintainers for this PR, so you should be able to push directly to this branch as well.

If you could, I'd love a ping in this PR once your cosmetic changes are done (if you decide to merge this before applying yours). Then I know when to merge the final tests for the Rust implementation.

tuupola commented 3 years ago

The intent was to test that both encoding and decoding reject invalid nonce-sizes.

When verifying a token the nonce would be the bytes 6 - 29 from the binary (base62 decoded) token. So only if the given token is less than 29 bytes the extracted nonce would be of invalid size.

I wrote it because, it's important to test both encoding and decoding when possible. Though, this probably shouldn't be the job of the test vectors to imply. This usually is the job of the tester themself.

I agree. I though about adding separate vectors for encode and decode parts. There will be couple of more vectors this way but result is more unit-test-y since each test would test only one thing.

Alternatively, I've enabled edits by maintainers for this PR, so you should be able to push directly to this branch as well.

I will squash this and I am afraid to accidentally hijack the PR if there are my commits too. Not always sure how GitHub works. I can always bump the version number in the json file when my changes are done.

Thanks!

brycx commented 3 years ago

When verifying a token the nonce would be the bytes 6 - 29 from the binary (base62 decoded) token. So only if the given token is less than 29 bytes the extracted nonce would be of invalid size.

True, then I actually don't remember what the purpose is except for testing nonce size when encoding.

I agree. I though about adding separate vectors for encode and decode parts. There will be couple of more vectors this way but result is more unit-test-y since each test would test only one thing.

Sounds like it could be a good idea. A common test group of sorts.

I will squash this and I am afraid to accidentally hijack the PR if there are my commits too. Not always sure how GitHub works. I can always bump the version number in the json file when my changes are done.

That's totally fine. Thank you as well.