quarkslab / crypto-condor

crypto-condor is a Python library for compliance testing of implementations of cryptographic primitives
https://quarkslab.github.io/crypto-condor/latest/index.html
Apache License 2.0

Console().process_results not matching documentation behavior #7

Closed: bboilot-ledger closed this issue 2 months ago

bboilot-ledger commented 3 months ago

Hi @JulioLoayzaM :wave:

I am encountering an issue while trying to use Console().process_results with filename=None.

How to reproduce

Expected behavior

As the documentation says:

        Args:
            res: The results to display.
            filename: An optional file. If a string is passed, the results are saved to
                that file. If an empty string is passed, the user is prompted. If None,
                the results are not saved and the user is not prompted.

The user should not be prompted for saving the results to a file.

Current behavior

The user is still prompted to save the results to a file.

Possible cause

In primitives/common.py:776, the condition bool(filename) or Confirm.ask("Save the results to a file?", default=False) behaves the same whether filename == "" or filename == None, since both are falsy. If I replace the line with something like if filename is not None and (bool(filename) or Confirm.ask("Save the results to a file?", default=False)):, I get the expected behavior. Since I am unsure what the expected behavior is, or the best way to fix it, I did not open a PR. But I can if you want me to :)
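For clarity, here is a rough sketch of the check I have in mind, written as a standalone helper (the name _should_save is only for illustration; in common.py the condition is inline, and I am assuming Confirm is rich.prompt.Confirm):

from rich.prompt import Confirm

def _should_save(filename: str | None) -> bool:
    # None: do not save and do not prompt.
    if filename is None:
        return False
    # Non-empty string: save to that file without prompting.
    # Empty string: ask the user.
    return bool(filename) or Confirm.ask("Save the results to a file?", default=False)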

bboilot-ledger commented 3 months ago

Also, it would make sense to me to be able to store more than one ResultsDict inside a file report (for example, all the tests corresponding to AES-GCM with different key lengths). What do you think about adding the possibility of appending to the file, or directly switching to append mode?

JulioLoayzaM commented 3 months ago

Hi @bboilot-ledger,

The expected behaviour is indeed the documented one, so this is a bug. Your fix looks good so I can commit it with a corresponding test to confirm it works, and add you as commit co-author if that's alright.
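Roughly something along these lines (just a sketch of the idea, not necessarily the exact test that will land; it builds a ResultsDict with the usual AES wrappers and makes any prompt fail the test):

from crypto_condor.primitives import AES
from crypto_condor.primitives.common import Console


def test_process_results_does_not_prompt_with_none(monkeypatch):
    """process_results(res, None) must not prompt the user."""

    def _no_prompt(*args, **kwargs):
        raise AssertionError("prompted despite filename=None")

    # Any attempt to ask for confirmation fails the test.
    monkeypatch.setattr("crypto_condor.primitives.common.Confirm.ask", _no_prompt)

    def _enc(key: bytes, plaintext: bytes) -> bytes:
        return AES._encrypt(AES.Mode.ECB, key, plaintext)

    def _dec(key: bytes, ciphertext: bytes) -> bytes:
        return AES._decrypt(AES.Mode.ECB, key, ciphertext)

    res = AES.test(_enc, _dec, AES.Mode.ECB, AES.KeyLength.AES128)
    Console().process_results(res, None)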

Regarding your second comment, I'm guessing the intended usage is repeated calls to process_results in-between tests? Like:

res = AES.test(enc, dec, AES.Mode.GCM, AES.KeyLength.AES128)
console.process_results(res, "res.txt")
res |= AES.test(enc, dec, AES.Mode.GCM, AES.KeyLength.AES192)
console.process_results(res, "res.txt")

If so, my first answer is that you can use AES.KeyLength.ALL to test all three key lengths at once. The resulting ResultsDict should contain several Results as values. This is the recommended approach, since there is no file appending.
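For example, reusing the enc/dec wrappers and the console from your snippet (just a sketch of the same call with the ALL option):

# Tests AES-GCM with 128-, 192- and 256-bit keys in a single call.
res = AES.test(enc, dec, AES.Mode.GCM, AES.KeyLength.ALL)
console.process_results(res, "res.txt")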

Otherwise, you should be able to test different primitives and parameters thanks to the |= operator, which combines two dictionaries. The caveat with this approach is that keys must be unique, or results will be overwritten (and I see that this already happens with Wycheproof results for AES, so I'll open an issue). The thing is that the keys are set by the tests themselves, so there has been some work on the ResultsDict.add method to create unique keys.
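To illustrate the caveat with plain dictionaries (combining two ResultsDict instances with |= has the same overwriting behaviour; the 'wycheproof' key is just an example):

first = {"wycheproof": "results of run A"}
second = {"wycheproof": "results of run B"}
first |= second
print(first)  # {'wycheproof': 'results of run B'}: run A's entry was silently overwritten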

The current approach uses information about the primitive and the parameters selected to create the key, so it still does not allow adding tests for two different implementations with the exact same parameters to one ResultsDict without overwriting. So if that's what you were referring to, or some other use case for appending the results to the file, we can discuss it further. :)

bboilot-ledger commented 3 months ago

:wave: @JulioLoayzaM

Regarding the fix, it looks good to me :smiley:.

Yes, I was looking for a way to aggregate the results of different primitives for one implementation (not the same primitive for different implementations). To me it would make sense, when testing a cryptographic library, to have one report that contains all the tested primitives, the global results like the number of tests, and then detailed results for each primitive.

I did not see AES.KeyLength.ALL, cool feature! :+1:

However, if I try your more generic approach using the |= operator (which is exactly what I am looking for), I am not able to get the correct results:

If I use the |= operator, I get: [screenshot]

If I store the ResultsDict of each primitive in a list and run console.process_results on each element, I get: [screenshot]

It may be related to the problem you mentioned regarding AES-GCM.

Also, I will have to check again, but I feel like the classic report is not verbose regarding failing tests: you would have to run the test again in debug mode to understand which test failed (and what the inputs were). And in debug mode you also get that information for every passing test. But on the other hand, it may be too verbose if many tests are failing. So I don't actually know what to think about that...

Otherwise, the report format is super clear, I love the flag feature and the colorization to quickly see important information.

JulioLoayzaM commented 2 months ago

Hi @bboilot-ledger,

Regarding the fix, it looks good to me :smiley:.

Fix pushed :) I'll push a new release on Monday.

To me it would make sense, when testing a cryptographic library, to have one report that contains all the tested primitives, the global results like the number of tests, and then detailed results for each primitive.

Ahh I see, that makes sense indeed. For the Python API, combining results should work with |=, modulo the problem of key uniqueness. For the CLI, there currently isn't a way but I've been testing a simpler interface that would use a single wrapper to test several primitives and/or options.

However, if I try your more generic approach using the |= operator (which is exactly what I am looking for), I am not able to get the correct results:

That looks like a bug :P I've tried to reproduce it with AES-ECB and SHA-256, but I do get the combined results. I used the following script:

from hashlib import sha256

from Crypto.Cipher import AES as pyAES

from crypto_condor.primitives import AES, SHA
from crypto_condor.primitives.common import Console

def _sha256(data: bytes) -> bytes:
    return sha256(data).digest()

def _aes_ecb(key: bytes, plaintext: bytes) -> bytes:
    return pyAES.new(key, pyAES.MODE_ECB).encrypt(plaintext)

rd = AES.test(_aes_ecb, None, AES.Mode.ECB, AES.KeyLength.AES128)
print(rd.keys())
rd |= SHA.test(_sha256, SHA.Algorithm.SHA_256)
print(rd.keys())

console = Console()
console.process_results(rd)

And I obtain:

╭───────────────────────────────ch Results summary ────────────────────────────────╮
│ Primitives tested: AES, SHA                                                        │
│ Valid tests:                                                                       │
│   Passed: 718                                                                      │
│   Failed: 0                                                                        │
╰──────────────────── crypto-condor 2024.08.03 by Quarkslab ─────────────────────╯

I picked SHA-256 because it was the easiest, but the problem seems to come from the ECDSA module. To try to reproduce: it seems that you are testing signature verification with brainpoolP256r1, correct?

Also, I will have to check again, but I feel like the classic report is not verbose regarding failing tests. [...] But on the other hand, it may be too verbose if many tests are failing.

Yes, by default only the global and per-test results are displayed. The reason was indeed that if many tests are failing, adding the failed tests is very verbose. But that assumed you could easily and rapidly run the test again in debug mode, which is the case for the test cases based on Python libraries, but may not be the case with real targets. I'll think about adding failed cases by default.

Otherwise, the report format is super clear, I love the flag feature and the colorization to quickly see important information.

Glad to hear that :smile:

bboilot-ledger commented 2 months ago

Fix pushed :) I'll push a new release on Monday.

Awesome!

To try to reproduce: it seems that you are testing signature verification with brainpoolP256r1, correct?

This is correct. I tried merging results of ECDSA verify using brainpoolP256r1/sha256 and AES ECB 128.

Yes, by default only the global and per-test results are displayed. The reason was indeed that if many tests are failing, adding the failed tests is very verbose. But that assumed you could easily and rapidly run the test again in debug mode, which is the case for the test cases based on Python libraries, but may not be the case with real targets. I'll think about adding failed cases by default.

Exactly, some tests can take hours when using embedded devices, for example. But I still need to think about it. If I remember correctly, in interactive mode you can choose to save the debug results after seeing the number of failed tests, so that seems to be the best approach currently.

JulioLoayzaM commented 2 months ago

New release 2024.08.19 pushed :) As it fixes the issue, I'll close it and answer the other points separately.

JulioLoayzaM commented 2 months ago

This is correct. I tried merging results of ECDSA verify using brainpoolP256r1/sha256 and AES ECB 128.

I can't reproduce the issue with AES-ECB. I can see that the ResultsDict of ECDSA.test_verify uses nist and wycheproof as keys, and the latter would conflict with Wycheproof tests for e.g. AES-GCM, but not with AES-ECB, as there are no Wycheproof vectors for ECB.

Running:

from crypto_condor.primitives import AES, ECDSA
from crypto_condor.primitives.common import Console

def _verify(public_key: bytes, message: bytes, signature: bytes) -> bool:
    return ECDSA._verify(public_key, ECDSA.Hash.SHA_256, message, signature)

def _enc(key: bytes, plaintext: bytes) -> bytes:
    return AES._encrypt(AES.Mode.ECB, key, plaintext)

def _dec(key: bytes, ciphertext: bytes) -> bytes:
    return AES._decrypt(AES.Mode.ECB, key, ciphertext)

console = Console()

r1 = AES.test(_enc, _dec, AES.Mode.ECB, AES.KeyLength.AES128)
r2 = ECDSA.test_verify(
    _verify, ECDSA.Curve.BRAINPOOLP256R1, ECDSA.Hash.SHA_256, ECDSA.PubKeyEncoding.DER
)

print("\n", f"{r1.keys() = }", "\n\n", f"{r2.keys() = }")

r1 |= r2
console.process_results(r1, None)

results in:

r1.keys() = dict_keys(['NIST/encrypt/ECB 128 VarTxt', 'NIST/encrypt/ECB 128 GFSBox', 'NIST/encrypt/ECB 128 KeySbox', 'NIST/encrypt/ECB 128 VarKey', 'NIST/encrypt/ECB 128 MMT', 'NIST/decrypt/ECB 128 VarTxt', 'NIST/decrypt/ECB 128 GFSBox', 'NIST/decrypt/ECB 128 KeySbox', 'NIST/decrypt/ECB 128 VarKey', 'NIST/decrypt/ECB 128 MMT']) 

 r2.keys() = dict_keys(['wycheproof'])

╭─────────────────────────────── Types of tests ────────────────────────────────╮
│ Valid tests     : valid inputs that the implementation should use correctly.  │
│ Invalid tests   : invalid inputs that the implementation should reject.       │
│ Acceptable tests: inputs for legacy cases or weak parameters.                 │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─────────────────────────────── Results summary ───────────────────────────────╮
│ Primitives tested: AES, ECDSA                                                  │
│ Valid tests:                                                                   │
│   Passed: 1326                                                                 │
│   Failed: 0                                                                    │
│ Invalid tests:                                                                 │
│   Passed: 239                                                                  │
│   Failed: 0                                                                    │
╰──────────────────── crypto-condor 2024.08.19 by Quarkslab ─────────────────────╯

In any case, it is clear that the keys used to store Results have to be changed to ensure uniqueness. I'll use #8 to track that problem.

JulioLoayzaM commented 2 months ago

To conclude (I had to leave mid-comment :p)

Exactly, some tests can take hours when using embedded devices, for example. But I still need to think about it. If I remember correctly, in interactive mode you can choose to save the debug results after seeing the number of failed tests, so that seems to be the best approach currently.

Yep, with the CLI you get asked whether to add the debug data to the report, either when using --save-to or after being asked whether to save the results to a file.[^1] The difference with increased verbosity is that the traceback is printed to the terminal but is not saved to the report. This could be an improvement for pinpointing issues, at the cost of a bigger report; I'll keep it in mind.

Otherwise, with the Python API you can call process_results with or without the debug data as long as the ResultsDict instance is around :)

[^1]: I am now realising that there is no CLI option to save a report with debug data, though you do get prompted when using --save-to. Tracking this in #10.