mit-dci / opencbdc-tx

A transaction processor for a hypothetical, general-purpose, central bank digital currency
Other
895 stars 200 forks source link

Fix casting errors in client-cli #188

Open mszulcz-mitre opened 1 year ago

mszulcz-mitre commented 1 year ago

client-cli can produce casting errors if the numbers given as command-line arguments are negative. With this commit, the code now checks if the arguments are negative, and if so, prints an error message and exits.

Signed-off-by: Michael L. Szulczewski mszulczewski@mitre.org

HalosGhost commented 1 year ago

Just a food-for-thought question: would it make sense to try to parse the argument as a signed long long first instead of trying to do some manual parsing tricks? Doing so means the test is a little more straight-forward:

sll ← as_sll(args₅)
if sll is E2BIG:
    ull ← as_ull(args₅)
elif sll ≤ 0 or sll is Error(_):
    error out
else:
    ull ← cast<ull>(sll)

Perhaps that's not really any cleaner than manually finding the first non-space character and manually matching -. Don't take this as a suggested change, just something to consider.

metalicjames commented 1 year ago

would it make sense to try to parse the argument as a signed long long first instead of trying to do some manual parsing tricks?

+1

I agree that's a better approach

mszulcz-mitre commented 1 year ago

@HalosGhost Interesting suggestion! What does the first check in your pseudo-code do?

sll ← as_sll(args₅)
if sll is E2BIG:
    ull ← as_ull(args₅)

I'm used to seeing E2BIG as an error code that means "Argument list too long" (C++ ref, GNU ref).

I'm happy to do the test either way, but the advantages of first converting to a signed long long aren't clear to me. It seems like the main difference is whether you check if a signed int is less than zero or if a string starts with a minus character.

HalosGhost commented 1 year ago

What does the first check in your pseudo-code do?

Sorry for the confusion, the actual error-code I should have referenced was ERANGE (that the conversion failed because it fell outside the representable range of signed long long). And, in reality, we probably need to use std::strtoll() because exceptions are disabled (so a failed parse with std::sto*() will effectively std::abort()).

The primary benefit of this would essentially be that it offloads the actual parsing (e.g., skipping whitespace) to the standard library.

Again, I'm not actually sure this is dramatically cleaner…

mszulcz-mitre commented 1 year ago

Sorry for the confusion, the actual error-code I should have referenced was ERANGE

Great--thanks for clarifying.

And, in reality, we probably need to use std::strtoll() because exceptions are disabled (so a failed parse with std::sto*() will effectively std::abort()).

Is it a problem to terminate the program for out-of-range errors? I guess this is what you're saying above, but just to be certain, this is what happens with the current use of std::stoll and std::stol:

root@3fd9e63ade0c:/opt/tx-processor# ./build/src/uhs/client/client-cli 2pc-compose.cfg mempool0.dat wallet0.dat mint 1000000000000000000000000000000 5
terminate called after throwing an instance of 'std::out_of_range'
  what():  stoull
Aborted (core dumped)

Maybe the error message could be a little clearer, but it doesn't seem too bad to me.

HalosGhost commented 1 year ago

In the general case, no it's not a problem to terminate on those cases. However, with the alternative I offered, it's possible to require parsing twice (when the argument fits into the range of unsigned long long, but is too large to fit into a signed long long). When that happens, instead of appropriately double-parsing, it'll just fail out because of the out-of-range exception from the first parse.

Maybe it doesn't actually matter (this is research-level code after all), and restricting the client to work within [0, LLONG_MAX] is fine (even though the range available to the underlying API is [0, ULLONG_MAX]). In that case, there's no need for the double-parse (or error-handling) at all, and the pseudo-code becomes:

sll ← as_sll(args₅)
if sll ≤ 0:
    error out

ull ← cast<ull>(sll)
mszulcz-mitre commented 1 year ago

Here's an implementation that I think does everything that @HalosGhost's pseudo-code does:

const auto* p = n_outputs_arg.data();
char* p_end;
const auto n_outputs_sll = std::strtoll(p, &p_end, 10);
if (n_outputs_sll < 0){
    std::cerr << "The requsted number of UTXOs cannot be negative."
              << std::endl;
    return false;
}
const auto n_outputs = std::stoull(n_outputs_arg);

I ended up modifying the pseudo-code because the first part would fail if the input argument was both out of range and negative:

sll ← as_sll(args₅)
if sll is E2BIG:
    ull ← as_ull(args₅) \\ This will have a casting error if args₅ is negative.

For comparison, here's the version that checks for a leading '-' in the argument string:

auto idx_first_non_whitespace
    = n_outputs_arg.find_first_not_of(' ');
if(n_outputs_arg[idx_first_non_whitespace] == '-') {
    std::cerr << "The requsted number of UTXOs cannot be negative."
              << std::endl;
    return false;
}
const auto n_outputs = std::stoull(n_outputs_arg);

Both methods look good to me. However, I guess I favor the latter a little because it seems simpler and more readable. I'll plan to push a cleaned version of it, but if someone feels strongly about the strtoll version, I'm happy to change it.

HalosGhost commented 1 year ago

I've gone back and forth on this a couple of times, and I think the only reason I'm coming down in-favor of parsing rather than hand-matching is that there's another error case that's not handled in the code you have so far: asking for 0 UTXOs (or for UTXOs of 0-value). If we already have it as an integer, we can catch both error cases using <= instead of < (and changing the error appropriately). this is actually also why preferring to parse is, in-general, safer: it's more likely to catch malformed input earlier, and gives you structured data as output so you can refine more immediately.