Accessing non-existent elements in locking_shard::controller and coordinator::controller

mszulcz-mitre commented 2 years ago

Affected Branch

trunk

Basic Diagnostics

[X] I've pulled the latest changes on the affected branch and the issue is still present.
[X] The issue is reproducible in docker

Description

Problem

This issue is similar to Issue #140.

There are no checks in the constructor or init method forlocking_shard::controller that the values of parameters shard_id and node_id are valid. This can lead to non-existent element accesses in vectors. For example, here is the constructor: https://github.com/mit-dci/opencbdc-tx/blob/a8b696b315c670f3b3f71ab353cc471c0d7025e8/src/uhs/twophase/locking_shard/controller.cpp#L17-L50

On Line 32, m_shard_id is used as in index to m_shard_ranges. On line 40, shard_id and node_id are used as indices to m_locking_shard_raft_endpoints. They are also used as indices to vectors in at least 2 places in init.

The same problem occurs in coordinator::controller, but the variables used as indices are m_node_id and m_coordinator_id: https://github.com/mit-dci/opencbdc-tx/blob/a8b696b315c670f3b3f71ab353cc471c0d7025e8/src/uhs/twophase/coordinator/controller.cpp#L18-L55

Solution

There a a few ways to solve this problem.

One way is the use the at method of std::vector for element access. Unlike the bracket operator [], it provides bounds checking and will raise an exception if the requested element is out of range.
A second method is to move all code with potentially problematic vector accesses to the init methods, manually check whether the indices are in range, and if not, log an error and return false. This method is used to check other things in init. For example, here is init for locking_shard::controller: https://github.com/mit-dci/opencbdc-tx/blob/a8b696b315c670f3b3f71ab353cc471c0d7025e8/src/uhs/twophase/locking_shard/controller.cpp#L52-L88

Code of Conduct

[X] I agree to follow this project's Code of Conduct

HalosGhost commented 2 years ago

Given that bounds-checking comes with performance-considerations, and init() is rarely-called, I'd rather the bounds-checking verification happen there rather than pushing that verification down the stack.

solid concept-ACK.

mszulcz-mitre commented 2 years ago

@HalosGhost Absolutely-- bounds checking probably won't significantly impact the overall execution time in if used in inits or constructors (or their initializer lists).

Could I ask about the project's overall error-handling strategy? In a lot of the code I've seen, it seems the strategy is to log a message and return false if a function or method has an error. Using the at method to detect out-of-range errors is simple and useful when performance is not an issue, but it diverges from that strategy because it raises exceptions. I don't see any exception handing in the code, so currently execution will just stop.

Do you think it's reasonable for the program to terminate due to out-of-range errors without error logging? If so, only small changes to the code would be required to detect these errors in the classes mentioned in this Issue, in Issue #140, and potentially in other parts of the code. Seeing the program terminate and reading about the exception on stdout seems reasonable to me, but I'd like to make sure it doesn't undermine the code's overall error-handling strategy.

HalosGhost commented 2 years ago

Exceptions are disabled in this code-base (for a variety of reasons), so we should not rely on any exception-handling.

I think it is likely best to avoid throwing exceptions wherever possible so that some version of graceful failure is possible.

maurermi commented 1 year ago

@HalosGhost is this issue resolved by #154 ?

mit-dci / opencbdc-tx