Open andrewboie opened 5 years ago
One problem with making an assertion into an early return is cleanup. Consider ("normal" C for clarity, not Zephyr code):
int foo(int arg)
{
struct whatever *result = malloc(sizeof(struct whatever));
// ...
__ASSERT(arg != INVALID_ARG, "");
}
Now the code is going to leak the resource it acquired because it bailed too soon.
It's not insoluble, but it's messy. We could require that code never assert in a context where a synchronous return would be illegal (too limiting and too difficult a rule to follow). We could have an ASSERT_CLEANUP macro or something that worked like a FOREACH and had a block at the end (weird, but IMHO kinda clever).
Farther afield there are GCC extensions that allow you to define code to run at block exit we could try to leverage (sort of a hook to the C++ destructor invocation logic that the compiler has), and of course in the C++ world there are exceptions which speak to this need too.
But really I think my biggest worry here is that most assertions don't reflect things we want to check at runtime at all. They're bug catchers describing an invariant in the code. Right now we all feel free to sprinkle these around because we know they'll never end up in production and it makes us feel safer to have the extra layer of validation when the code changes. But if we knew we had to be careful with them... would we just stop using them? I think that would make the code worse, not better.
One problem with making an assertion into an early return is cleanup. Consider ("normal" C for clarity, not Zephyr code):
int foo(int arg) { struct whatever *result = malloc(sizeof(struct whatever)); // ... __ASSERT(arg != INVALID_ARG, ""); }
Now the code is going to leak the resource it acquired because it bailed too soon.
It's not insoluble, but it's messy. We could require that code never assert in a context where a synchronous return would be illegal (too limiting and too difficult a rule to follow). We could have an ASSERT_CLEANUP macro or something that worked like a FOREACH and had a block at the end (weird, but IMHO kinda clever).
We should not use "ASSERT", we should create a new macro, e.g Z_CHECK_FAIL
, Z_CHECK_FAIL_GOTO
... assert is a well macro defined in POSIX and in C standard change its behaviour will make people confuse.
Farther afield there are GCC extensions that allow you to define code to run at block exit we could try to leverage (sort of a hook to the C++ destructor invocation logic that the compiler has), and of course in the C++ world there are exceptions which speak to this need too.
cleanup
extension is cool but we should avoid it IMHO, we have to be compiler agnostic. And even if we do some black magic to have it portable this will probably trigger a lot of violations to C code standard.
But really I think my biggest worry here is that most assertions don't reflect things we want to check at runtime at all. They're bug catchers describing an invariant in the code. Right now we all feel free to sprinkle these around because we know they'll never end up in production and it makes us feel safer to have the extra layer of validation when the code changes. But if we knew we had to be careful with them... would we just stop using them? I think that would make the code worse, not better.
4. System call checks which duplicate implementation function assertions can be removed since the implementation function is always doing the checking now.
That was my initial concern, where to do the checks, in syscalls or in the implementation itself
That was my initial concern, where to do the checks, in syscalls or in the implementation itself
I think they should be in the implementation where possible.
But really I think my biggest worry here is that most assertions don't reflect things we want to check at runtime at all. They're bug catchers describing an invariant in the code.
Invariant checks as currently be done by _ASSERT are an improvement for functional safety, they should not be removed. They may be triggered because of RAM/ROM bit flips or some unforeseen computation sequence. The problem is how to react on them. From my experience in functional safety development - if an invariant check triggers there is a real problem in the system. Usually the only option is to bring the system to the safe state. For availibility reasons and in case the invariant check triggered in a non safety thread you may decide just to restart or stop the thread.
It does me no good if a device ends up in la-la-land because of an undetected/unhandled error, or in a halted state after printing an error message to a disconnected console, so I agree with this:
From my experience in functional safety development - if an invariant check triggers there is a real problem in the system. Usually the only option is to bring the system to the safe state.
To some degree the determination of safe state is application-specific. In one framework I've built a detected failure enters a path that records where the error was detected along with information about the error (e.g. the type of failure). This is placed in no-init memory along with other metadata; then the system resets and during startup this area is checked for recorded faults which are then interpreted and logged, and the application proceeds to whatever it wants as a "safe state" (which may include proceeding with normal functionality if recovery could be completed during startup).
This API is available for both system and application failures. A framework for something like this would be nice to have in Zephyr.
What we want is not ASSERT but as @ceolin described a variant of "Design by Contract" from Eiffel.
And "What to do when something goes wrong" is application specific, meaning that in some case we want to abort with backtrace so that it allows us to see what's wrong In other case, we want the system to run as much as possible recovering from the error (safety applications). We might want to log what's going on in some system and might not for another. In yet another use case, we might disable all together for performance.
So multiple implementations and selectable by Kconfig is good. But what we should provide is a pluggable system so that users can implement their own version of it according to their requirements.
My question is: does anybody know a good practice to implement DbC in C?
@andrewboie There are issues with this proposal. You mentioned yourself one quite critical: code size. But the overall issue is trying to define a one-size-fits-all approach for this. As Zephyr targets a wide range of applications and platforms constraints, just adding unconditionally compiled runtime checks would be a killer for some of them. Similarly not having asserts is also not an option for others.
What I think we need is to:
This will, of course, add extra paths to test, but I certainly think it will be worth it given Zephyr’s targets.
Silly pseudo code snippet as example of what I mean:
int k_foo(void *ptr)
{
ASSERT(ptr == NULL, ASSERT_INPUT_PARAM_CHECKS_ON);
if (RUNTIME_INPUT_PARAM_CHECKS_ON && ptr == NULL) {
return EINVAL;
}
// whatever else
return 0;
}
//the ASSERT and runtime check could be merged into a single macro, this is just an example
Stating what will be obvious to many:
@andrewboie There are issues with this proposal. You mentioned yourself one quite critical: code size.
@aescolar let's get some data first before we start worrying about code size. @nashif did some initial investigation and found that the code size penalty to propagating return values was trivial
The string could either be logged or discarded by the compiler.
A message catalog is a common alternative that saves memory while preserving human-readable information, example:
https://en.wikipedia.org/wiki/MIPI_Debug_Architecture#Visibility_of_SoC-internal_operations http://debugger.iul.intel.com/documentation/npk-sdk/doc/sventx/html/a00034.html
This is still live. We still lack a framework for "recoverable runtime errors". In the modern world that's mostly a userspace thing, so maybe @dcpleung would be a better owner?
Introduction
Purpose of this ticket is to discuss changes to our policies for kernel APIs as they relate to returning error values.
Problem description
Currently, we have the following:
Proposed change
With Zephyr moving towards functional safety certifications, we should allow for robust error checking, with APIs being augmented to return error codes.
1) APIs that return void will be converted to return an integer indicating success or failure, with failure returning a negative errno code. 2) APIs that return a pointer, should return NULL on an error condition. We might consider populating errno with a more detailed error code. 3) All current assertions in API implementation functions should be converted to return an error condition instead. 4) System call checks which duplicate implementation function assertions can be removed since the implementation function is always doing the checking now. 5) Additional system call checks should propagate error values instead of invoking Z_OOPS().
Some random thoughts for discussion
We need to decide if there are any valid uses of Z_OOPS() in system calls. Do we keep using Z_OOPS() for memory access issues, or kernel object validation checks?
We will need consistency on what error values are returned. For example, if we decide to return an error if memory access fails, every API should return -EACCES.
Returning errors instead of failing assertions/Z_OOPS() adds additional responsibility to the end user.
Additional code size due to adding runtime checks and propagation of return values
Is there a way we can add flexibility on what runtime checks do? Consider the following:
We could define __ASSERT three ways, selectable by Kconfig:
1) The default case has __ASSERT() evaluate to the expression in its first argument, cast to a boolean. So if foo != bar it evaluates to false and we return -EINVAL. This is the robust configuration. The string could either be logged or discarded by the compiler.
2) The second case would be for constrained applications which will enable assertions in the debug phase, but otherwise not do return value propagation. __ASSERT() will work like it used to and immediately generate a fatal error =, printing the string, if the check fails.
3) The third case will be for constrained applications where code size must be optimized at all costs. Here __ASSERT() will simply be defined as 'true'. The compiler will see this and not compile anything at all, it will be as if it wasn't there.