rdoeffinger / iec16022

DataMatrix 2D barcode generator
http://rdoeffinger.github.io/
GNU General Public License v2.0

Error checks for malloc, calloc and write calls #17

Closed: remvee closed this 2 years ago

remvee commented 2 years ago

Some error checking added for stdlib and stdio calls.

rdoeffinger commented 2 years ago

I don't think such blanket checks are a good idea. More specific issues: replacing calloc with malloc is not correct, these are not the same. If anything, calloc should be used everywhere.

Using size_t means behaviour is (more) architecture dependent. Architecture-dependent behaviour makes checking and testing the code harder. Ensuring the calling sites are restricted to int is better, though could also use a 64-bit argument and check the value fits in 32-bit as an alternative if the former is too tricky.

The "safe" alloc wrappers are all buggy, a return value of NULL only indicates an error if the size is not 0. Since they are duplicated, such bugs are in multiple places, that is not good practice.

Some of this code is in library code, where just calling exit() is not at all ok. These should likely call abort() instead - though whether there is any value over just not checking at all and letting it crash with a NULL pointer dereference is a question (whether any of those cases would be exploitable or not may or may not be a relevant question).

The unchecked writes are not good, however just blindly adding a check and calling exit is not a good solution. Since it's for PNG, a user concerned about robustness luckily can easily verify the files are ok. A more proper solution IMO would be to generate the data wholly in memory and have a function return that. This can then in case of error properly return NULL.

Which would leave a single write() call afterwards, which can then be properly checked and allows for more advanced and proper handling of the error case (for interactive use, asking the user if we should try again for example, otherwise deleting the incomplete/corrupted file etc).
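(For illustration only, not code from the PR: a minimal sketch of an allocation wrapper that avoids the size-0 pitfall described above. The name checked_malloc is hypothetical.)

```c
#include <stdlib.h>

/* malloc(0) may legitimately return NULL, so NULL by itself is not an
 * error; only NULL for a non-zero size indicates failure. Uses abort()
 * rather than exit(), since exit() is not ok in library code. */
void *checked_malloc(size_t size)
{
	void *p = malloc(size);
	if (!p && size != 0)
		abort();
	return p;
}
```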

remvee commented 2 years ago

I don't think such blanket checks are a good idea.

I took https://github.com/rdoeffinger/iec16022/blob/main/iec16022ecc200.c#L74 as an example and must admit my C-programming skills are very much lacking.

More specific issues: replacing calloc with malloc is not correct, these are not the same. If anything, calloc should be used everywhere.

I get that, but I don't know what to do about things like https://github.com/rdoeffinger/iec16022/blob/main/image.c#L158: what has a size of 1? sizeof(byte) maybe?

Using size_t means behaviour is (more) architecture dependent. Architecture-dependent behaviour makes checking and testing the code harder. Ensuring the calling sites are restricted to int is better, though could also use a 64-bit argument and check the value fits in 32-bit as an alternative if the former is too tricky.

So that's unsigned int for size_t and int for ssize_t and that would work most of the time? I only have access to Linux and OpenBSD environments to test, which both support size_t and friends.

The "safe" alloc wrappers are all buggy, a return value of NULL only indicates an error if the size is not 0. Since they are duplicated, such bugs are in multiple places, that is not good practice.

I did not realize this and I'm wondering what happens when you free(..) a NULL pointer, that just works?

Some of this code is in library code, where just calling exit() is not at all ok. These should likely call abort() instead - though whether there is any value over just not checking at all and letting it crash with a NULL pointer dereference is a question (whether any of those cases would be exploitable or not may or may not be a relevant question).

See comment about taking example from iec16022ecc200.c.

The unchecked writes are not good, however just blindly adding a check and calling exit is not a good solution. Since it's for PNG, a user concerned about robustness luckily can easily verify the files are ok. A more proper solution IMO would be to generate the data wholly in memory and have a function return that. This can then in case of error properly return NULL.

That's way out of my depth at this moment but I may look into that at some later point in time.

Which would leave a single write() call afterwards, which can then be properly checked and allows for more advanced and proper handling of the error case (for interactive use, asking the user if we should try again for example, otherwise deleting the incomplete/corrupted file etc).

Not sure interactive use should be supported here as it brings in more unwanted architecture dependencies (like checking if stdin is a tty for instance). The inability to write should just result in a failure of some kind, IMO.

Anyway.. I really appreciate your feedback, and thank you for maintaining this package. Please close this PR if you wish; I'll open an issue for the unchecked [mc]alloc and write calls instead if you do, and maybe have another stab at fixing these issues when my C-programming skills have improved.

Oh, and feedback on the remarks/questions in this comment is also welcome! 😉

rdoeffinger commented 2 years ago

I took https://github.com/rdoeffinger/iec16022/blob/main/iec16022ecc200.c#L74 as an example and must admit my C-programming skills are very much lacking.

Yeah, there is some existing code that I think is not good. But for now I am primarily maintaining the code, i.e. fixing only things that actually cause issues. Admittedly whether to check malloc failures at all is a somewhat controversial topic.

I get that, but I don't know what to do about things like https://github.com/rdoeffinger/iec16022/blob/main/image.c#L158: what has a size of 1? sizeof(byte) maybe?

If you mean "1" (as in, the neutral element of multiplication), just use 1. The arguments to calloc have no special meaning; you do not need to use sizeof. That said, the standard specifies that sizeof(char) is 1, but using it in this case would just be obfuscation.
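To illustrate (a standalone sketch, not code from the repository), these two calls request exactly the same allocation:

```c
#include <stdlib.h>

int main(void)
{
	size_t n = 100;
	/* The two arguments to calloc only matter via their product, so
	 * these requests are identical: n zero-initialised bytes each.
	 * sizeof(char) is defined to be 1, so the second form is merely
	 * a wordier spelling of the first. */
	unsigned char *a = calloc(n, 1);
	unsigned char *b = calloc(n, sizeof(char));
	free(a);
	free(b);
	return 0;
}
```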

So that's unsigned int for size_t and int for ssize_t and that would work most of the time? I only have access to Linux and OpenBSD environments to test, which both support size_t and friends.

It's not about supporting them. The problem is that size_t will have a maximum value of 2^32-1 on 32-bit machines and 2^64-1 on 64-bit machines, which means that if you want to ensure no overflows, you now have 2 different limits to check in all your testing. And I would generally use "int" as the size_t replacement as well, not just for ssize_t. For programs like this, you absolutely never will have a need for allocations > 2GB, and a signed type leaves you the negative values to do extra sanity checks or error reporting with. (While it is incorrect and invokes undefined behaviour, just adding a check that the size is not < 0 is a way to detect some integer overflows, for example, and can make sense to do in some cases - even if, as said, it is incorrect to rely on that.)
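A minimal sketch of that int-argument approach (the wrapper name alloc_int is hypothetical):

```c
#include <stdlib.h>

/* Keep allocation sizes in the signed 32-bit range, so the limit to
 * test against is the same on 32- and 64-bit machines. A negative size
 * is likely the result of arithmetic overflow at the call site; this
 * check catches some such bugs, though (as noted above) relying on
 * signed overflow producing a negative value is itself undefined
 * behaviour, so it is a safety net, not a guarantee. */
void *alloc_int(int size)
{
	if (size < 0)
		return NULL;
	return malloc((size_t)size);
}
```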

I did not realize this and I'm wondering what happens when you free(..) a NULL pointer, that just works?

Check "man 3 free". It says explicitly it does nothing is the argument is NULL.

The unchecked writes are not good, however just blindly adding a check and calling exit is not a good solution. Since it's for PNG, a user concerned about robustness luckily can easily verify the files are ok. A more proper solution IMO would be to generate the data wholly in memory and have a function return that. This can then in case of error properly return NULL. That's way out of my depth at this moment but I may look into that at some later point in time.

No worries, there's a reason I've not done it myself: it's a bit too much work without a concrete use case showing its importance.

Not sure interactive use should be supported here as it brings in more unwanted architecture dependencies (like checking if stdin is a tty for instance). The inability to write should just result in a failure of some kind, IMO.

Yeah, I was just starting from the gut feeling that there likely is a more correct way to handle failures than exit (because there almost always is, which is why doing error handling right is so much effort, and doing it in a minimal-effort way does not get good results). But deleting the partial file, so that a user who does not check the exit code or messages still notices something went wrong, is very much a good idea and something that ideally would be done. (Admittedly it could be achieved by creating a temporary file first and only renaming it to the real output name at the end, which has other advantages as well - but temporary files have a lot of platform-specific behaviour.)
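A sketch of that temporary-file approach under POSIX assumptions (the function and names are hypothetical, and rename() semantics differ on non-POSIX platforms):

```c
#include <stdio.h>

/* Write data to "<path>.tmp" and rename it into place only on success,
 * so an interrupted or failed run never leaves a truncated output file. */
int write_output(const char *path, const void *data, size_t len)
{
	char tmp[4096];
	if (snprintf(tmp, sizeof(tmp), "%s.tmp", path) >= (int)sizeof(tmp))
		return -1;                /* path too long for the buffer */

	FILE *f = fopen(tmp, "wb");
	if (!f)
		return -1;
	if (fwrite(data, 1, len, f) != len) {
		fclose(f);
		remove(tmp);              /* drop the corrupted partial file */
		return -1;
	}
	if (fclose(f) != 0) {         /* buffered write errors surface here */
		remove(tmp);
		return -1;
	}
	return rename(tmp, path);     /* atomic on POSIX filesystems */
}
```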

I'll open an issue for the unchecked [mc]alloc and write calls instead if you do, and maybe have another stab at fixing these issues when my C-programming skills have improved.

I agree on the unchecked writes. For the unchecked [mc]alloc I am more tempted towards removing the existing checks, since I am not convinced checking won't always do more harm than good for those.

remvee commented 2 years ago

For the unchecked [mc]alloc I am more tempted towards removing the existing checks, since I am not convinced checking won't always do more harm than good for those.

Won't an ENOMEM cause a segfault then? Isn't attempting an error message and abort() nicer? If it gets through, of course.. But removing the existing safemalloc would be a good change because it sets the wrong example. 🙂

I'll make an issue for the unchecked writes; #18

rdoeffinger commented 2 years ago

Won't an ENOMEM cause a segfault then?

Sure.

Isn't attempting an error message and abort() nicer?

In theory. Except for all the details. As you hinted at though, whether any printing even works if you were actually out of memory is questionable.

But more importantly, in practice malloc never fails because it is actually out of memory. When a system actually runs out of memory, it's the OOM killer that will handle it. The few people running with different settings can be expected to know what they are doing and recognize the sign of random crashes. But a small program like this even in weird setups would not have malloc failing due to out of memory in any sensible configuration.

So what are you going to say in that error message? Because if you say out of memory, you will only be completely misleading the user. So the question is not "error message or not" but "at least useless, possibly misleading error message or not". Much less convincing already.

Most malloc failures will be due to virtual address space exhaustion, or bad allocation size. In a program like this, that would clearly be a massive bug somewhere in the code, which just like a SIGSEGV would need someone to properly debug it. So the message gives at best minimal, possibly negative value. But then you also have the code to check and produce it, which adds extra complexity and makes your code non-standard. And it's extra code that can have bugs.

And lastly: almost nobody does sensible malloc checks. So fixing this one program will not do much. If some user wants to "fix" this, getting aborts on malloc failure, there is a much easier and better solution available: change glibc, or make an LD_PRELOAD that disallows malloc failing. It will be a change in a single piece of code, it can be reviewed to death to make sure it is absolutely correct, and it fixes all programs in the world at once. For the few programs that managed to do something better for malloc failure (I doubt any exist, to be honest), they can get an exception.

That is the argument against checking for malloc failure. For an argument from the other side you'll need to find someone else; I know they exist :)
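For illustration, a minimal sketch of such an LD_PRELOAD shim, assuming glibc/POSIX (interposing calloc the same way is trickier, since glibc's dlsym itself allocates):

```c
/* nofailmalloc.c
 * build: gcc -shared -fPIC -o nofailmalloc.so nofailmalloc.c -ldl
 * use:   LD_PRELOAD=./nofailmalloc.so some-program */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>

void *malloc(size_t size)
{
	static void *(*real_malloc)(size_t);
	if (!real_malloc)
		real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
	void *p = real_malloc(size);
	if (!p && size != 0)   /* NULL for a size-0 request is legal, not a failure */
		abort();
	return p;
}
```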

remvee commented 2 years ago

Thx!

emixa-d commented 2 years ago

But more importantly, in practice malloc never fails because it is actually out of memory. When a system actually runs out of memory, it's the OOM killer that will handle it. The few people running with different settings can be expected to know what they are doing and recognize the sign of random crashes.

IIRC, this is not the case on the Hurd, which Guix aims to support. Also, you can't rely on NULL just causing segfaults -- dereferencing NULL is undefined behaviour, so the compiler can assume that a pointer which has been dereferenced isn't NULL and optimise based on that.
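A small hypothetical example of that optimisation hazard:

```c
#include <stddef.h>

int first_element(int *p)
{
	int x = *p;        /* the dereference happens first... */
	if (p == NULL)     /* ...so an optimising compiler may assume */
		return -1;     /* p != NULL here and delete this branch */
	return x;          /* entirely as dead code. */
}
```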

But a small program like this even in weird setups would not have malloc failing due to out of memory in any sensible configuration.

This program doesn't use much memory, but you can have another program running at the same time consuming a lot of memory, possibly causing a malloc failure in iec16022.

And lastly: almost nobody does sensible malloc checks. So fixing this one program will not do much.

If everyone does nothing because other people are doing nothing too, then we never get anything done. A small improvement is still an improvement, and with effort, lots of small improvements form a big improvement. And when reviewing source code of packages for Guix, I check for these kinds of issues.

So what are you going to say in that error message? Because if you say out of memory, you will only be completely misleading the user. So the question is not "error message or not" but "at least useless, possibly misleading error message or not". Much less convincing already.

Proposal: ‘failed to allocate memory’? Then the user would know it's memory related -- segfaults can be caused by out-of-bounds reads as well, for example, not only by running out of memory. (Or no message and only an abort.)

emixa-d commented 2 years ago

I'm not quite following what the positions are anymore, but 'abort on allocation failure' sounds good to me.

However, writing error messages from a library can be potentially problematic, e.g. if the program was using fd 2 for its own error messages in some strict format and doesn't expect libraries to write their own things there. That could perhaps be resolved by just noting in the README that this can happen (treating such expectations as unsupported).