metamath / metamath-exe

Metamath program - source code for the Metamath executable
GNU General Public License v2.0

fatal error handling, part 9 #111

Closed wlammen closed 1 year ago

wlammen commented 1 year ago

This is the design and implementation of an interface to mmfatl. The code in mmfatl is ONLY accessed through this interface. This allows for independent development on both sides: mmfatl and the rest of Metamath. As usual, regression tests validate the correctness of the implementation.

Each commit is constrained to a single idea, so a reviewer can examine each commit individually for easier verification. Using this interface for error reporting is the final step and will be done in a dedicated separate PR.

@digama0 From my point of view the implementation is finished and ready for use in the rest of Metamath.

wlammen commented 1 year ago

@digama0 I don't see much headroom for improvements any more, perhaps apart from doc style. But that can be done on a bit-by-bit basis, and not just by me.

wlammen commented 1 year ago

@digama0 ready for review

wlammen commented 1 year ago

Rest assured I won't continue contributing beyond this task. It seems my frustration level matches yours. The code proper in this PR is in essence the same as in #86, established almost three quarters of a year earlier. It is roughly worth a day's work. It addresses not only bugs such as using library functions where they are not allowed, relying on the heap where it is not available, and returning an exit code with unclear semantics.

Above all it corrects a misconception in the architecture of the software. If you look at the implementation of outOfMemory you can see that print2 is used to output the error message. print2 is a Swiss-army-knife tool from the presentation layer with paging capabilities and many more facilities. It is nearly impossible to guarantee it works under harsh conditions in the desired way. It is simply used inappropriately at the basic infrastructure level. I am completely flabbergasted to learn that mathematicians are not keen to address this issue in a timely fashion. This holds all the more since this software claims to mechanically verify proofs, and you need trust from users for acceptance.

As I said, the code was essentially written months before. The past three months were in my eyes solely about learning how to contribute. A test framework was developed to assure correctness, and extensive Doxygen documentation was generated with pre-/post-conditions. The latter can be used to prove the correctness of the code; the former asserts the semantics of primitives. I wouldn't have done this in the first place for something like a less critical error path; I was just under the impression that I had to justify each line of code individually to create some amount of confidence. I was completely at a loss to explain why pieces of 10 lines of code needed a week to review. It seems I failed to arrive at a sufficient level of confidence.

Guess what? If you don't like my implementation: now you can simply rip it out and replace it with your favorite solution, without hurting callers (apart from introducing sloppy code, of course).

May I finally point out that I am a professional? I have reworked code before, both at work and in private. I have also contributed to Open Source software before, in a much more team-oriented and programmer-friendly environment.

There is still one PR pending, where I call the new fatal error handler from outOfMemory as an example, and I will leave it at that. Then we are done.

digama0 commented 1 year ago

Above all it corrects a misconception in the architecture of the software. If you look at the implementation of outOfMemory you can see that print2 is used to output the error message. print2 is a Swiss-army-knife tool from the presentation layer with paging capabilities and many more facilities. It is nearly impossible to guarantee it works under harsh conditions in the desired way. It is simply used inappropriately at the basic infrastructure level. I am completely flabbergasted to learn that mathematicians are not keen to address this issue in a timely fashion.

First of all, mathematicians are not the ones maintaining the software. Mathematicians want to use a tool and have it do useful things. Architecture design stuff doesn't come into it at all, except insofar as it prevents the delivery of useful features. That is, they are the end-users.

I agree with you that print2 is overengineered and it should not be called reentrantly. Yet, you did not even touch print2 here, or really any of the actual metamath code. Moreover you could have solved this problem by simply calling printf instead of print2; that would have been a trivial fix which solves a demonstrable bug. Instead we have a whole file full of stuff that is not even being used to solve the problem. The motivation for mmfatl seems to be out-of-memory handling in some kind of adversarial situation, and without an actual reproduction I just don't trust that it is handled correctly, regardless of how much untested code is written.

This holds all the more since this software claims to mechanically verify proofs, and you need trust from users for acceptance. As I said, the code was essentially written months before. The past three months were in my eyes solely about learning how to contribute. A test framework was developed to assure correctness, and extensive Doxygen documentation was generated with pre-/post-conditions. The latter can be used to prove the correctness of the code; the former asserts the semantics of primitives.

I think the documentation is good, and if you were to spend all this effort on something more worthwhile, that would be really useful. There is a ton of sketchy code in the codebase that needs explanation, cleanup and documentation, and I really wish you could document that instead of trying to formally prove a completely new module using comments in a C program.

If there is one thing I have learned regarding formal verification, it is that it is a lot of work, and it is very important to direct verification work in the right direction, both so that you don't waste too much time and also so that it adds value to the project. FV is not an unmitigated good: a formally proved piece of software is less flexible, because changes to the requirements require more work to update the proofs to match than if the software were not proved correct.

It is also certainly not always the case that a program that does formal verification is itself formally verified. A formally verified verifier is a huge project, and most verifiers are just regular software with the usual standards of correctness. A formally verified program needs to be designed from the ground up to support that goal, because it affects so many things about how you do it. For an inherited project like this I really don't see it as a goal - it would be more effective to just rewrite the program from scratch.

I wouldn't have done this in the first place for something like a less critical error path; I was just under the impression that I had to justify each line of code individually to create some amount of confidence. I was completely at a loss to explain why pieces of 10 lines of code needed a week to review. It seems I failed to arrive at a sufficient level of confidence.

It is also possible to add too much documentation, you know. The documentation is what usually gets most of the review, and it's making many assertions that have to be checked. Beyond that, this is a low priority for me and my motivation to review more fatal error handling stuff is very low, hence my plea for you to work on a more interesting part of the code or something with a more direct impact on the user experience.

Guess what? If you don't like my implementation: now you can simply rip it out and replace it with your favorite solution, without hurting callers (apart from introducing sloppy code, of course).

Perhaps, but seeing as there wasn't even a bug report filed for this I would have just left it as is. But also, I'm not actually trying to burn bridges here, I'm just very burned out over this.

May I finally point out that I am a professional? I have reworked code before, both at work and in private. I have also contributed to Open Source software before, in a much more team-oriented and programmer-friendly environment.

I can tell. The code reads like a culture clash between a hobby project and an enterprise programmer. I don't doubt your expertise, but I think it is being misapplied in this circumstance.

There is still one PR pending, where I call the new fatal error handler from outOfMemory as an example, and I will leave it at that.

Yes please, don't leave this as dead code. That should have been in the first PR, or at least the first one where the error handler is functional.

Rest assured I won't continue contributing beyond this task. It seems my frustration level matches yours.

I hope it's not too late to convince you to continue, maybe after a break for both of us. I really am not trying to drive you away.

digama0 commented 1 year ago

I wouldn't have done this in the first place for something like a less critical error path; I was just under the impression that I had to justify each line of code individually to create some amount of confidence. I was completely at a loss to explain why pieces of 10 lines of code needed a week to review. It seems I failed to arrive at a sufficient level of confidence.

By the way, it occurs to me that we may have had a miscommunication here. I am not confident in the code, but that isn't a request for more words explaining it and it's also not a blocker for merging the code. When I am reviewing for this project, what I am looking for is that the changes either have a positive effect on the user experience or make the code more maintainable. As such, lots of documentation about a module which is dead code is the exact opposite of what I want to see, since it's not improving the user experience and it's not making the existing code any easier to maintain, and is adding an additional maintenance burden for a new module. Again, it's not going to block a merge, since perhaps this is a necessary step on the way to something better, but that's where my priorities lie and you can use that to judge what kind of code will be easy to review.

wlammen commented 1 year ago

Moreover you could have solved this problem by simply calling printf instead of print2; that would have been a trivial fix which solves a demonstrable bug.

No, this is an invalid solution. Reentrance is just one problem. https://www.gnu.org/software/libc/manual/pdf/libc.pdf page 293 marks printf as AS-Unsafe heap. And on page 5 of the same document we learn: "Functions marked with heap may call heap memory management functions from the malloc/free family of functions and are only as safe as those functions." So you MUST NOT call printf in an out-of-memory context. Hence you unfortunately need heap-safe versions of basic string functions like printf or concat. They are implemented in mmfatl, and the costs are on the order of 100 lines of code. I can read such an amount of code without difficulty, but for those who cannot, we have the test code, where these functions are called with various arguments. Luckily the implementation is now behind an interface. The pain of verification is done just once; no one needs to learn the gory details to use the interface.
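As a rough illustration of what such a heap-safe primitive can look like, here is a minimal sketch (not the actual mmfatl code; the names and the %s-only format support are invented for brevity):

```c
#include <stdio.h>

/* Sketch of a heap-free message printer: a fixed static buffer and
 * fputs only, so no function from the malloc/free family is ever
 * touched.  This is an illustration, NOT the mmfatl implementation. */

#define FATAL_BUFSZ 1024
static char fatalBuf[FATAL_BUFSZ]; /* static: no heap, no big stack frame */

/* Append src at position pos, truncating silently at the buffer end. */
static size_t appendText(size_t pos, const char *src) {
  while (*src && pos < FATAL_BUFSZ - 1)
    fatalBuf[pos++] = *src++;
  return pos;
}

/* A %s-only stand-in for fprintf(stderr, format, arg). */
void fatalMessage(const char *format, const char *arg) {
  size_t pos = 0;
  while (*format && pos < FATAL_BUFSZ - 1) {
    if (format[0] == '%' && format[1] == 's') {
      pos = appendText(pos, arg);
      format += 2;
    } else {
      fatalBuf[pos++] = *format++;
    }
  }
  fatalBuf[pos] = '\0';
  /* stderr is unbuffered by default, so fputs normally allocates nothing */
  fputs(fatalBuf, stderr);
}
```

The key design choice is truncation instead of failure: under fatal conditions a shortened message is still better than undefined behavior.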

Yet, you did not even touch print2 here, or really any of the actual metamath code.

I (will) do, see for example https://github.com/metamath/metamath-exe/pull/86/files line 729ff of mmdata.c

If there is one thing I have learned regarding formal verification...

If you look at other PRs like #9 you can see the difficulties you have WITHOUT a formal description. I wouldn't go as far as formal verification; instead, something that allows you to verify code the way math proofs in books are read. There you go through each proof step and mentally check that the prerequisites are met. In a formal description, pre-/post-conditions serve this role. Some coding standards are in order to force any newly added code to meet this requirement.
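For illustration only, a contract-style comment in Doxygen syntax might look like this (the function safeCopy and its conditions are invented, not taken from mmfatl):

```c
#include <stddef.h>

/*!
 * \brief Copy at most \p n - 1 characters of \p src into \p dest,
 *   always null-terminating the result.
 * \param dest destination buffer
 * \param src  source string
 * \param n    size of \p dest in bytes
 * \pre  \p dest points to at least \p n writable bytes and \p n >= 1.
 * \pre  \p src is a valid null-terminated string.
 * \post \p dest contains a null-terminated prefix of \p src.
 * \return the number of characters copied, excluding the terminator.
 */
size_t safeCopy(char *dest, const char *src, size_t n) {
  size_t i = 0;
  while (i + 1 < n && src[i]) { /* keep room for the terminator (post) */
    dest[i] = src[i];
    i++;
  }
  dest[i] = '\0';               /* valid because of the preconditions */
  return i;
}
```

A reviewer can then check each line against the stated conditions, the way one checks the steps of a proof.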

It is also possible to add too much documentation, you know. The documentation is what usually gets most of the review, and it's making many assertions that have to be checked.

Unfortunately this needs to be done; see the paragraph before. This is essential to the maintainability of the code.

...and if you were to spend all this effort on something more worthwhile, that would be really useful.

I did. All Doxygen-style comments are from me. See for example print2. I gave up after I could not establish any invariants any more in some reentrant cases. This series of PRs was motivated by this experience.

Like it or not: You have to get the basics right first, before you turn your attention to higher levels.

I hope it's not too late to convince you to continue, maybe after a break for both of us.

OK

digama0 commented 1 year ago

Moreover you could have solved this problem by simply calling printf instead of print2; that would have been a trivial fix which solves a demonstrable bug.

No, this is an invalid solution. Reentrance is just one problem. https://www.gnu.org/software/libc/manual/pdf/libc.pdf page 293 marks printf as AS-Unsafe heap.

Yes, I'm aware there are obscure edge cases in obscure environments where it might plausibly fail with that implementation. It is a tradeoff between a bunch of new code and a simple good-enough fix. I would rather wait until we can actually prove by counterexample that this situation is reachable before taking on the cost of a reimplementation.

If there is one thing I have learned regarding formal verification...

If you look at other PRs like #9 you can see the difficulties you have WITHOUT a formal description.

The point I am making in the last post is that this is a misunderstanding: it is not the lack of formal description that makes this code undesirable for me, it is the lack of testing (not testing of intended behavior, but testing of the edge cases where this implementation purportedly improves on printf) and the lack of a reproduction or bug report that the PR fixes. (We don't even have a demonstration of print2 reentrancy, which I expect is much easier to accomplish than getting printf to misbehave.)
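To make the distinction concrete, a regression test of that kind could look like this sketch (hypothetical, written against the invented safeCopy from the earlier Doxygen example):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical edge-case test: a bounded copy must truncate without
 * overflowing and must always null-terminate.  A printf replacement
 * would need similar tests for every formatting edge case. */

size_t safeCopy(char *dest, const char *src, size_t n);

int main(void) {
  char buf[4];

  /* Fits: copied verbatim. */
  assert(safeCopy(buf, "ab", sizeof buf) == 2);
  assert(strcmp(buf, "ab") == 0);

  /* Too long: truncated, but still null-terminated. */
  assert(safeCopy(buf, "abcdef", sizeof buf) == 3);
  assert(strcmp(buf, "abc") == 0);

  /* Degenerate size: nothing written beyond the terminator. */
  assert(safeCopy(buf, "x", 1) == 0);
  assert(buf[0] == '\0');
  return 0;
}
```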

I wouldn't go as far as formal verification; instead, something that allows you to verify code the way math proofs in books are read. There you go through each proof step and mentally check that the prerequisites are met. In a formal description, pre-/post-conditions serve this role. Some coding standards are in order to force any newly added code to meet this requirement.

This is what I mean by doing formal verification through comments. I am broadly speaking in favor of this approach, but I think it needs to be done at a higher level than you are doing currently. Good mathematical papers leave comfortably sized gaps between lines of proof, because the reader isn't an idiot or a computer. If the code is readable, then it can serve as its own specification, and you only need to intervene when invariants are not easily inferred from local information. With the level of detail you are using for these 4-line functions, we'll never have a hope of documenting all of metamath.exe with the paltry workforce we have available.

...and if you were to spend all this effort on something more worthwhile, that would be really useful.

I did. All Doxygen-style comments are from me. See for example print2.

Thank you for that.

I gave up after I could not establish any invariants any more in some reentrant cases. This series of PRs was motivated by this experience.

Documenting the invariants like this is not really the first priority when it comes to documentation. What does the function do? Not a blow-by-blow of what each line does, but a high-level description of its purpose within the whole. Invariants can be documented on the fly as you discover them by reading related functions. Documenting a little bit of many functions is vastly more useful than describing in minute detail the behavior of a single function with a simple implementation.

Also, because many metamath functions are long, it helps to document inside functions, since for now the comments are mostly intended for people who are reading the code to understand it, not people writing new behavior and calling the external API (there isn't even really an external API in the first place).

wlammen commented 1 year ago

I would rather wait until we can actually prove by counterexample that this situation is reachable before taking on the cost of a reimplementation.

I (and the colleagues in my company) follow quite a different course. We remove detected bugs in our software, without exception, and, by the statistics we run, this has a maturing effect. This effort may occasionally be in vain, because particular bugs would never surface for logical or statistical (once-in-a-lifetime event) reasons. But wild guesses about their frequency only affect the priority in which they get fixed.

In our case here nothing is obscure. GNU openly points out in the docs that their printf may use malloc, and is therefore subject to undefined behavior should the heap be exhausted. The behavior stays undefined, even if a particular (or most) call succeeds. And even if the current libc rarely (or never) relies on the heap, GNU reserves the freedom to change this behavior with any upgrade. The contract simply says: make sure you have enough memory available when you call printf.

What could be alternatives to my code then? You may query the heap state first, and depending on the result either use printf, or fall back to my solution. Or dismiss the error message completely, hoping this occurs seldom enough to be of no significance. I am not in favor of any of them, but, as I said, feel free to replace my solution with yours later. Or let the audience decide. Or wait for iterations.
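Roughly, that first alternative could look like this sketch (hypothetical; PROBE_SIZE, reportFatal and fallbackPrint are invented names, and the probe size is a guess):

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the probe-then-fall-back alternative mentioned above. */

#define PROBE_SIZE 4096 /* rough guess at what printf might need */

/* Heap-free fallback (e.g. an mmfatl-style writer); trivial stand-in here. */
static void fallbackPrint(const char *msg) {
  fputs(msg, stderr); /* fputs only: no malloc family involved */
}

void reportFatal(const char *msg) {
  void *probe = malloc(PROBE_SIZE);
  if (probe != NULL) {
    free(probe);            /* heap looks alive: printf is plausible */
    fprintf(stderr, "%s", msg);
  } else {
    fallbackPrint(msg);     /* heap exhausted: avoid the malloc family */
  }
}
```

The probe is inherently racy (memory available at the probe may be gone by the time printf runs), which is part of why none of these alternatives is attractive.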

Regarding the documentation:

We are in an Open Source context, and that puts some limitations on what is possible. It takes years of experience with software of Metamath's size to get familiar with all the intricacies of the implementation. You cannot expect an average visitor to delve into details without the help of documentation. Or, if they are willing to do so, that they arrive at a thorough understanding and realize all the side effects a program modification entails.

If you are interested in a vibrant community, then the current software needs a transition from a Norm-centered to a more community-friendly form. This covers documentation, implementation and testing. This PR is a pilot showing possible ways to achieve these goals. So an evaluation should point out in what direction to head. What are sensible goals? Can they be achieved with the measures presented? You voiced that documentation is a burden in the review step, and hence should be reduced.

To some extent this is possible, but where are the limitations, given its importance for, say, an uninitiated newbie? Such questions should be discussed in a setting that accepts trials as part of the learning curve.

digama0 commented 1 year ago

I would rather wait until we can actually prove by counterexample that this situation is reachable before taking on the cost of a reimplementation.

I (and the colleagues in my company) follow quite a different course. We remove detected bugs in our software, without exception, and, by the statistics we run, this has a maturing effect. This effort may occasionally be in vain, because particular bugs would never surface for logical or statistical (once-in-a-lifetime event) reasons. But wild guesses about their frequency only affect the priority in which they get fixed.

That's a nice approach, for your project. I'm sure it can help make for good code. But I don't think it's appropriate for metamath-exe because of its current status as a legacy codebase. The thing is, metamath-exe is chock full of bugs if you think logically about the code and invariants. This is plain to see if you read enough of it: there are tons of edge cases that are not handled correctly, and if we are forced to fix all of them as soon as they are detected we'll be here forever and would need to almost completely rewrite the program, because many of the bugs are architectural. I would much rather both of us spend that time and effort on one of the next-generation metamath verifiers instead, like metamath-knife. I would be much more receptive to that style applied there, where the program is already pretty close to having no known bugs.

Because metamath-exe is full of bugs, for maintenance purposes we need to prioritize, starting with getting a good understanding of the high-level structure of the invariants and working our way down, fixing localized bugs when we can.

We have an issue tracker that you can use to document known bugs, including ones you write deliberately. Please use it.

Also: Just to be clear I would actually be in support of a from-scratch reimplementation of metamath-exe. I think a lot of good could come of it. I have some fairly promising results regarding using metamath-knife to do web page generation, and we can work on the command line tool as well if people like the MM-PA interface. But it won't be happening on this repo.

In our case here nothing is obscure. GNU openly points out in the docs that their printf may use malloc, and is therefore subject to undefined behavior should the heap be exhausted. The behavior stays undefined, even if a particular (or most) call succeeds. And even if the current libc rarely (or never) relies on the heap, GNU reserves the freedom to change this behavior with any upgrade. The contract simply says: make sure you have enough memory available when you call printf.

So maybe we should do something to avoid running out of memory then? If we detected memory exhaustion before it got dire then this wouldn't be an issue. I'm generally suspicious of out of memory handling because it is completely untested, and if we're going to think about that level then I have plenty of other concerns: what if we can't call your function because we ran out of stack space? Can we guarantee that libc functions remain safe to call? What if the arithmetic overflows? When I turn on paranoia=full mode the whole C language seems like a lost cause, but I try not to talk about that in polite company - there is a time and a place, and maintaining a legacy codebase is not the place for applying NASA-level software standards.

What could be alternatives to my code then? You may query the heap state first, and depending on the result either use printf, or fall back to my solution. Or dismiss the error message completely, hoping this occurs seldom enough to be of no significance. I am not in favor of any of them, but, as I said, feel free to replace my solution with yours later. Or let the audience decide. Or wait for iterations.

What I would do is to call printf unconditionally, and add as a precondition that there has to be at least X much heap space available, and take precautions to ensure that X much heap space is always available by setting the heap size at the top level. Or not - it can just be a precondition we pass to the user: this program requires at least X much heap space to run. Of course we will probably be unable to determine what X is because no C library documents things at that level, but, well, we can tell the user that there was some X for which it is true and if the program crashes then try adding memory. (This sort of thing is why I think the C language is a lost cause.)
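One standard way to realize that precaution is an emergency reserve released on the out-of-memory path (a hedged sketch; the names and the 64 KiB figure are invented, not a measured bound):

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the "keep X heap space available" idea: reserve a block
 * at startup and release it before reporting, so printf has memory
 * to work with.  Size and names are invented. */

#define EMERGENCY_RESERVE (64 * 1024)
static void *g_reserve;

void initEmergencyReserve(void) {   /* call once at startup */
  g_reserve = malloc(EMERGENCY_RESERVE);
}

void outOfMemoryHandler(const char *where) {
  free(g_reserve);                  /* give printf room to run */
  g_reserve = NULL;
  fprintf(stderr, "fatal: out of memory in %s\n", where);
  exit(EXIT_FAILURE);
}
```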

Regarding the documentation:

We are in an Open Source context, and that puts some limitations on what is possible. It takes years of experience with software of Metamath's size to get familiar with all the intricacies of the implementation. You cannot expect an average visitor to delve into details without the help of documentation. Or, if they are willing to do so, that they arrive at a thorough understanding and realize all the side effects a program modification entails.

I'm not sure where I suggested otherwise. I said that you should ensure that all information necessary to understand the code is available locally, by combining pre/post information from the docs with the lines of code themselves.

If you are interested in a vibrant community, then the current software needs a transition from a Norm-centered to a more community-friendly form. This covers documentation, implementation and testing. This PR is a pilot showing possible ways to achieve these goals. So an evaluation should point out in what direction to head. What are sensible goals? Can they be achieved with the measures presented? You voiced that documentation is a burden in the review step, and hence should be reduced.

I'm not saying documentation should be reduced, I'm saying it should be applied more evenly across the codebase. I want all functions to have about the same level of documentation, so if you go full NASA on one function it places an unreasonable demand on all the other functions to match that level (especially considering that most metamath functions have much more complicated interactions with the program state than mmfatl does). If the program were already documented at this level of detail, it would not look so out of place to do so for a new module as well.

To some extent this is possible, but where are the limitations, given its importance for, say, an uninitiated newbie? Such questions should be discussed in a setting that accepts trials as part of the learning curve.

I think this PR has been useful as a trial. I think it was not a success though. If I were to give examples of newbie PRs, they would be:

What I think should be avoided:

Adding excessive documentation is somewhat neutral here; the main issue is that documentation must be correct, so if you write a lot then a lot has to be checked. It is often better just not to say anything if you can't speak authoritatively about it. So documentation-heavy PRs tend to require a lot of review.

wlammen commented 1 year ago

I think this PR has been useful as a trial. I think it was not a success though.

I am a bit tired of learning over and over again that my contribution is "pure crap". This is insane.

I tried hard to deliver a highly robust error handler. All hell might have broken loose elsewhere in Metamath; the handler still does what it was designed to do and prints out a submitted diagnostic message. This is because it is what I call "self-contained": it does not rely on the program state whatsoever and uses only minimal system resources. It relies only on the operating system still being capable of forwarding 1 KB of raw text to the terminal via fputs, and to this end it uses, I guess, 300 bytes of stack memory, of which the handler itself consumes about 30 bytes; the rest is an estimate for fputs. Nothing more. It is very hard to break this piece of software. Do we really need such fortified code? Hopefully never, but IMO it is as indispensable as seat belts are in cars.
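In outline, the handler described here reduces to something like the following (hypothetical names; only the 1 KB buffer and the fputs dependency are taken from the description above):

```c
#include <stdio.h>
#include <stdlib.h>

/* Outline of a self-contained handler as described: one static 1 KB
 * buffer, fputs as the only system dependency, and no reliance on
 * program state.  The names are invented. */

static char g_fatalText[1024]; /* filled by the heap-free formatter */

void fatalErrorExit(void) {
  /* A tiny fixed stack frame and a single fputs call: this still
   * works when the heap is exhausted and the rest of the program
   * state is unreliable. */
  fputs(g_fatalText, stderr);
  exit(EXIT_FAILURE);
}
```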

This all comes at a price of 160 SLOC (source lines of code), well tested and documented, shielded behind an interface in a separately maintainable sub-system. I wish I had this level of confidence and robustness elsewhere in programs, especially after entering an error state. Obviously I value software by a completely different metric than you do.

digama0 commented 1 year ago

Obviously I value software by a completely different metric than you do.

Yes, this is what I am trying to explain. Your code is not bad at all, it is optimized for a different metric than the one I am using for review. My list of bullet points in the last post was to give explicitly the things I am looking for. In a different context I would be quite happy with this kind of code, but it's not a good fit for metamath-exe in its current state. I'm not going to revert the stuff that has been merged so far, and if you want to continue to improve it that's fine (I'm not sure if more is needed but it should be properly integrated with the rest of the system), but I hope you have enough flexibility to be able to "write for a different company" as it were.

I'm hoping that future PRs will go more smoothly - I think there is plenty of space for contributions in the indicated direction, and #17 has a huge list of todo items, all of which are pre-approved for merging. If you have a different idea from anything I have mentioned so far, if you ask I can let you know what I will be looking for and give suggestions along those lines.