wlammen opened this issue 2 years ago
I've been wondering how to approach this issue as well. Things will settle down eventually, once the code is a bit closer to locally optimal, but I expect to be putting in a large number of changes in the near future because the objective function has changed discontinuously and so we are currently far from that optimal point.
I don't want to be too forceful here in pushing my changes, but I think we should aim for higher throughput at the expense of review quality in the near term, at least until we have the big refactors mostly done, CI and testing set up to ensure no obvious breakage, a new version released, and some testing and bug reports from users in case of regressions.
Metamath has a lot of edge case behaviors, and I think we will need a big test suite to ensure compatibility with all of it (or at least the part of it we consider important). I think that good testing will be more useful in the long run than careful review (because it's a C project and eyeballs are only so good). Once we start to approach the optimum I think we can start to raise our documentation, testing, and review standards.
Just one answer in 10 days...
For comparison, here is a code review checklist: https://github.com/mgreiler/code-review-checklist. Rest assured that I was far from using it exhaustively in my review attempts. They were a compromise between my limited knowledge of the topic at hand, my limited technical skills, my limited time, the perceived eagerness of developers to land perhaps long-awaited changes, and general rules governing software development.
Of course, the overall goals of the metamath-exe project determine the amount of scrutiny used in review. Like security, review raises costs but may not yield any short-term advantages in return. Review is about quality; the advantages are often seen only in the mid or long term.
IMO people have voiced their expectation of having a rock-solid, reliable tool on hand. This sets the quality bar very high. My personal goal is a piece of software that is accessible to a wider audience, which also puts a limit on the complexity and diversity of the software.
My hope was to extract a draft review checklist from the answers here. So far that has not materialized.
I personally don't think we need to systematize this much. Reviewing just means looking over the code as best you can, pointing out odd things or design decisions, and then moving on. I don't think we actually have enough manpower for an extensive review process, at least not without progress on the codebase grinding to a halt.
> IMO people have voiced their expectation of having a rock-solid, reliable tool on hand. This sets the quality bar very high. My personal goal is a piece of software that is accessible to a wider audience, which also puts a limit on the complexity and diversity of the software.
metamath.exe is (currently) a ball of hacks. Just because it handles a formal system doesn't make it less hackish. Big changes need to happen before it is even possible to review for robustness. We simply aren't ready yet for that level of review.
It seems like there isn't going to be much oversight in this repo, because of lack of interest. So I'm contemplating just merging my own PRs at this point: if no one wants to comment then I'll just continue to the next phase.
> metamath.exe is (currently) a ball of hacks. ... Big changes need to happen before it is even possible to review for robustness.
The transition to a more professional style can IMO be done systematically, as a steady (and measurable) process. And if done properly, the risk of breaking current semantics, even in corner cases, can be reduced. Such a process begins (in my experience) with an evaluation of the current state. Based on that, you later define goals (functionality, style, architecture, future directions and so on) and methods (bottom-up or reverse, contribution checks, tests, tools, etc.) to achieve them. See my note in issue #40.
Starting the evaluation phase was the intent of my PR #41. Besides laying the groundwork for a proper transition, if desired, it allows more people to take part in development, because the first hurdle, becoming expert enough to contribute sensibly, is overcome more easily. And the risk of nasty bugs is reduced to boot.
I wouldn't say that the current code is just a ball of hacks; I have seen worse in my life. A few parts do in fact look quick and dirty, but in general I perceive an underlying concept as well.
In my eyes, it is not only the source code that is subject to transition. The development process itself needs adjustment. The God-like eminence who decided what to do or not is gone, and with him his profound knowledge of where to expect pitfalls. We need experience and must learn, as a team, to better judge how to proceed.
The role of review: in retrospect, I would say that reviewing your code, for example, was not pointless. Portability issues were fixed, oversights and awkward style were pointed out, possible misunderstandings were explained. Since learning is IMO a pressing issue, I summarize my findings in the following rules:
It seems we are, for now, the only ones caring about a proper code base. While I think the discussion should in general be public, for quick exchange you can contact me by email at <firstname>.<lastname> <at> <common Google account> as well.
Addendum: review checklists like https://github.com/mgreiler/code-review-checklist are not restricted to peer review. You can check whether your own contribution matches their quality criteria. Obviously, assessing your own code may result in a biased and short-sighted view. Still, chances are you will sometimes correct shortcomings yourself before someone else is hit by them, or points them out.
> Metamath has a lot of edge case behaviors, and I think we will need a big test suite to ensure compatibility with all of it (or at least the part of it we consider important). I think that good testing will be more useful in the long run than careful review
Totally agree with the value we'd get from automated tests. Is there any help you'd like with setting up automated tests? I don't want to overpromise how much time I'll have for any of this (given that proofs, apparently, are more fun for me than proof software), but if we can get a bit more concrete about how we might approach this (test frameworks, getting it set up in GitHub, etc.), maybe I could chip in.
I can make a separate issue if we want, since it is a separate issue from code reviews (and there's nothing saying you can't have both tests and code reviews).
Here's a test structure that I think will work with metamath.exe. Each test consists of:

- `test.in`
- `test.mm` (optional)
- `test.expected` (optional)
- `test.produced` (build output, cleaned by `make clean` and `.gitignore`d)

The test runner is a shell script `./run_test` with the syntax `./run_test [--bless] test1[.in] test2[.in] ...`, which works with `xargs` to run the entire test suite. (The `.in` suffix is optional and added if not present.) The script will do the following for each test `test.in`:

- check that `test.in` exists
- if `test.mm` exists, then run `./metamath test.mm < test.in > test.produced`, else `./metamath < test.in > test.produced`
- assert exit code 0, unless `test.in` has a line `! expect_fail`, in which case we assert exit code 1
- if `test.in` has a line `! run_test`, then we discard `test.produced` and skip the comparison below
- if `--bless` is provided, then copy `test.produced` to `test.expected`
- run `diff test.produced test.expected` and print the results (and return with diff's exit code)

(This is inspired by the lean test suite.)
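As a usage sketch, assuming the tests live in a `tests/` directory (the layout is not specified above, so that part is a guess), running the whole suite or re-blessing a few tests might look like:

```sh
# Run every test in the suite; each *.in file names one test.
ls tests/*.in | xargs ./run_test

# After an intentional change to display behavior, re-record the expected
# output of the affected tests ("foo" and "bar" are hypothetical test names).
./run_test --bless tests/foo tests/bar
```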
In short, we have two kinds of tests: run-tests (containing `! run_test`) are just for checking that overall compilation works, and don't worry about the result, while "porcelain" tests check the exact content of the error message to ensure that it succeeds or fails in the right way. The `--bless` command is for producing the `test.expected` files: the idea is that it is run in CI without `--bless`, so that when it fails you see a printout saying how the output did not come out as expected; during development, if there has been a change to the display behavior, then you run all the affected tests with `--bless` to update the `*.expected` files (and the change becomes part of the PR, so people will review that the change was reasonable).
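To make this concrete, here is a minimal sketch of what such a `run_test` script could look like. It is not the actual script from the repository; the directive handling, file names, and error reporting are just my reading of the description above.

```sh
#!/bin/sh
# Sketch of a run_test script along the lines described above (illustrative only).
# Usage: ./run_test [--bless] test1[.in] test2[.in] ...

bless=0
if [ "$1" = "--bless" ]; then
  bless=1
  shift
fi

status=0
for t in "$@"; do
  t=${t%.in}                                  # the .in suffix is optional
  if [ ! -f "$t.in" ]; then
    echo "$t: missing $t.in"
    status=1
    continue
  fi

  # Run metamath, passing the .mm file if one exists for this test.
  if [ -f "$t.mm" ]; then
    ./metamath "$t.mm" < "$t.in" > "$t.produced"
  else
    ./metamath < "$t.in" > "$t.produced"
  fi
  code=$?

  # Tests marked "! expect_fail" must exit with code 1, everything else with 0.
  if grep -q '^! expect_fail' "$t.in"; then expected=1; else expected=0; fi
  if [ "$code" -ne "$expected" ]; then
    echo "$t: exit code $code (expected $expected)"
    status=1
    continue
  fi

  # Run-tests ("! run_test") only check that the run completes; no diff.
  if grep -q '^! run_test' "$t.in"; then
    rm -f "$t.produced"
    continue
  fi

  # --bless records the current output as the expected output.
  if [ "$bless" -eq 1 ]; then
    cp "$t.produced" "$t.expected"
  fi

  diff "$t.produced" "$t.expected" || status=1
done
exit $status
```

The point of the design is that `--bless` only changes whether the current output gets recorded, so the CI path and the development path run through the same steps.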
I can't fully decide whether to weigh in more directly on the code review topic, but when I see threads like https://github.com/metamath/metamath-exe/pull/55#discussion_r796143316 I have trouble resisting the temptation to offer a third opinion, in the hope of helping us take a step back from the brink and avoid getting into the mindset of "I am obviously right, so why don't those people just understand?". I'm not sure I'm enough of a diplomat to improve things rather than fan the flames, but I'll try:
I already said I might be the utopian one, but I do think we are figuring out this stuff, however imperfectly. While writing a CONTRIBUTING.md file (including some progress on documentation) and checking in a (very small at the moment) test suite might seem like small steps, to me they are evidence that even if we don't agree on everything, we are getting a bit closer to a development process which helps us make progress together.
I just want to add that writing tests is an easy activity that anyone can get in on. The process is as follows:

1. Write a `test.mm` file; literally any sequence of characters is fine, bonus points for weird ones.
2. Write a `test.in` file saying what you want to check about the mm file, usually something like `ve p *` or `ve m *` or `sh p foobar`.
3. Run `./run_test.sh --bless test` to generate the `test.expected` file, and check that it matches your expectation.

We have 5 or 6 tests at the moment, each of which was only a couple seconds of work. I would like to get 100+ tests in there in the short term, so please use your imagination and write whatever you can.
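For instance, creating a brand-new test from scratch might look like the following (the test name `verify-demo`, the `.mm` contents, and the chosen command are purely illustrative; as noted above, almost anything works):

```sh
# Hypothetical test named "verify-demo": a tiny database plus the
# metamath commands to run against it.
cat > verify-demo.mm <<'EOF'
$c ( ) -> wff $.
$v p q $.
wp $f wff p $.
wq $f wff q $.
wi $a wff ( p -> q ) $.
EOF

cat > verify-demo.in <<'EOF'
ve p *
EOF

# Record the current output as the expected output, then eyeball it.
./run_test.sh --bless verify-demo
cat verify-demo.expected
```

The generated `verify-demo.expected` file is then committed alongside the other two files, so reviewers can see exactly what output the test pins down.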
I recently reviewed some code; two of the pull requests were quite big. We have neither a code style nor review checklists in operation. Hopefully we will decide on the documentation rules soon, so that at least in theory we can fill in these gaps.
Instead I followed the rules that are in effect in the company where I work. I think we should discuss the results here. This is all about learning. How can we improve the workflow? What level of quality (both in code and review) should we aim at? I am looking forward to hearing what you have to say.