oimeitei / quemb

QuEmb is an open-source tool for efficient quantum chemistry simulation of large molecules and solids (1D and 2D periodic systems) via bootstrap embedding technique.
Apache License 2.0

WIP: Run linter #49

Closed mcocdawc closed 3 weeks ago

mcocdawc commented 3 weeks ago

Mostly cosmetic:

Non-cosmetic:

mcocdawc commented 3 weeks ago

I don't want to start a bike shedding discussion, so I am also happy if the linter is configured to follow more closely the current way of formatting.

But note: we really should do linting as part of the testing; it already found some bugs. For example, nsites here https://github.com/oimeitei/quemb/blob/main/kbe/chain.py#L103 is undefined and will crash the code.

(And in addition, the linting allowed us to remove some unused variables and unused imports.)

mscho527 commented 3 weeks ago

Cool, I don't think we had a set style guide that we closely followed, so I think it would be useful to add one as the linters are enforced.

mscho527 commented 3 weeks ago

One thing that I don't know a good answer to -- should we use something like a pre-commit hook to enforce the linter in the future? Would you suggest something different?
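
For reference (not from the thread): a minimal pre-commit configuration using the ruff hooks might look like the following sketch; the pinned revision is a placeholder and should be set to a current release.

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9         # placeholder tag; pin to a current release
    hooks:
      - id: ruff        # linting
      - id: ruff-format # formatting
```

Note that pre-commit only runs on machines where it is installed, so a server-side check would still be needed as a backstop.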

oimeitei commented 3 weeks ago

> I don't want to start a bike shedding discussion, so I am also happy if the linter is configured to follow more closely the current way of formatting.
>
> But note: we really should do linting as part of the testing; it already found some bugs. For example, nsites here https://github.com/oimeitei/quemb/blob/main/kbe/chain.py#L103 is undefined and will crash the code.
>
> (And in addition, the linting allowed us to remove some unused variables and unused imports.)

This was an old fragmentation code. Unless you want to do a hydrogen-chain periodic system, it is basically useless. But we should probably fix it or flush it out, so this is a good idea!

oimeitei commented 3 weeks ago

On the other hand, the huge file changes are probably not a good idea! Something along the lines of enforcing it for future PRs is a better idea, I think.

mcocdawc commented 3 weeks ago

> One thing that I don't know a good answer to -- should we use something like a pre-commit hook to enforce the linter in the future? Would you suggest something different?

I would do it from the server side and enforce it for all PRs as part of the test suite. Otherwise we have to ensure the installation of ruff, flake8, or similar tools on the local side. Not that this would be impossible (we could easily do this in the setup.py), but then we would install it also for non-developer users. One could even save some computation time and run the test suite conditionally, i.e. only if the static code analysis did not throw an error.
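
As a sketch, that conditional order could be a CI step along these lines (the tool names and test path are assumptions, not the project's actual setup):

```shell
#!/bin/sh
set -e            # stop at the first failing step
ruff check .      # static analysis runs first and gates everything else
pytest tests/     # the (expensive) test suite only runs if linting passed
```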

This leads over to Oinam's point:

> On the other hand, the huge file changes are probably not a good idea! Something along the lines of enforcing it for future PRs is a better idea, I think.

I agree that the number of changed files is not nice and I will split the PR into pieces, but I think fixing all warnings at once is a bullet we should bite, less for technical than for psychological reasons. Enforcing compiler/linter/code-analysis warnings is basically impossible for new commits if there are already too many: if there are already 405 warnings, you have to find the one 406th that your commit added and fix that one. Why would you do that? Whereas in a repo with 0 warnings (either because they are fixed, or because false positives are explicitly ignored), a warning actually signals something and is not just ignored by a developer.

Nevertheless, I agree that at the moment it is too time-consuming to review my code when there are so many changes and only a few actually change semantics. My suggested splitting into four PRs would be:

  • purely cosmetic changes to adhere to PEP8 (this will be a lot of files)
  • fixes that ruff can perform automatically
  • manual fixes of problems reported by ruff
  • manual changes (replacing reduce(np.dot, ...) with multidot(...))

Only the first two will involve large file changes, but they are also the least dangerous to break something. Does that make sense?

mscho527 commented 3 weeks ago

> > One thing that I don't know a good answer to -- should we use something like a pre-commit hook to enforce the linter in the future? Would you suggest something different?
>
> I would do it from the server side and enforce it for all PRs as part of the test suite. Otherwise we have to ensure the installation of ruff, flake8, or similar tools on the local side. Not that this would be impossible (we could easily do this in the setup.py), but then we would install it also for non-developer users. One could even save some computation time and run the test suite conditionally, i.e. only if the static code analysis did not throw an error.
>
> This leads over to Oinam's point:
>
> > On the other hand, the huge file changes are probably not a good idea! Something along the lines of enforcing it for future PRs is a better idea, I think.
>
> I agree that the number of changed files is not nice and I will split the PR into pieces, but I think fixing all warnings at once is a bullet we should bite, less for technical than for psychological reasons. Enforcing compiler/linter/code-analysis warnings is basically impossible for new commits if there are already too many: if there are already 405 warnings, you have to find the one 406th that your commit added and fix that one. Why would you do that? Whereas in a repo with 0 warnings (either because they are fixed, or because false positives are explicitly ignored), a warning actually signals something and is not just ignored by a developer.
>
> Nevertheless, I agree that at the moment it is too time-consuming to review my code when there are so many changes and only a few actually change semantics. My suggested splitting into four PRs would be:
>
>   • purely cosmetic changes to adhere to PEP8 (this will be a lot of files)
>   • fixes that ruff can perform automatically
>   • manual fixes of problems reported by ruff
>   • manual changes (replacing reduce(np.dot, ...) with multidot(...))
>
> Only the first two will involve large file changes, but they are also the least dangerous to break something. Does that make sense?

Ah, cool! To do this on the server side, would adding something to the GitHub Actions, so that a github-actions[bot] commit applies the linter recommendations, work? If so, I guess the first two of the four PRs you suggested could actually be an addition of the relevant GitHub Actions routine, which would naturally apply the 'automatic fixes' at the same time.

mscho527 commented 3 weeks ago

To add a minor point: I think a PR with a large number of file changes is inevitable if we want to start using linters. It is just a decision between one PR that touches pretty much every file vs. multiple PRs over a longer period of time that each touch individual files.

mcocdawc commented 3 weeks ago

> would adding something to the GitHub Actions so that a github-actions[bot] commit applies the linter recommendations work?

One can certainly do this, but I would not let the server side perform autoformatting. It should just emit diagnostics and reject new code that has warnings. Whether people then do autoformatting on their end or fix it manually is up to their discretion and how they set up their IDEs. Otherwise, any git push might introduce new commits, and developers can no longer assume that the code that makes it to the test suite is actually the code they see locally.

In addition, there are enough sensible warnings where I don't want an automated tool to apply automatic fixes. Let's say you had

y = f(x)

and got a warning that y is never used. It might be that f actually has side effects, so you don't want to just remove the line; as a human developer I would then check and refactor to:

f(x)    # f is called for its side-effects.

(Putting aside the discussion about evil-ness of side-effects. :-D)

PS: ruff and other tools distinguish between automatic fixes, where the tool is sure that nothing breaks, and potentially breaking fixes like the one for this example.
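
For reference, ruff exposes exactly that distinction on the command line (a sketch; see ruff's documentation for the authoritative flags):

```shell
ruff check .                        # report diagnostics only
ruff check . --fix                  # apply only fixes marked as safe
ruff check . --fix --unsafe-fixes   # additionally apply potentially behavior-changing fixes
```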

mcocdawc commented 3 weeks ago

The new list of PRs looks like this then:

I just ran the whole test suite locally on the main branch and it actually has failing tests (tests that are not part of the automatic GitHub Actions). So, before any major changes, the current tests should actually work. That's why I added the first point. :-(

oimeitei commented 3 weeks ago

I would be in favor of holding off the cosmetic changes for now.

I think the whole codebase could potentially benefit from some restructuring first. For example, for the different solvers (especially the selected CI branches, DMRG), it is better to pass in some dict or some data structure rather than individual settings. There are many places that need some improvement; that would make future development easier. In doing so, we can already practice writing beautiful code :)

mcocdawc commented 3 weeks ago

I respectfully disagree. Coding while consciously ignoring linter warnings will surely lead to more bugs in the future. In addition, enforcing PEP8 compliance only for future commits will result in a non-uniform code base, intermingle cosmetic with substantial changes, and make the review of these substantial changes much harder.

> For example, for the different solvers (especially the selected CI branches, DMRG), it is better to pass in some dict or some data structure rather than individual settings.

Isn't it possible to use dictionary unpacking in the argument list to fill keyword arguments, i.e.

settings = {'x': 1, 'y': 2}
f(**settings)
# is equivalent to
f(x=1, y=2)

already now?
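
A runnable sketch of that idea (the function and setting names here are made up for illustration):

```python
def run_solver(x=0, y=0, z=0):
    # stand-in for a solver entry point with many keyword settings
    return x + 10 * y + 100 * z

settings = {"x": 1, "y": 2}   # all settings collected in one dict
# unpacking the dict fills the keyword arguments
assert run_solver(**settings) == run_solver(x=1, y=2)
```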

mcocdawc commented 3 weeks ago

To give a concrete example where PEP8 compliance would probably have prevented a bug: in https://github.com/oimeitei/quemb/pull/51 the original version was

C_ao_emb = C_ao_eo[np.newaxis]/(nkpts*(0.75))

and should have been the power operator ** instead of multiplication *.

Having whitespace around operators, as recommended by PEP8,

C_ao_emb = C_ao_eo[np.newaxis] / (nkpts * (0.75))

or having whitespace only around the less strongly binding binary operator, as alternatively recommended,

C_ao_emb = C_ao_eo[np.newaxis] / (nkpts**(0.75))
# vs
C_ao_emb = C_ao_eo[np.newaxis] / (nkpts * (0.75))

would have made it much easier to spot the wrong operator.
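
The numerical difference is easy to demonstrate (the value of nkpts is chosen for illustration, not taken from the PR):

```python
nkpts = 16
wrong = 1.0 / (nkpts * 0.75)    # multiplication: divides by 12
right = 1.0 / (nkpts ** 0.75)   # power: divides by 16**0.75 (== 8)
assert wrong != right           # the mistyped operator silently changes the scaling
```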

ShaunWeatherly commented 3 weeks ago

> I respectfully disagree. Coding while consciously ignoring linter warnings will surely lead to more bugs in the future. In addition, enforcing PEP8 compliance only for future commits will result in a non-uniform code base, intermingle cosmetic with substantial changes, and make the review of these substantial changes much harder.

For what it's worth, I'd have to agree with Oskar. The longer we wait, the harder it will be to make the transition to a standard (PEP8 is near-universal at this point). Once it's set up, checking compliance of future PRs takes no effort at all if we use a new flake8 workflow, for example. This would also make it easier to review PRs from people outside the group (if this repo gets popular), so it's important to have it in place before people start making contributions.

mcocdawc commented 3 weeks ago

Closing this issue to not mix purely PEP8 formatting with semantic changes.

mcocdawc commented 3 weeks ago

#54 outlines the steps for the refactoring to enable linting.