Offering Help: Modernised Python Syntax?

exhuma commented 2 years ago

I recently hacked coverage to scratch my own itch and noticed that the internal code could be internally improved with newer Python syntax. The primary thing I noticed (because it was the thing I was hacking on) was the usage of simple tuples in summary.py. Those could be replaced by either namedtuple or even typing.NamedTuple. Maybe there are places where dataclasses could become useful too.

I know this will break compatibility with older Python versions, depending on which features are chosen. With Python 3.6 being EoL since December there is an argument to be made (albeit a weak one) that newer releases of coverage could require 3.7+ which would certainly open up a lot of interesting possibilities.

Type hinting would be an option as well with 3.6+

This does of course raise the issue of code-churn which does not change the functionality of the code and risking regressions with the added issue of messing up git blame to some extent. But I believe a modern syntax helps overall project readability and consequently maintainability as well.

I have been writing Python semi-professionally since the 2.4 release (don't remember the date) and have been writing Python professionally since 2011.

@nedbat: If you are interested in this let me know what kind of things you would like to see and we can agree on the details. If you judge the added-value of this to be too small, or prefer to remain 2.7+ compatible (not sure this is even the case atm) to support older systems, feel free to close this issue.

nedbat commented 2 years ago

Hi, thanks for checking in!

I'm curious what your own itch was? :)

I typically don't change coverage just to bring the code into more current idioms. For example, I still use optparse, which has been superceded a few times over now. I'm not opposed to using newer styles, but it's more interesting to me if we can identify a clear benefit. Named tuples might be a benefit.

I've already moved master in the repo off of 3.6, it only support 3.7+ now.

Of course, if you want to help in other ways, I have ideas :)

exhuma commented 2 years ago

I understand your reasoning for keeping old idioms in place. And I'm not going to touch what you don't want me to touch 😉

As you mention named-tuple, they are indeed a fairly low-hanging fruit. And, related: dataclasses (considering they are available in 3.7+).

How is your stance on type-hints? Adding more precise types like named-tuples/dataclasses opens up useful typing options.

NB: I generally look at the namedtuple/dataclass difference as follows:

If the object can be identified by the sum of its arguments (i.e.: If the object identity changes if any of its attributes changes) it's a NamedTuple. For example a point in 2D space
If the object can change any attribute without changing the identity, or has one or several attributes dedicated to serve as identity, it's a dataclass.

With that in mind, I generally end up using dataclasses way more often nowadays. They almost completely took over my usage of named-tuples.

I also generally flag dataclasses as "frozen", which has some implications on the attribute types though (f.ex. a frozen dataclass with a mutable-mapping as attribute).

exhuma commented 2 years ago

I'm curious what your own itch was? :)

... forgot to answer this point.

It was actually something that is already supported by coverage but I completely missed it when running --help. That something was to exclude fully covered files in the terminal report.

It would also be nice to have a good decoupling between data-model and output "renderers". It would make it easier to support other output formats or maintain the existing ones. For example, I recently started getting into unit-testing with code-coverage in JS and came across istanbul and really liked how their console output looks.

ssbarnea commented 2 years ago

To be honest, I do think that modernizing the code should be an ongoing process and I find @exhuma proposal as welcomed. That is usually a boring and time consuming activity. Still, that does not make it useless at all.

I observed few things that are clearly outdated about the project:

it does not have type information (py.typed)
it did not ditch setup.py, aka PEP-621 - just few minutes ago I was happy to welcome a contribution about that on doc8.
optparse

Shortly, I find it smart to ask before trying to implement such changes. I made the mistake of trying to help a project without asking first and I was disappointed. Usually is ok, especially when I personally know the owner (preferences), but when I do not, I prefer to ask.

nedbat commented 2 years ago

I observed few things that are clearly outdated about the project:

it does not have type information (py.typed)

it did not ditch setup.py, aka PEP-621 - just few minutes ago I was happy to welcome a contribution about that on doc8.

optparse

For type information, there is a partial pull request: https://github.com/nedbat/coveragepy/pull/1442. Maybe people can join forces to finish it.
For setup.py, I am one of the packaging-confused that doesn't understand all of the forces at work, or how to be "modern." My setup.py does quite a lot. I am not opposed to changing it, but will need help.
For optparse: honestly, what difference does it make? optparse works. What is the actual benefit of switching to argparse?

Kludex commented 2 years ago

For type information, there is a partial pull request: https://github.com/nedbat/coveragepy/pull/1442. Maybe people can join forces to finish it.

Do you want it to be in a single PR? What I did on uvicorn was progressively add type annotation: https://github.com/encode/uvicorn/blob/c21e5779c15eaaf6604834c1c1a3b9107ffb4fb9/setup.cfg#L5-L55

ssbarnea commented 2 years ago

@Kludex My personal take was that is better to make it gradual. I had few projects where adding type took 10+ changes over a couple of months, just to minimize the chance of conflict with other work.

exhuma commented 2 years ago

I also believe that adding type-hints gradually is less painful.

In any case, I'd like to chime in, considering that I am the author of the issue 😉

I understand @nedbat with the caution about changing code just for the sake of "modernising". I myself prefer to keep something unmodified while it's working lest I accidentally introduce unexpected bugs/side-effects.

However, "modernising" the code-base can have a positive impact on maintainability by making use of newer language-/syntax-features. Of course, if a change neither improves maintainability nor features, then it adds no value and should not be done. And even if it adds value, the compromise between "added risk of bugs/side-effects" and "improved maintainability" must be taken into account.

For the above reasons, the decision of green-lighting such a change is not as straight-forward as green-lighting a "feature addition".

Another way of looking at it: A code-modernisation does not add any value for end-users. It only adds potential value to maintainers.

And, if I am reading @nedbat right (correct me if I'm wrong), these are the main concerns.

Now, concerning the related points in this thread by @ssbarnea:

Typing Information (py.typed)

The added value here is aimed at people who import coverage in their tools. They get the benefit of type-checking and, even more useful: much better code-completion in editors which understand type-hints.

Some type-hints would however benefit from first modifying some data-types from plain-tuples to named-tuples (or even frozen data-classes). Dataclasses require 3.8, but have been backported to 3.7 via an additional dependency. The same goes for "protocols" which also offer very interesting typing options.

Ditching setup.py

Looking at setup.py I can see that it does indeed a fair amount. I see two main difficulties for using a pyproject.toml: The conditional compilation of the extension module, the dynamic eval of the version string, and the dynamic modification of the classifiers.

pyproject.toml is static. And moving away from setup.py would need a major rethink of the packaging process. This is not going to be trivial, and considering that this would only impact the packaging process (i.e. no impact on either end-users or code-maintainers) I see this as very low priority.

The value in this change would be a decoupling of the build-system from the developer-environment: Via the pypa build package, the build-process will spin up an isolated environment to execute the build and packaging. This is pretty nice and also makes automation tools easier as all you really need to do is run pyproject-build or python3 -m build without needing to know what packaging tool is used by this project.

But the dynamic nature of this project's setup.py makes this more difficult. Personally I only have experience with poetry and setuptools and I am not a fan of poetry and am ditching it again on every project I used it.

optparse

The main added value here would (probably?) be maintainability?

I haven't used optparse in ages and only use argparse so I don't clearly remember the advantages. But I remember that switching to argparse was a nice experience in my projects.

This is a risky change though. We don't know how users are currently spelling their command-lines. And if not suuuuuuper careful, this change might break the current behaviour. One thing I am thinking of is the positioning of "global" arguments and "subparser" arguments. They are useful, but not fully Posix compliant (I think). For example, a parse which has a global argument for "verbosity" and a subparser with the name "foo" requires the "verbosity" flag right after the command, not the subcommand:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--verbose", action="store_true", default=False)
sub = parser.add_subparsers()
sub_parser = sub.add_parser("foo")
sub_parser.add_argument("filename")
print(parser.parse_args())

# Executions: 
# my-command --verbose foo  # correct
# my-command foo --verbose   # incorrect

As a final note: My experience with coverage that cause me to open this ticked was because I wanted to experiment a bit with the output formatting on the console (and maybe HTML too) but I found it unwieldy to work on that. Mainly because the code generating the output was tightly coupled with other parts of the code (It's been a while and don't remember the details). This led me to the thought of type-hints because they make it a hell of a lot easier to refactor code because you get immediate feedback if you mess up somewhere.

nedbat / coveragepy