jnikula / hawkmoth

Hawkmoth - Sphinx Autodoc for C
https://jnikula.github.io/hawkmoth/
BSD 2-Clause "Simplified" License
74 stars 12 forks source link

Type hints #231

Closed jnikula closed 2 months ago

jnikula commented 10 months ago

Add the facility to use and check type hints. Add minimal type hints for now to make mypy pass, this is just the infrastructure. Also some minor static analysis make target shuffling.

I'm not sure if we should go all in on type hints, but perhaps it would be helpful to make it possible to add some useful type hints, as a start?

jnikula commented 10 months ago

Argh. I thought I'd force Python 3.6 for type checks on github because a) that's the minimum Hawkmoth requires, and b) there's been so many changes with type hints that without it we'd probably use something only available on newer Python.

However, ubuntu-latest doesn't even have Python 3.6 anymore. So I tried to use older ubuntu for the workflow.

But then the venv script fails because it uses pip --config-settings option which apparently isn't available on older Python. Or something.

Should we consider bumping the minimum Python version? Where to?

EDIT: I just removed the Python 3.6 stuff altogether for now.

EDIT: See #232 to bump Python dependency to 3.8

jnikula commented 10 months ago

One thing about type hints that I dislike (and one of the reasons I haven't bothered so far) is that they can get quite ugly and unwieldy. And you end up spending too much time wondering how you layout them in source code.

So I'm wondering if starting to add type hints also means enforcing a tool formatted coding style, e.g. https://github.com/psf/black.

BrunoMSantos commented 10 months ago

So, 1st impressions without reading the patchset... I really dislike the concept of type hinting in Python as I understand it:

Also, though not so important I think:

Am I missing some big advantage of doing this? Honest question :sweat_smile:

So I'm wondering if starting to add type hints also means enforcing a tool formatted coding style, e.g. https://github.com/psf/black.

Wouldn't ask for it either, but I can definitely see the benefits there. I'd be ok with this regardless of the hinting.

jnikula commented 10 months ago

Am I missing some big advantage of doing this? Honest question 😅

To me the two main advantages are the static type checking (and it does catch actual mistakes) and the documentation to the developers. And it was the Cursor vs. DocCursor that got me thinking about this in the first place.

And the biggest downside is indeed how ugly and verbose it looks.

So I'm wondering if starting to add type hints also means enforcing a tool formatted coding style, e.g. https://github.com/psf/black.

Wouldn't ask for it either, but I can definitely see the benefits there. I'd be ok with this regardless of the hinting.

Personally I'm perhaps more leaning towards checking conformance to a style than forced formatting regardless of what the current state is. This is pretty much what we have now with flake8 (and could enable more plugins for it).

It's just that the type hinting makes nice manual formatting harder, and leads to more head scratching, and might tip the scales towards forced formatting. :shrug:

BrunoMSantos commented 10 months ago

Am I missing some big advantage of doing this? Honest question 😅

To me the two main advantages are the static type checking (and it does catch actual mistakes)

Well... Does it though? Can you think of an example where the hinting would have caught something that would otherwise be missed? :nerd_face:

I know that it can nominally catch errors, but it won't catch anything that is not caught by the tests with a high enough coverage and even then relying on a parallel (though not entirely independent of course) stream of information that is completely ignored by compliant interpreters.

Anyway, I'll concede the point if you see value in it, it's not like I'm dead set against it.

and the documentation to the developers.

Yeah, I'll accept this one, though I still think the added verbosity still makes the net result negative.

I'll review the actual patches later today or tomorrow ;)

jnikula commented 10 months ago

Disagreements over the material purpose of the PR aside, the changes themselves look fine! Last two commits could even be added regardless in fact.

Would you add widespread hinting in a follow up PR? Or do you see it happening more organically over time?

I honestly feel uncertain about merging this at all, given that you aren't enthusiastic about it. I've wondered about just posting the last two commits as cleanup, and letting the first one be for now. And it's not like I've put a lot of effort into this.

BrunoMSantos commented 10 months ago

Up to you! Besides, I'm always ready to change my mind as well ;)

One thing that puzzles me is that when the hinting proposals were still fresh, you'd find plenty of these discussions around, but eventually they were adopted and hardly anyone questions it any more. And despite me never seeing a satisfying argument for it, plenty of high profile projects and the core libraries themselves are using it. Of course I'm an outsider though...

Looking at core library examples can be very off putting by the way. E.g. see this (I won't even point at specific lines) and tell me it's not a mistake, though it is an extreme example for sure. I fail to see advantage in using Python if duck typing is 'ceremonially' thrown away, maybe the most defining of all of Python's features, only to get it back through innumerable overloads and still perform all the (unavoidable) runtime error handling anyway.

I swear I'm not just trying to be difficult :sweat_smile:

stephanlachnit commented 10 months ago

Just my 2 cents: I actually wanted to open a PR for this as well. For me, the main pain point during development was looking up method names of the DocCursor and DocString classes. I do not know their methods by heart and because they are passed around in functions without type annotation, my IDE couldn't help me either.

While I understand that full type hinting just for the sake of adding type hinting can kinda defeat the point of Python, but adding it to some crucial code paths where you work with more complex objects (i.e. classes) can help during development (at least for new contributors that do not know these complex objects by heart (yet)).

BrunoMSantos commented 10 months ago

Just my 2 cents: I actually wanted to open a PR for this as well. For me, the main pain point during development was looking up method names of the DocCursor and DocString classes. I do not know their methods by heart and because they are passed around in functions without type annotation, my IDE couldn't help me either.

Yeah, that's a fair argument! But I feel compelled to say that I too expect those things from my 'IDE' and I get them nevertheless :)

If it interests you, I'm using python-lsp-server and it can infer what's being passed around just fine without compromising on anything else. E.g. in docstring:

Screenshot from 2024-01-19 10-04-27

Screenshot from 2024-01-19 10-13-39

Of course that only works because there's this line somewhere else where pylsp can figure out what was passed along:

ds = docstring.EnumeratorDocstring(cursor=cursor, nest=nest)

And if I add a 2nd

ds = docstring.EnumeratorDocstring(cursor=42, nest=nest)

Pylsp will now presume that cursor may be either a DocCursor or an int.

Edit: relevant references: pylsp and rope which is the magic behind completions.

jnikula commented 10 months ago

Just a quick reply here. Thanks @stephanlachnit for also chiming in!

One approach we might consider is merging the tooling for type hints (essentially this PR) but keeping the actual type hints in code to ones that give most benefit without cluttering too much. Which is going to be case by case and somewhat subjective, but perhaps better than going all-in.

Interesting further reading:

BrunoMSantos commented 10 months ago

One approach we might consider is merging the tooling for type hints (essentially this PR) but keeping the actual type hints in code to ones that give most benefit without cluttering too much. Which is going to be case by case and somewhat subjective, but perhaps better than going all-in.

I do see benefit in some of the things we discussed, even if I find the net result lacking, but for those positives I'm happy with this one way or the other.

I would have a slight preference for systematic usage though and avoid arbi9trary / inconsistent addition of these things.

Interesting further reading:

Nice references! :+1:

I don't think there's much substantially new that we haven't discussed, but I did find one unexpected claim there suggesting that hinting does in fact detect things that tests cannot:

tests can often not prove invariants of your code to hold

This would still be a strong update for me if it were true. But I remain very confident the opposite is true: one can test everything, including type invariants at runtime (assert type, if type, throw, whatever). On the other hand, there are plenty of invariants that cannot be checked with hinting (e.g. check that a dict always has a certain key).

jnikula commented 10 months ago

Sorry, this stalled a bit.

I would have a slight preference for systematic usage though and avoid arbi9trary / inconsistent addition of these things.

Can you expand on systematic usage? Do you mean just a guideline of what to type hint and what not (for example, "all functions meant to be used from other modules need type hinting") or something that can be enforced with tooling?

BrunoMSantos commented 10 months ago

I would have a slight preference for systematic usage though and avoid arbi9trary / inconsistent addition of these things.

Can you expand on systematic usage? Do you mean just a guideline of what to type hint and what not (for example, "all functions meant to be used from other modules need type hinting") or something that can be enforced with tooling?

Pretty much. Doesn't need to be enforced with tooling, just a way to avoid this debate on a case by case basis and ensure that someone like me (who would otherwise never do it) knows how to contribute to maintain whatever outcome we agree on.

These are the 4 'well defined' rules I can think of:

jnikula commented 10 months ago

Pretty much. Doesn't need to be enforced with tooling, just a way to avoid this debate on a case by case basis and ensure that someone like me (who would otherwise never do it) knows how to contribute to maintain whatever outcome we agree on.

These are the 4 'well defined' rules I can think of:

* Not at all.

* User facing APIs like DocCursor (once they're passed along to events).

* Module APIs.

* Everywhere.

Thanks. What I'd like to have is "where it's useful", but that's not well defined. So I lean towards module APIs. I fear there's a slight chance it "leaks" to internal stuff as well to be able to type hint module APIs sensibly, though. Definitely not going to be "everywhere".

Some of this could be helped with better use of classes. Like, parser.parse() returns a tuple of a list of errors and a RootDocstring. And the comment for the function is just plain incorrect. I'm sure there are better and more intuitive ways to convey this information than by either type hinting or better documentation.

PS. In the mean time I split out the two non-controversial commits here to #234 and added some other cleanups on top.

jnikula commented 2 months ago

One of the things that rubbed me the wrong way was having to use from typing import List, Type. By bumping python requirement to 3.9 since 3.8 is reaching end-of-life next month we can do away with all that.

BrunoMSantos commented 2 months ago

Hope you're keeping track for the release notes!

Of course you are, I'm just blind :upside_down_face: