yt-project / unyt

Handle, manipulate, and convert data with units in Python
https://unyt.readthedocs.io
BSD 3-Clause "New" or "Revised" License
360 stars 46 forks source link

Type checking unyt ? #296

Open neutrinoceros opened 1 year ago

neutrinoceros commented 1 year ago

This is mostly a question to @ngoldbaum and @jzuhone: How would you guys feel about progressively adding type hints and a type checking stage to CI ? To be clear I'm thinking about doing it at least partially myself, because numpy is almost 100% "typed" now and IMO it would make sense to follow their lead. This is a long term goal as this could be quite an undertaking (though maybe not !), so I wanted to get your sentiment on it first.

ngoldbaum commented 1 year ago

I’m pretty ambivalent about it, I don’t think it’s worth putting a ton of effort in for marginal user benefit.

neutrinoceros commented 1 year ago

The thing is that I really don't have a clear idea how much work it would take. Hypothetically, were the change actually not that big, would you consider it ?

ngoldbaum commented 1 year ago

I don’t see how you could do it without a lot of code churn. If you’d like to try to convince me, go ahead, but you will need to convince me before I merge PRs adding type annotations.

neutrinoceros commented 1 year ago

I think I need to convince myself first 😅 if I do, I may submit a proof of concept PR, but for now NEP 18 is much higher priority to me.

neutrinoceros commented 1 year ago

I'd like to try doing this in progressive steps. That is, while I don't intend to add type hints everywhere in the code base in one go, I do see value in sprinkling them as I'm working on various aspects of the code. In order enable this, we first need to resolve (or explicitly ignore) all errors that mypy unyt can currently raise, and add type checking to CI. I have convinced myself this first step is a reasonable amount of work so I will open a PR to do just that. After that, we can add type hints wherever and whenever we want if it feels right.

ngoldbaum commented 1 year ago

Like I said back in October, you’re going to need to convince me it’s worthwhile to add type annotations before I merge PRs adding them.

neutrinoceros commented 1 year ago

Well an argument could be made that #355 is valuable in itself; because mypy in general will check consistency of one's code and its dependencies, by running type-checking in unyt's CI we can at a minimum guarantee that downstream code will not discover inconsistencies they cannot control within unyt. We've been working on very progressively adding type information in yt so I can testify that there's at least one code depending on unyt that does this.

Beyond that, I can think of a couple reasons why type annotations are useful to have for unyt's own good:

1) modern IDEs understand type hints and integrate them via hovering and auto-completion, which improves coding experience for anyone who relies on the annotated library. 2) it's validated documentation. Unlike types in docstrings (which we do use), that are only validated manually and always at risk of falling out of sync when code changes. 3) type-checking doesn't just verify that type hints are valid and consistent, it can also detect actual bugs. For instance, taking a very simplified version of unyt_quantity.from_string

import re
_UNIT_PATTERN = r"([α-ωΑ-Ωa-zA-Z]+(\*\*([+/-]?[0-9]+)|[*/])?)+"
_UNIT_REGEXP = re.compile(_UNIT_PATTERN)

def unit_from_string(s:str) -> str:
    return re.match(_UNIT_REGEXP, s.strip()).group()

mypy will produce

t.py:7: error: Item "None" of "Optional[Match[str]]" has no attribute "group"  [union-attr]

because re.match may return None, in which case the code would normally raise AttributeError at runtime. In fact unyt_quantity.from_string was originally written in a typed-check code of mine, which helped me make sure the error messages were clear and intended.

Another example of classic gotcha that mypy can detect even with minimal amount of annotations

def get_letter(n:int) -> str:
    if n == 1:
        return "a"
    elif n == 2:
        return "b"
    # ... you get the idea
    elif n == 26:
       return "z"

mypy will detect that my function can return None instead of a string if n is any int that doesn't match any branch.

t.py:1: error: Missing return statement  [return]

4) I often find myself wondering what types are things supposed to be. The process may take an inordinate amount of time in some cases. Adding type hints after I figure it out saves me time next time I work on the same region of the code.

neutrinoceros commented 1 year ago

I should add: no matter the amount of type hints we hints we want to have (even 0), adding type-checking to the supply chain can help with discovering subtle bugs. The idea isn't to make a bazillion of PRs to go from 0 to 100% typed code, but rather to enable this aid (mypy) so we (I ?) can add type hints whenever working on something else. Of course we're free to add type hints without using a checker, but for anything more than trivial it's a risk of introducing errors.

ngoldbaum commented 1 year ago

My main objection is adding any types at all (especially in a piecemeal way) will increase the cognitive overhead of reading the code. I don’t personally see the benefit for my personal usages with unyt, and I’d prefer not to have to learn the details of python typing just to support reviewing your PRs that add something I don’t want to deal with.

For now I’m not going to merge PRs that add type hints to unyt. If that means I don’t get more PRs from you I guess that’s too bad.

It leaves a bad taste in my mouth that you started doing work to add this after my last comment was that you’d need to convince me that it’s worth doing.

neutrinoceros commented 1 year ago

It leaves a bad taste in my mouth that you started doing work to add this after my last comment was that you’d need to convince me that it’s worth doing.

I'm sorry you feel this way, it wasn't my intention that you felt unheard. Rather, I intended #355 as a concrete demonstration that kicking this off wasn't as big of a deal as one might have thought (it wasn't clear even to me before I just did it).

For now I’m not going to merge PRs that add type hints to unyt. If that means I don’t get more PRs from you I guess that’s too bad.

I don't think this will completely ruin my motivation to contribute, and I hope it didn't ruin your trust in me either.

My main objection is adding any types at all (especially in a piecemeal way) will increase the cognitive overhead of reading the code.

all I can say is that in my experience, given enough time, the opposite becomes true, but I understand this isn't something I can really convince you of before jumping in.

I don’t personally see the benefit for my personal usages with unyt, and I’d prefer not to have to learn the details of python typing just to support reviewing your PRs that add something I don’t want to deal with.

You seem to have stronger opinions on this matter than I previously thought, which may explain this mishap. Again I am sorry for how I handled things, I genuinely didn't anticipate you'd receive it poorly.

basnijholt commented 5 months ago

For what it is worth, I am evaluating different libraries that deal with units. Just the fact that this code is not properly typed is a reason to skip unyt IMO.

About

My main objection is adding any types at all (especially in a piecemeal way) will increase the cognitive overhead of reading the code.

I definitely agree with @neutrinoceros here, the code is much easier to understand once you have worked with type hints a bit.

In my experience, being a full-time Python developer for almost a decade, adding type hints and using mypy has uncovered bugs in basically every project I worked on. The benefit might not be immediately obvious because it might identify only a handful of bugs, however, the major advantage is identifying bugs while writing new code. Before you execute the code it already helps you a lot.

Additionally, having type-hints is also great for the users of your package because it allows editors to make meaningful suggestions.

Hope you'll reconsider at some point.

berceanu commented 2 months ago

@basnijholt which unit library did you end up choosing in the end, asking out of curiosity

basnijholt commented 2 months ago

astropy.units because it seems to have the best supports with numerical packages. The downside is that is comes with a lot of code that I won't use.

JonathanRayner commented 1 month ago

@ngoldbaum I'll just add that type hinting/checking is considered standard for all serious new projects, especially in industry. If you want people to use this library (and the benefits of contribution/extension, etc.), an obtuse attitude towards keeping up with standards is not a good idea (especially when people are volunteering to help with PR's to update).

I find it bizarre that you are obviously advertising this library for others to use (writing a paper on it, documentation, lots of stylistic language about a broader benefit to science), yet veto reasonable requests with no other reason than you don't like how it will affect your personal usage. Which is it? Do you want to advance science and contribute to the community and in return receive the benefits of prestige and community contribution, or do you just want to be emperor of your own toy project? Either is of course fine, no one is forced to use your library. But you should not be disingenuous about what the purpose of this library is if it is just for your usage.

Maybe after writing some more Rust you will be less adversarial to type checking in python? Perhaps your views have changed?

jzuhone commented 1 month ago

@JonathanRayner FWIW, Nathan is no longer maintaining this project--that is up to me and @neutrinoceros.

I'm personally open to exploring type checking--I am unfortunately very swamped at the moment with other things.

neutrinoceros commented 1 month ago

I'll add that while I don't have personal motivation to pursue this myself now, I would happily review PRs if anyone wants to give it a spin.