Type annotation in pyxem

viljarjf commented 1 year ago

Is your feature request related to a problem? Please describe. I am currently learning how to use the pyxem suite for template matching, and I noticed the lack of type annotations/ hinting. I am a big fan of getting useful autocompletion and suggestions for the objects returned by different functions, and as such would like to add type hinting to the pyxem suite. The docstrings in general are great, and are very specific and clear regarding type input and output, so adopting type hinting in the declarations as well should not entail much additional work for new functions/classes ect.

The main advantage of adopting type hinting is the IDE recognizing the types of objects returned by functions ect. I mostly agree with the discussion in #563 regarding inputs, as the docstrings clearly state the expected input types. One could argue that they also clearly state the output type, but referring back to the documentation that created the object is more tedious in my opinion.

I saw it was discussed in #563 previously, and in https://github.com/pyxem/orix/issues/68, but I thought I might bring it up again as those are now three years old. From what I understand, the main reason for not using it was the effort involved. Both orix and kikuchipy have type hinting already, so it is clearly not foreign to pyxem in general. Regarding the points brought up in https://github.com/pyxem/orix/issues/68#issuecomment-645283684 about other libraries adopting type hinting: neither scipy, or scikit-learn use it yet, but numpy uses it some places. This might just be for backwards compatibility, I haven't checked. hyperspy does not use it either, but I believe it might not work properly anyway since a lot of import resolution happens dynamically at import-time, from what I understand. My linter (pylint, in VS Code) does not recognize this.

To be clear: I am not suggesting additional type checking software like mypy, only type annotations in function declarations. As an example, for pyxem/utils/indentation_utils.py:

# current
def correlate_library_to_pattern(
    image,
    simulations,
    frac_keep=1.0,
    n_keep=None,
    delta_r=1.0,
    delta_theta=1.0,
    max_r=None,
    intensity_transform_function=None,
    find_direct_beam=False,
    direct_beam_position=None,
    normalize_image=True,
    normalize_templates=True,
):
    ...

# new
def correlate_library_to_pattern(
    image: np.ndarray,
    simulations: list[DiffractionSimulation],
    frac_keep: float = 1.0,
    n_keep: int = None,
    delta_r: float = 1.0,
    delta_theta: float = 1.0,
    max_r: float = None,
    intensity_transform_function: Callable[[np.ndarray], np.ndarray] = None,
    find_direct_beam: bool = False,
    direct_beam_position: tuple[float, float] = None,
    normalize_image: bool = True,
    normalize_templates: bool = True,
) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
   ...

This is just an example, using python >=3.9 syntax. I believe pyxem is supported down to 3.7, so appropriate syntax will be used (typing.List instead of the builtin list ect.) unless I am "allowed" to use from __future__ import annotations as described in PEP 585. I personally favor the latter.

As brought up in https://github.com/pyxem/orix/issues/68#issuecomment-645283684, and as can be seen in the above code example, type hinting can introduce some visual "clutter". This is especially true here, with GitHub's syntax highlighting. With e.g. VS Code, it looks better in my opinion: bilde

Describe the solution you'd like If I go through pyxem and its related packages to type hint, will such an addition be welcome? This is a large and boring undertaking, so I wanted to check with the community first. I suggest that I start with diffsims, as it is smaller than pyxem.

CSSFrancis commented 1 year ago

@viljarjf This is a good discussion to bring up, obviously things change with time so rehashing things is always good! Personally, I haven't really gotten into the habit of adding in type checking to arguments for pyxem. My thought process is that the information is already in the doc strings and it seems like a redundancy that at least for me wasn't worth the extra effort. That being said I haven't looked into it terribly much to see what the benefits would be so I could be very easily persuaded (I think most of them are related to better linting and autocomplete? correct?)

I think that most of the python community is going towards adding type checking. So as much as I don't love how it makes the function signature look it would be a good thing to do and I don't see a good reason not to do it!.

@hakonanes or @ericpre You usually have good insight into these kind of things, do either of you have strong opinions either way?

Just a note that doing this by hand might be tedious. There are projects like pyannotate which should work fairly well to automate the process.

hakonanes commented 1 year ago

@viljarjf, a good suggestion!

I've come to like type hints. During development, they enable my PyCharm editor to inform me if I'm passing an unexpected value to a function. Often, I originally intended for the unexpected value to be allowed, in which case I update the type hints. I don't think my editor is as good at inferring this information from numpydoc docstrings.

The most important use of type hints is to automatically generate the technical reference (API, link). Sphinx can automatically parse these via a plugin. So we don't loose anything by dropping the expected types from the parameter name line in the docstring. This has to be done as type hints must be listed in one place only (DRY).

I believe pyxem is supported down to 3.7, so appropriate syntax will be used (typing.List instead of the builtin list ect.) unless I am "allowed" to use from future import annotations as described in PEP 585.

This import should be fine. For those on the fence about adding type hints, a thing to keep in mind when considering the ugliness of Union[X, Y] is that when 3.10 is the minimal supported version of pyxem, we can write X | Y instead.

Just a note that doing this by hand might be tedious.

Perhaps. Adding type hints is a very good way to get to know the code base, though. If this is the goal, I'd recommend not automating it.

Also, I'd be fine with a first PR adding type hints to the smallest module. Type hints can then be added module by module.

I'll leave a decision up to active maintainers, which I guess is @CSSFrancis and @magnunor at the moment (?).

pyxem / pyxem

Type annotation in pyxem #951