stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License
103 stars 25 forks source link

Using dataclasses in PSyclone #2352

Open arporter opened 9 months ago

arporter commented 9 months ago

@sergisiso writes:

Since we are starting using dataclasses and python type hints, I think we should decide how we use them in the whole codebase going forward:

  1. Regarding type hints. If we annotate method signatures and dataclasses with them, do they appear in the generated documentation?, and if they do, should we start removing them from the docstring (leaving just the descriptions there)?

  2. Regarding dataclasses, in the python ecosystem there are some competing libraries (e.g. attrs, pydantic), see https://stefan.sofa-rockers.org/2020/05/29/attrs-dataclasses-pydantic/#:~:text=Data%20classes%20use%20the%20optional,prohibit%20this%20for%20good%20reason. We should choose the one more convenient for us, some allow to annotate which class fields to use in equality, or frozen classes to avoid state mutability, which would allow us to remove some code and manual checks. Also note that any of them do automatic run-time type checking. We need to decide if we use validators or post_init hooks to keep our current functionality.

arporter commented 9 months ago

@rupertford writes:

I don't know about the documentation either so can't comment. As for 2) using 3.7+ dataclasses seems to be reasonable rather than things that are not in the standard library.

arporter commented 9 months ago

Attrs (https://www.attrs.org/en/stable/) has been around a while now and is presumably pretty stable. It offers the significant advantage of allowing the user to provide validation methods.

hiker commented 9 months ago

I don't know anything about dataclasses etc, so can't contribute much. I couldn't even find where we are using it :(

arporter commented 9 months ago

I don't know anything about dataclasses etc, so can't contribute much. I couldn't even find where we are using it :(

We're not yet :-) @oakleybrunt has PR #2321 which currently uses a dataclass but we need to decide if that's the option we like the best.

arporter commented 9 months ago

The documentation generated by Sphinx AutoAPI for Oakley's new class looks like: image So the type information is included and we could therefore agree to drop it from the class docstring.

arporter commented 9 months ago

Doxygen produces the following: image It would be good if the documentation of the parameters appeared in the "Static Public Attributes" section really.

arporter commented 9 months ago

You can document the parameters but it has to be done as comments beginning with ##: image Such comments are a Doxygen thing though and are not picked up by AutoAPI.

arporter commented 9 months ago

If I switch to using attrs define decorator: image which is pretty much the same as before.

arporter commented 9 months ago

Unfortunately, the Doxygen version is less good: image

sergisiso commented 9 months ago

Regarding the dataclass vs attrs vs pydantic discussion, I think any of the 3 are better than namedtupes due to them generating documentation but dataclass can not replace our typical usage of classes because it does not check type values at runtime. So:

@dataclass
class LFRicArgStencil:
   name: str

obj = LFRicArgStencil()
obj.name = 3

is valid. The best we can do is instead use: @dataclass(frozen=True) So obj.name = 3 will not be allowed, but also not obj.name = "another_name"

However, if we want to substitute our classes, attrs allows:

@attr.define
class LFRicArgStencil:
    name: str = attr.field(validate=_validate_name, on_setattr=_validate_name)

which in simple cases like this we don't need to implement the validation function as attrs comes with attr.validators.instance_of(str)

In general is a good idea to not allow state to be modified (but we do it often). Maybe for an initial step it is enough if we use dataclasses always with frozen=True for the cases that we forbid to mutate the state, and we can decide to use another library than simplifies our validation later on?

arporter commented 9 months ago

I guess if all we're doing is replacing namedtuple then having frozen=True is no problem? I like the fact that pydantic provides a decorator that will type-check argument calls for you but that's heavier weight. (I'd always assumed that type hints got used for that kind of thing automatically otherwise what's the point?)

arporter commented 9 months ago

BTW, the doc produced for the pydantic equivalent is not so nice either: image

sergisiso commented 9 months ago

I'd always assumed that type hints got used for that kind of thing automatically otherwise what's the point?

Unfortunately python is too dynamic to avoid it to trick it in some way, so type hint can not be used for optimizations or runtime guarantees.

They are only for static tools, like documentation generators, linters, ...

sergisiso commented 9 months ago

The documentation generated by Sphinx AutoAPI for Oakley's new class looks like

I was hoping we could also remove the duplicated type inside the docstring, something like in: https://sphinx-autoapi.readthedocs.io/en/latest/how_to.html#how-to-include-type-annotations-as-types-in-rendered-docstrings

arporter commented 9 months ago

The documentation generated by Sphinx AutoAPI for Oakley's new class looks like

I was hoping we could also remove the duplicated type inside the docstring, something like in: https://sphinx-autoapi.readthedocs.io/en/latest/how_to.html#how-to-include-type-annotations-as-types-in-rendered-docstrings

That looks good. Ideally we'd do away with building the Doxygen version and just stick with Sphinx AutoAPI but I still prefer the output of Doxygen :-)

sergisiso commented 9 months ago

But the doxygen still looks good for method signatures with type hints? Was your "##" comment specifically about dataclass fields?

arporter commented 9 months ago

But the doxygen still looks good for method signatures with type hints? Was your "##" comment specifically about dataclass fields?

Yes, it was just for class attributes.

sergisiso commented 9 months ago

Didn't we already had this problem before? I think I have seen some "#:" comments that I thought it was to add documentation to class variables. Did this only work for one documentation?

arporter commented 9 months ago

Yes, sorry, I didn't mean to imply it was a new problem. I was just hoping that moving to a dataclass might help with it.

arporter commented 9 months ago

For now, we've agreed to go with the 'standard' dataclass. We have also agreed that such classes no longer need separate documentation of the :type: of parameters because the documentation tools get this from the type hints. I've updated the interface-markup rules (https://github.com/stfc/PSyclone/wiki/InterfaceMarkup) accordingly.

hiker commented 9 months ago

Could you update the example to use the required type-hint and dataclass usage? I've just tried using type-hints, and I noticed that the created documentation looks like this: image

(i.e. the type info is not in the parameters list anymore). Is that expected, and is this what we should be doing?

I also tried specifying a PSyclone class as return type:

    def get_parse_tree(self) -> :py:class:`fparser.two.Fortran2003.Program`

... in various combinations (with adding ":" in different spots). What should I do here?

Also, it would be great to add the 'expected' use of dataclass in this example.

I am happy to update the wiki, if someone confirms that what I did was what is expected :)

arporter commented 9 months ago

(i.e. the type info is not in the parameters list anymore). Is that expected, and is this what we should be doing?

Yes, that's expected and correct (since it now appears in the routine signature).

Currently, we're only anticipating using dataclass in place of namedtuple.

The ":py:class:" stuff is Sphinx markup, not Python. I think this will help: https://stackoverflow.com/questions/44664040/type-hints-with-user-defined-classes

So probably just -> Fortran2003.Program if Fortran2003 has been imported.