hdmf-dev / hdmf

The Hierarchical Data Modeling Framework
http://hdmf.readthedocs.io

[Feature]: Replace docval with better strict type and shape validation system with type hints #1129

Open rly opened 3 weeks ago

rly commented 3 weeks ago

What would you like to see added to HDMF?

I realized we have discussed this many times in the past year but we do not have an issue for it yet. This came up again today during the NWB Data Conversion Workshop.

PyNWB/HDMF uses docval, which was developed before Python officially supported type hints and before Pydantic and other strict type checkers became popular. docval is now incompatible with tools that display type hints, such as hover tooltips in Jupyter and auto-complete in IDEs. It is also cumbersome to learn for new developers or anyone browsing the source code. To improve the usability and maintainability of the NWB and HDMF APIs, I suggest we replace docval with a more modern strict type-checking and documentation system. This will be tedious but worth it in the end.

docval is used for documentation, type checking, and shape checking. It is also used by code that inspects other classes, such as the class generator and neuroconv's code that derives a JSON schema from a class's constructor docval. The validator may also use docval args. We have hooks that allow you to create docval aliases like "array_data" that can be dynamically updated, e.g., in HDMF Zarr. When replacing docval, we need to be careful not to alter or lose significant functionality.
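To make the introspection requirement concrete, here is a minimal sketch of how per-argument documentation and shape metadata could ride along with type hints and still be read back by tools such as a class generator or a schema exporter. It uses only the standard library (`typing.Annotated`). The names `Arg`, `TimeSeriesLike`, and `describe_args` are hypothetical and are not part of HDMF, PyNWB, or neuroconv; the actual replacement may look quite different.

```python
import inspect
from dataclasses import dataclass
from typing import Annotated, Optional, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Arg:
    """Hypothetical metadata carrier: a doc string and an optional allowed shape."""
    doc: str
    shape: Optional[tuple] = None  # e.g. (None, 3) means "any length x 3"


class TimeSeriesLike:
    """Hypothetical example class, not an HDMF or PyNWB type."""

    def __init__(
        self,
        name: Annotated[str, Arg("Name of the series")],
        data: Annotated[list, Arg("Samples over time", shape=(None, 3))],
        rate: Annotated[float, Arg("Sampling rate in Hz")] = 1.0,
    ):
        self.name, self.data, self.rate = name, data, rate


def describe_args(func):
    """Collect name, type, doc, shape, and default for each annotated parameter."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(func, include_extras=True)
    described = []
    for pname, param in inspect.signature(func).parameters.items():
        hint = hints.get(pname)
        if get_origin(hint) is not Annotated:
            continue  # skips `self` and any parameter without Annotated metadata
        base, *extras = get_args(hint)
        meta = next((e for e in extras if isinstance(e, Arg)), None)
        entry = {
            "name": pname,
            "type": base.__name__,
            "doc": meta.doc if meta else "",
            "shape": meta.shape if meta else None,
        }
        if param.default is not inspect.Parameter.empty:
            entry["default"] = param.default
        described.append(entry)
    return described
```

Calling `describe_args(TimeSeriesLike.__init__)` returns one dictionary per constructor argument, which is roughly the information that code inspecting docval reads today. Because the hints are ordinary annotations, IDEs and Jupyter can display them as well.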

What solution would you like?

Replace docval with another system such as Pydantic in strict mode, beartype, or numpydantic. We need to research the options. Pydantic is widely used and plays nicely with JSON schema, which will be useful in a potential long-term integration with LinkML. beartype appears to be quite fast. I think neither plays nicely with numpy arrays, so we may need to use something like numpydantic.
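For discussion, here is a rough standard-library sketch of the behavior we would want from whichever library is chosen: arguments are checked against their type hints at call time, with an optional shape constraint. This is not Pydantic, beartype, or numpydantic code. The `strict` decorator, `shape_of`, and `add_positions` are hypothetical names, plain nested lists stand in for arrays, and the shape check assumes regular nesting.

```python
import functools
import inspect
from typing import get_type_hints


def shape_of(value):
    """Return the shape of a nested list, assuming it is regular (not ragged)."""
    shape = []
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


def strict(shapes=None):
    """Hypothetical decorator: enforce plain type hints plus optional shapes."""
    shapes = shapes or {}

    def decorator(func):
        hints = get_type_hints(func)
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, value in bound.arguments.items():
                expected = hints.get(name)
                # Only plain classes are checked; generics like list[int] are skipped.
                if isinstance(expected, type) and not isinstance(value, expected):
                    raise TypeError(
                        f"{name}: expected {expected.__name__}, "
                        f"got {type(value).__name__}"
                    )
                allowed = shapes.get(name)
                if allowed is not None:
                    actual = shape_of(value)
                    # None in the allowed shape matches any length on that axis.
                    matches = len(actual) == len(allowed) and all(
                        a is None or a == b for a, b in zip(allowed, actual)
                    )
                    if not matches:
                        raise ValueError(
                            f"{name}: expected shape {allowed}, got {actual}"
                        )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@strict(shapes={"data": (None, 3)})
def add_positions(name: str, data: list, rate: float = 1.0) -> int:
    """Hypothetical function: return the number of rows in data."""
    return len(data)
```

With this sketch, `add_positions("pos", [[0, 0, 0], [1, 1, 1]])` passes, while a non-string `name` raises `TypeError` and rows of length 2 raise `ValueError`. A real library would also need to handle generics, unions, and numpy or h5py arrays, which is where most of the research effort would go.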

Do you have any interest in helping implement the feature?

Yes.

h-mayorquin commented 1 week ago

As related background, this has been an ongoing concern in numpy for a while: https://github.com/numpy/numpy/issues/16544

mavaylon1 commented 1 week ago

@rly is this something we can start in August?

rly commented 6 days ago

Yes, as discussed in person, let's target August/September to start working on this together.