hdmf-dev / hdmf

The Hierarchical Data Modeling Framework
http://hdmf.readthedocs.io
Other
46 stars 26 forks source link

[Bug]: Extensions are not robust to order #1086

Open rly opened 5 months ago

rly commented 5 months ago

What happened?

If an extension defines data type A that includes or links to data type B and data type A is defined before data type B, then resolution fails. This issue can be worked around by re-ordering the data type definitions so that B is defined before A in the YAML.

While we could make a rule that says a data type must be defined before it is referenced in another data type, that is not very user-friendly behavior. On read of a YAML file, I think we could scan for includes and links and resolve the types in a smart order that avoids these issues if possible.

Circular references would still be an issue, but that is rare.

This ordering issue can also happen across multiple YAML files, but it is very rare to have multiple YAML files, so I think improving that is extra-low priority.

Steps to Reproduce

See https://github.com/NeurodataWithoutBorders/pynwb/issues/1873

Traceback

No response

Operating System

macOS

Python Executable

Conda

Python Version

3.12

Package Versions

No response

rly commented 5 months ago

@mavaylon1 do you think you could take this on?

oruebel commented 5 months ago

Just an observation. I think solving this in the most general case is going to be very tricky. I would hence suggest adding a rule/best practice in the extension docs on nwb-overview and also in the schema language docs to document that types should be defined before use. In compiled languages, like C++, types have to be at least declared before use (even if the definition comes later). If addressing the common case is easy, then let's do that, but I think this issue is likely a rabbit whole, so we probably want to limit the kind of cases we address vs. making it rule. Maybe better reporting of errors to indicate when a type is being used before it has been defined could go a long way to help.

mavaylon1 commented 5 months ago

@rly I can, but I'll take a look in two-three weeks due to current PRs and the roadshow. Is that timeline okay with you?

rly commented 5 months ago

@mavaylon1 Yes, this is low-priority and not that important. If you could in a few weeks, please spend 10 min looking into how hard it would be to do this. I think it might be useful for you to learn how extension specs are resolved. But I'd say if you don't think you can do it relatively quickly (<2 hours), then let's table it and instead just update the error to say something like "Please ensure that data type X is defined in the schema YAML file before being used in data type Y" and update the docs.

mavaylon1 commented 5 months ago

@rly I will take a look after dev days

mavaylon1 commented 1 month ago

@rly This got lost in backlog of items. Let's sync on this in next few weeks.