hdmf-dev / hdmf

The Hierarchical Data Modeling Framework
http://hdmf.readthedocs.io
Other
46 stars 26 forks source link

[Feature]: checking data types of arrays on construction #797

Open bendichter opened 1 year ago

bendichter commented 1 year ago

What would you like to see added to HDMF?

I would be great if we could check the data types of arrays during construction. Right now our solution is to cast during write and throw a warning, but this is problematic because it could lead to incorrect conversions. Consider this example:


from scipy import misc

from pynwb.image import Images, IndexSeries, GrayscaleImage, RGBImage
from pynwb.base import ImageReferences
from nwbinspector.tools import make_minimal_nwbfile
from pynwb import NWBHDF5IO

rgb_face = RGBImage(
    name="rgb_face",
    data=misc.face(),
    resolution=70.0,
    description="RGB version of a raccoon.",
)

images = Images(
    name="raccoons",
    images=[rgb_face],
    description="A collection of raccoons.",
    order_of_images=ImageReferences("order_of_images", [rgb_face]),
)

idx_series = IndexSeries(
    name="stimuli",
    data=[0.3, 1, 0, 1],
    indexed_images=images,
    unit="N/A",
    timestamps=[.1, .2, .3, .4],
)

nwbfile = make_minimal_nwbfile()
nwbfile.add_acquisition(images)
nwbfile.add_acquisition(idx_series)

with NWBHDF5IO("test34.nwb", "w") as io:
    io.write(nwbfile)

Here, data contains 0.3, which is wrong since it should only take integers (uint32, to be precise). This throws a warning:

/Users/bendichter/opt/miniconda3/lib/python3.9/site-packages/hdmf/build/objectmapper.py:260: DtypeConversionWarning: Spec 'IndexSeries/data': Value with data type float64 is being converted to data type uint64 (min specification: uint32).

I'm all for casting values if it makes sense, but I don't think it makes sense to cast a float as an int if the value changes. The other problem is the user doesn't see this error until write occurs. I'd much rather the data type be clearly communicated in the function signature and the documentation and that it be caught on construction.

Is your feature request related to a problem?

No response

What solution would you like?

see above

Do you have any interest in helping implement the feature?

No.

Code of Conduct

oruebel commented 1 year ago

Thanks for helpful description. If I understand it correctly, this issue touches on two separate questions:

  1. On write: When should we cast and warn vs. when should we raise an error instead?
  2. On construction: How can we improve checking of types to warn already during construction?

Just a few thoughts for a few possible ways to address this:

Re 1.:

Re 2.:

I'm sure there are other options. Any thoughts?

bendichter commented 1 year ago

Thanks for breaking this down, @oruebel. This is exactly right. As per our conversation, let's focus first on catching non-castable data in the constructor.

oruebel commented 1 year ago

Since type and shape are already specified via separate arguments in docval, I'm wondering whether the simplest approach may be to add a dtype key in docval to specify the data type for arrays.

bendichter commented 1 year ago

That could work!

oruebel commented 1 year ago

@mavaylon1 can you help with this?