Fix reading dtype from zarr dataset

alejoe91 commented 1 year ago

Hi guys,

I'm experimenting now with writing to Zarr using a custom DataChunkIterator from neuroconv bassed to the ZarrDataIO.

Everything works fine, but upon reading the file, I get that the zarr_type is not found in the attirbutes. However, this can be inferred from the dtype of the zarr.Dataset. Not sure this is the most elegant solution and I didn't have time to dig in and find a better one, but it works in my case :)

oruebel commented 1 year ago

@alejoe91 it sounds like the zarr_dtype attribute is not being set when writing with DataChunkIterator. From a quick look I think the following code in __setup_chunked_dataset__

https://github.com/hdmf-dev/hdmf-zarr/blob/e4e9543e53efaefc31b74c4fd8557353cb0b9121/src/hdmf_zarr/backend.py#L671-L675

is likely the problem and should be updated to set the zarr_dtype attribute as follows:

        type_str = self.__serial_dtype__(io_settings['dtype'])
        try:
            dset = parent.create_dataset(name, **io_settings)
            dset.attrs['zarr_dtype'] = type_str
        except Exception as exc:
            raise Exception("Could not create dataset %s in %s" % (name, parent.name)) from exc
        return dset

The fix you propose helps work around the issue if an invalid file is encountered but doesn't fix the issue of creating bad files.

alejoe91 commented 1 year ago

Yeah I figured there was probably a better solution! Should I update the PR with the fix you suggested @oruebel ?

oruebel commented 1 year ago

Should I update the PR with the fix you suggested

Yes please. I think we can keep your fix as well as since it fixes potential issues on read, but it would be best to make sure we write the file correctly.

alejoe91 commented 1 year ago

Should I update the PR with the fix you suggested

Yes please. I think we can keep your fix as well as since it fixes potential issues on read, but it would be best to make sure we write the file correctly.

Should be done ;)

alejoe91 commented 1 year ago

@oruebel

fixed flake8. I don't think failing tests are due to this change

oruebel commented 1 year ago

I don't think failing tests are due to this change

I'll take a look at fixing tests tomorrow. One issue is that codecov has been pulled from PyPi and it looks like there are some other issues in the gallery tests. I'll work on that to make sure we can get all tests passing.

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 50.00% and project coverage change: -0.08 :warning:

Comparison is base (708e1a7) 82.02% compared to head (35202aa) 81.94%.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## dev #72 +/- ## ========================================== - Coverage 82.02% 81.94% -0.08% ========================================== Files 11 11 Lines 2642 2647 +5 ========================================== + Hits 2167 2169 +2 - Misses 475 478 +3 ``` | [Impacted Files](https://app.codecov.io/gh/hdmf-dev/hdmf-zarr/pull/72?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=hdmf-dev) | Coverage Δ | | |---|---|---| | [src/hdmf\_zarr/backend.py](https://app.codecov.io/gh/hdmf-dev/hdmf-zarr/pull/72?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=hdmf-dev#diff-c3JjL2hkbWZfemFyci9iYWNrZW5kLnB5) | `87.75% <50.00%> (-0.36%)` | :arrow_down: |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

hdmf-dev / hdmf-zarr

Fix reading dtype from zarr dataset #72

Codecov Report