ome / ome-ngff-validator

Web page for validating OME-NGFF files
https://ome.github.io/ome-ngff-validator
BSD 2-Clause "Simplified" License

Computed array/chunk sizes are not correct for numpy.dtype types (uint8, int16, etc.) #38

Closed: psobolewskiPhD closed this issue 1 month ago

psobolewskiPhD commented 1 month ago

As far as I can tell, when the validator is given a zarr array whose dtype is a numpy.dtype name, e.g. uint16, the array size and chunk sizes are not computed correctly. See, e.g.: https://deploy-preview-36--ome-ngff-validator.netlify.app/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/idr0051/180712_H2B_22ss_Courtney1_20180712-163837_p00_c00_preview.zarr/0/

I'm pretty sure this should be:

79*1*201*333*333 = 1,760,806,431 pixels

and uint16 means 2 bytes per pixel, so:

1,760,806,431 pixels * 2 bytes/pixel = 3,521,612,862 bytes ~ 3.5 GB

So in this case, the real size is ~2x the reported size.

Here's another case: https://deploy-preview-36--ome-ngff-validator.netlify.app/?source=https://storage.googleapis.com/jax-public-ngff-2024/KOMP/adult_lacZ/ndp/A1cf/24325_K24230_FColon.zarr/0/

1*3*1*46080*113280 = 15,659,827,200 px

This time we have uint8 so 1 byte per px.

15,659,827,200 px * 1 byte/px = 15,659,827,200 bytes ~ 15.7 GB

So the reported size is ~8x larger than the real size.
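
For anyone who wants to check the arithmetic for both cases, here is a minimal JavaScript sketch. `expectedBytes` is a hypothetical helper written only for this comment, not something from the validator, and it assumes uncompressed size is simply the product of the shape times the bytes per pixel.

```javascript
// Hypothetical helper (not validator code): expected uncompressed size in bytes.
function expectedBytes(shape, bytesPerPixel) {
  // Multiply all dimensions together to get the pixel count.
  const pixels = shape.reduce((acc, n) => acc * n, 1);
  return pixels * bytesPerPixel;
}

// First case: uint16, so 2 bytes per pixel.
console.log(expectedBytes([79, 1, 201, 333, 333], 2)); // 3521612862
// Second case: uint8, so 1 byte per pixel.
console.log(expectedBytes([1, 3, 1, 46080, 113280], 1)); // 15659827200
```

Both values stay well below `Number.MAX_SAFE_INTEGER`, so plain numbers are enough here.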

I suspect what is happening is that the 8 in uint8 is being used as the byte count, rather than treating uint8 as 1 byte/px.

Maybe the validator isn't distinguishing numpy and Python dtype names (e.g. uint8) from array-protocol dtypes (e.g. |u1)? (https://numpy.org/doc/stable/reference/arrays.interface.html#arrays-interface)

will-moore commented 1 month ago

Thanks for the report. I'll have a look into it...

psobolewskiPhD commented 1 month ago

Because I can't just let stuff be, the issue is here: https://github.com/ome/ome-ngff-validator/blob/a3ea7514c1aa99d510d034ccb61282cf758f929e/src/JsonValidator/MultiscaleArrays/ZarrArray/index.svelte#L20-L24

With that code, uint16 comes out as 1 byte/px and uint8 as 8 bytes/px, so the calculation is wrong. I don't know enough JavaScript to fix this. Is a dict-like structure possible to translate np.dtype names to byte values, falling back to the array-protocol handling you already have?
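
A dict-like lookup of this kind could look roughly like the sketch below. This is only an illustration under stated assumptions: `NUMPY_NAME_BYTES` and `bytesPerPixel` are hypothetical names, the list of dtype names is a guess at the common numeric types, and the actual fix in the validator may be structured differently.

```javascript
// Hypothetical lookup table (not validator code): numpy-style dtype name -> bytes per pixel.
const NUMPY_NAME_BYTES = new Map([
  ["bool", 1], ["int8", 1], ["uint8", 1],
  ["int16", 2], ["uint16", 2], ["float16", 2],
  ["int32", 4], ["uint32", 4], ["float32", 4],
  ["int64", 8], ["uint64", 8], ["float64", 8],
  ["complex64", 8], ["complex128", 16],
]);

// Returns bytes per pixel, or undefined when the dtype is not recognised.
function bytesPerPixel(dtype) {
  // Named dtypes such as "uint16" are looked up directly.
  if (NUMPY_NAME_BYTES.has(dtype)) {
    return NUMPY_NAME_BYTES.get(dtype);
  }
  // Array-protocol strings such as "<u2" or "|u1":
  // optional byte-order character, a kind character, then the byte count.
  const match = /^[<>|=]?[biufc](\d+)$/.exec(dtype);
  if (match) {
    return parseInt(match[1], 10);
  }
  return undefined;
}

console.log(bytesPerPixel("uint16")); // 2
console.log(bytesPerPixel("uint8"));  // 1
console.log(bytesPerPixel("<u2"));    // 2
```

Returning `undefined` for unknown dtypes lets the caller skip the size estimate instead of showing a wrong number.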

will-moore commented 1 month ago

Should be fixed now (in https://github.com/ome/ome-ngff-validator/pull/36/commits/2853654ce9c93803fe9e1ecae3f7620f23dc6f60)