gzuidhof / zarr.js

Javascript implementation of Zarr
https://guido.io/zarr.js
Apache License 2.0
132 stars 23 forks source link

Adds reading support for F-order datasets #124

Closed normanrz closed 2 years ago

normanrz commented 2 years ago

This PR adds support for reading datasets that happen to be stored in F-order. Chunks are converted to C-order in the decoding phase. The implementation is based on dynamic code generation in order to have high-performance conversions for n-dimensional datasets.

Support for writing F-order datasets could be added by converting chunks back from C-order to F-order.

Fixes #118

Are there a formatting guidelines (e.g. prettier configuration) for this project that I should adhere to?

gzuidhof commented 2 years ago

Thank you for this PR! It looks great to me, we don't have any formatting guidelines other than what eslint checks (npm run lint) iirc.

Dynamically generating the code for arbitrary dimensionality is clever, but also gives me some worry: many build pipelines and CSP (content security policies) will reject dynamically generated code (eval, new Function, import).

Now I am not sure what an alternative would look like here, one could use a slower generic version (maybe the performance difference is negligible), or even include the generated code for up to N dimensions, falling back to the slower version for higher dimensionality. Alternatively F-order support could be injected by users on-demand.

I don't know if this is big enough of a potential issue, consumers of this library can disable the checks in their build pipelines but those with stricter CSPs will probably never want to enable eval. What do you think @manzt?

normanrz commented 2 years ago

I think you raised an important concern. I didn't think about CSP when implenenting the code generation. In the new commit, I implemented specialized versions (for 2D, 3D and 4D) and a generic version. This is quite verbose and repetitive.

I created a micro benchmark comparing all implementations: https://gist.github.com/normanrz/235a5d92fe92112067d1a9494a915e8f (download and open in browser as local file)

Screenshot 2022-07-14 at 22 59 58

The results show that the specialized versions are 5-9x faster. In practice, it may not matter when compared to download times.

Alternatively F-order support could be injected by users on-demand.

How would that work?

gzuidhof commented 2 years ago

Thank you for the detailed response and updated PR, interesting to see the benchmark results. It's definitely worth it to have the specialized versions for common dimensionalities (2,3,4). I think the current iteration strikes the right balance.

Alternatively F-order support could be injected by users on-demand.

How would that work?

It would have to be like the registry of codecs that we have. Something like the following code would be how users use it:

import zarr from "zarr";
import {fOrderSupport} from "zarr/extensions";

zarr.register(fOrderSupport);
...

And then in the code that calls the conversion function it would check whether the extension was registered or not.

In my opinion that amount of complexity is not worth it for a feature like this, I like the PR as it stands now.


A thought I had was that perhaps we could consider somehow supporting monkeypatching/registering additional dimensionalities.

Instead of the if/else to determine which function to use there could be an object that would be indexed:

const fColConverters: {[dim: number]: (a,b,c) => void  }= { 
 0: () => 0, // No op
 1: () => 0, // No op
 2: convertColMajorToRowMajor2D
 3: convertColMajorToRowMajor3D
 4: convertColMajorToRowMajor4D
}

function convertColMajorToRowMajor(src, out, shape) => { 
  return (fColConverters[dim] || convertColMajorToRowMajorGeneric)(src, out, shape);
} 

function registerConverter(dim, c) => {
  fColConverters[dim] = c;
}

In that setup if I have a 6-dimensional F-ordered dataset I could register my own converter for that. A voice inside tells me YAGNI (You ain't gonna need it). I don't know.. what do you think?

normanrz commented 2 years ago

I wouldn't recommend exposing a way to register other converters:

Screenshot 2022-07-15 at 08 43 19
normanrz commented 2 years ago

Fixed the noop :-) If the CI succeeds now, the PR is ready-to-merge from my side.

normanrz commented 2 years ago

Thanks for merging! Are you releasing an updated package sometime soon?

gzuidhof commented 2 years ago

Just published 0.5.2 :)

manzt commented 2 years ago

Thanks for the contribution!!