related-sciences / gwas-analysis

GWAS data analysis experiments
Apache License 2.0

PyData prototype backend dispatching #24

Open eric-czech opened 4 years ago

eric-czech commented 4 years ago

Two immediately necessary uses for this are:

The second is far more complicated, and a separate framework may not be necessary for the first, but it would be great to support both the same way.

On array dispatch, I think dispatching based on argument types alone is not enough. We will likely have many functions that take multiple array arguments, and when those are a mix of dask/numpy/sparse arrays, a better solution is probably to have the user declare which backend API should be preferred and then special-case coercion where necessary.

At a minimum, I think we should keep CuPy, Dask, and NumPy backends in mind, since we already know from Alistair's skallel v2 prototype how different the implementations of genetic algorithms are going to be. Each backend will definitely need to use API-specific functionality, but a lot of operations will be dispatchable purely through the NumPy API too. A good question to answer is whether literally using numpy is better for the latter or whether unumpy makes more sense. The backend dispatching model in unumpy seems like a good fit, but I don't know if aligning to it long-term is worth the extra dependencies. I think it will depend on how much non-API-specific code we actually need.
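For the "dispatchable purely through the NumPy API" case, a rough sketch of what literally using numpy looks like: NumPy's `__array_function__` protocol (NEP 18, enabled by default since NumPy 1.17) lets a non-NumPy array type intercept calls such as `np.mean`. `DuckArray` and `column_means` below are hypothetical stand-ins made up for this example; `DuckArray` only imitates what a real backend array would do and handles positional arguments only.

```python
import numpy as np


class DuckArray:
    """Hypothetical stand-in for a non-NumPy backend array type."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap our own type, run the plain NumPy implementation, and rewrap,
        # so the caller gets back the same array type it passed in.
        unwrapped = [a.data if isinstance(a, DuckArray) else a for a in args]
        return DuckArray(func(*unwrapped, **kwargs))


def column_means(x):
    """Backend-agnostic code: written only against the NumPy API."""
    return np.mean(x, axis=0)


plain = column_means(np.array([[0, 1], [2, 1]]))   # stays a numpy.ndarray
duck = column_means(DuckArray([[0, 1], [2, 1]]))   # comes back as a DuckArray
```

Note that this protocol does not cover ufuncs such as `np.add`; those go through the separate `__array_ufunc__` protocol. It also only dispatches on argument types, so it does nothing for the mixed-type, preferred-backend case above.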

eric-czech commented 4 years ago

cf. this thread on dispatch in Xarray: https://github.com/pydata/xarray/issues/1938

hammer commented 4 years ago

Some interesting discussion is also happening at https://github.com/pydata/xarray/issues/3213#issuecomment-615772303 with regard to scipy.sparse and pydata/sparse, which may be "backends" to consider as well.

eric-czech commented 4 years ago

Some more notes/questions: