pangeo-data / ml-workflow-examples

Simple examples of data pipelines from xarray to ML training
Apache License 2.0

Xarray wishlist for ML #3

Open jhamman opened 5 years ago

jhamman commented 5 years ago

@mrocklin has started an impromptu xarray feature wishlist focusing mainly on uses of xarray for machine learning. It would be great to get some input from this group on the subject. There is an editable Google Doc here.

This issue can also be used for some discussion on the topic.

jhamman commented 4 years ago

@choldgraf pointed me to Nilearn, a Python package for machine learning on neuroimaging data. It seems like an interesting example of how a specific domain has built convenience tools around ML workflows.

choldgraf commented 4 years ago

Nilearn integrates heavily with scikit-learn. The simplest explanation is that it knows how to go from N-dimensional brain-relevant shapes to 2-dimensional "samples by features" shapes and back again. Over time it has also built up a number of convenience functions for visualization, I/O, etc.
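For what it's worth, the "N-dimensional to samples-by-features and back" round trip can be sketched with xarray's own `stack` and `unstack`. This is only a minimal illustration, not how Nilearn does it: the array shape and the dimension names (`time`, `x`, `y`, `z`, `feature`) are made up for the example.

```python
import numpy as np
import xarray as xr

# Hypothetical 4-D volume: 5 time points on a 2 x 3 x 4 spatial grid.
# Dimension names are assumptions chosen for illustration.
data = xr.DataArray(
    np.arange(5 * 2 * 3 * 4, dtype=float).reshape(5, 2, 3, 4),
    dims=("time", "x", "y", "z"),
)

# Collapse the spatial dimensions into a single "feature" dimension,
# giving a (samples, features) layout.
stacked = data.stack(feature=("x", "y", "z"))
X = stacked.values  # plain 2-D NumPy array that scikit-learn can consume

# ... fit or transform X with any estimator here ...

# Put the 2-D values back into the labelled N-D shape.
restored = stacked.copy(data=X).unstack("feature")
```

Because the stacked array keeps its index, going back to the original shape does not require remembering the spatial dimensions by hand.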

Another package from the neuro world (one I'm a bit more familiar with) is MNE-Python, which focuses on electrophysiology as opposed to MRI. It has a few internal data structures (which I suspect would be xarray objects if they were created now) and some convenience functionality for machine learning too. In this case, it's more about exposing model-fitting techniques that are common in electrophysiology, such as receptive fields and supervised learning using sensor data.

When I was last working on this project in earnest, we were starting to put together classes that performed neuro-relevant operations but behaved like scikit-learn preprocessors (and so could be included in preprocessing chains). Perhaps that could be inspiration as well. Here's a link to some ML examples if they're helpful.
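To make the "behaves like a scikit-learn preprocessor" idea concrete, here is a rough sketch of a class that can sit in a pipeline. The `Flatten` class, the synthetic data, and the `(samples, channels, times)` layout are all hypothetical and are not taken from MNE-Python; only the scikit-learn base classes and `make_pipeline` are real API.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline


class Flatten(BaseEstimator, TransformerMixin):
    """Hypothetical preprocessor: (samples, channels, times) -> (samples, features)."""

    def fit(self, X, y=None):
        # Nothing to learn; returning self keeps the scikit-learn contract.
        return self

    def transform(self, X):
        X = np.asarray(X)
        # Keep the sample axis and merge every other axis into one.
        return X.reshape(len(X), -1)


# Synthetic stand-in for sensor data: 20 samples, 4 channels, 10 time points.
rng = np.random.default_rng(0)
epochs = rng.standard_normal((20, 4, 10))
target = rng.standard_normal(20)

# The custom step chains with any ordinary estimator.
model = make_pipeline(Flatten(), Ridge(alpha=1.0))
model.fit(epochs, target)
predictions = model.predict(epochs)
```

Implementing `fit` and `transform` is what lets the domain-specific step take part in cross-validation and grid search like any built-in preprocessor.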

I suspect a lot of that functionality could be simplified if we had a metadata-rich data structure like xarray under the hood (if I ever get a chance to be a neuroscientist again, I suspect this is a project I'd like to try out).