luisoala opened 2 days ago
Hi Luis,
That could be very interesting!
I was prepping this page as a first draft to share with @cabrerac and @apaleyes (they're now seeing this for the first time!).
It's part of our wider work around data-oriented architectures.
https://arxiv.org/pdf/2302.04810
So there's definitely synergy between what we're looking at and standardisation of data sets.
It's derived from workflows I've got, not just for ML but for general data processing; they're implemented in
https://github.com/lawrennd/lynguine (note the y instead of i)
and
https://github.com/lawrennd/referia
Neil
neil
first off, apologies for misspelling your name :sweat:
this looks amazing!
i'll try to use it to get a better sense of the workflow
overall, the broad scope of a general abstraction for data processing, whether ml or not, sounds great, especially with respect to composability
i don't want to interfere with your momentum
but if you are interested in synergies, here are a few low-stakes ideas, in order of complexity:
1) you present your vision to the croissant group (we usually meet wednesday afternoons, european time)
2) omar, joaquin, isabelle, i, and a few other folks have started sketching out a scope for the "croissant tasks" concept https://docs.google.com/document/d/1cQ2nQvP4WXyd2AaOmVZoO_URn7whv09PWeLeScrFzEQ/edit?usp=sharing (you might need to request access). if this gels with your ambition, we could see if we can find a way to push something forward together. an esoteric sketch is attached
3) we have been planning to hold a small composable data systems workshop in paris (before neurips or in the new year). this could be a good opportunity to explore some of these themes in person and in detail
hi neal
just saw this repo pop up in my gh feed
i was wondering if you'd be interested in exploring synergies with croissant? https://github.com/mlcommons/croissant
we have also started exploring a more expansive scope of data interfaces, dubbed "tasks" for now, that go beyond describing just the data to include more context on how the data is to be used (e.g. with a sample model, a metric, and a baseline score for that metric)