Open cmungall opened 7 years ago
Have you looked at Dat yet? https://datproject.org/ I know the people/non-profit behind it.
Good addition!
Thanks!
I forgot to add GitHub itself. We're very happy using GitHub for data VCS and packaging for smaller artefacts like ontologies, but for files over the 100 MB limit we found git-lfs wanting.
Hadn't heard of datalad. Lots going on in this area.
Not sure this belongs in dipper tracker, but for want of a better place.
There are some emerging efforts that aim to treat data as code. Common themes are:
Some of these are quite trivial, but you could say GitHub is a trivial web interface on top of git, and developers obviously love it.
Of course this overlaps with https://www.w3.org/TR/hcls-dataset/ but AFAICT that hasn't taken off, and there is no tooling associated with it.
It's also similar to my https://biodatasets.github.io/mybiocaddie/about/ project.
We should take a look at these from two perspectives
If the answer is yes, which improvements/changes would we want?
Frictionless
This seems fairly lightweight and open: http://frictionlessdata.io/
It doesn't provide any storage; it's just some standards about how you mark up and bundle a csv, and some simple tools to help bundle or consume bundles.
It all seems well thought out, but they have zero examples on the site, which is frustrating. I want to be able to search for data packages. Of course this is harder as they don't centralize, which is arguably good. More like git for data than GitHub for data.
Seems to be managed by a non-profit.
csv and json seem to be privileged, but it seems RDF would also work.
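To make the "mark up and bundle a csv" idea concrete, here is a rough sketch of a Data Package style descriptor (the `datapackage.json` that sits next to the csv). It uses only the Python standard library rather than the Frictionless tooling. The package name, file path, sample rows and the `infer_type` / `build_descriptor` helpers are all made up for illustration, and the type guessing is deliberately naive.

```python
import csv
import io
import json

# Made-up sample data, for illustration only.
SAMPLE_CSV = "id,label,score\n1,alpha,0.9\n2,beta,0.4\n"


def infer_type(values):
    """Naively guess a field type (integer, number or string) from cell strings."""
    for type_name, caster in (("integer", int), ("number", float)):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue
    return "string"


def build_descriptor(package_name, csv_path, csv_text):
    """Build a minimal descriptor dict for a single csv resource (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    fields = [
        {"name": column, "type": infer_type([row[i] for row in body])}
        for i, column in enumerate(header)
    ]
    return {
        "name": package_name,
        "resources": [
            {
                "name": "example-table",
                "path": csv_path,
                "format": "csv",
                "schema": {"fields": fields},
            }
        ],
    }


if __name__ == "__main__":
    descriptor = build_descriptor("example-package", "data/example.csv", SAMPLE_CSV)
    # This JSON is what would be saved as datapackage.json alongside the csv.
    print(json.dumps(descriptor, indent=2))
```

The descriptor only points at the csv by relative path, which fits the "no storage, just standards" impression: the bundle can live in a git repo or anywhere else.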
Quilt
This is similar and seems to give you a bit of extra abstraction, but seems quite tied to python at the moment:
https://quiltdata.com/
It's kind of more like an npm for data
I made this package of some globi interactions: https://quiltdata.com/package/cmungall/dinosaur_biotic_interactions
seems much more centralizing
data.world
This is really slick, but requires logging in -- hmmm, looks like they are trying to make a giant silo they can monetize?
https://data.world
osfclient
"A scholarly commons to connect the entire research cycle"
Seems a bit broader in scope, like GitHub, protocols.io, and everything rolled into one.
pachyderm
This has an emphasis on containers and pipelines http://pachyderm.io/
Less relevant from a dipper perspective, but worth paying attention to.