traitecoevo / data_versioning

An approach for practical and simple data versioning in R

Prior work #5

Closed richfitz closed 7 years ago

richfitz commented 9 years ago

There's a lot of prior work in this space, and @cboettig tells me that people have tried this approach and ended up in a mess. Do we solve this problem? Or are we working with data that will always be simple enough to not get in a quagmire? How do people tell when they need to move to something more heavyweight?

cboettig commented 9 years ago

Just for context: my understanding is that when people have tried this approach, they kept getting hammered with questions like #4, where it wasn't easy to capture the relationship between datasets, and most people didn't care about those relationships anyway. Worse, putting those version relationships in the identifier / URL that others used to access or cite the data meant that users had to understand the whole versioning semantics in order to access or cite the right thing, which wasn't as simple as it might sound. People cite the original version but use a later version, or vice versa, making a big mess.

The conclusion was to give up. If the creator of a dataset cares to describe how their data is somehow a 'version' of an earlier dataset, by all means they can do so in the data file itself (in whatever way they see fit for their use case -- be that RDF triples or a plain text readme), but at least don't burden the download mechanism with that extra information; that's hard enough as it is.

This is related to, but not the same problem as, the road people went down with nice pretty identifiers that ended up just being confusing. Of course http://host.site/my-name/my-data/v1 is preferred by everyone on the planet to some string no one can remember, but the test of time hasn't been kind to those identifiers. Still, you can have something DOI-like (partial redirect URLs) without being so ugly: you can set up your own Persistent Uniform Resource Locators at http://purl.org to capture at least some of the link-rot-fighting capacity of DOIs. For example, http://purl.org/cboettig can always be redirected to point to my homepage regardless of where I move it.

wcornwell commented 8 years ago

See http://jabberwocky.weecology.org/2015/11/23/trait-databases-what-is-the-end-goal/

richfitz commented 8 years ago

Also, the OKFN data package idea (http://data.okfn.org) might be worth merging in here, especially as we can fill most of the required data fields from the DESCRIPTION file.
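
To make the DESCRIPTION idea concrete, here is a rough sketch of what that mapping might look like. It is written in Python purely for illustration, not as part of this package. The descriptor keys (`name`, `title`, `version`, `description`, `licenses`, `resources`) are my reading of the data package descriptor and should be checked against the spec; the example DESCRIPTION contents, the resource path, and both function names are hypothetical.

```python
import json


def parse_dcf(text):
    """Parse one 'Key: value' record, joining indented continuation lines."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and key is not None:
            # Continuation of the previous field's value
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def description_to_datapackage(fields, resources):
    """Map DESCRIPTION fields onto an assumed data package descriptor layout."""
    descriptor = {
        "name": fields["Package"].lower(),
        "title": fields.get("Title"),
        "version": fields.get("Version"),
        "description": fields.get("Description"),
        "resources": resources,
    }
    if "License" in fields:
        descriptor["licenses"] = [{"name": fields["License"]}]
    # Drop fields that the DESCRIPTION file did not provide
    return {k: v for k, v in descriptor.items() if v is not None}


# Hypothetical DESCRIPTION contents, for illustration only
EXAMPLE_DESCRIPTION = """\
Package: mydata
Title: An Example Data Package
Version: 0.1.0
Description: A hypothetical dataset used only to
    illustrate the field mapping.
License: CC0
"""

fields = parse_dcf(EXAMPLE_DESCRIPTION)
descriptor = description_to_datapackage(fields, [{"path": "data/mydata.csv"}])
print(json.dumps(descriptor, indent=2))
```

The list of resources is the one piece that DESCRIPTION cannot supply, so it is passed in separately here.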