traitecoevo / datastorr

Simple data versioning and distribution
https://docs.ropensci.org/datastorr

Integrate with OKFN data packages #2

Open richfitz opened 8 years ago

richfitz commented 8 years ago

Adding a packages.json file that contains the metadata probably satisfies most of the requirements.

richfitz commented 8 years ago

Here's the website with a bit more information http://data.okfn.org/doc/data-package

Importantly, this can be in addition to what we currently have, and it would allow better interoperability. I don't believe there is good R tooling for dealing with data packages yet, though.

wcornwell commented 8 years ago

So for taxonlookup, we take what's now in the GitHub readme.md and put it in a .json file? I guess ideally it would also be in the R documentation? I guess we need a system where the metadata in one place (the json file?) is canonical and the other two are generated?

dfalster commented 8 years ago

I really like the idea of the OKFN data packages, so in principle it would be great to support them. Depends how much work it is. Seems low cost.

Generating the readme from a single canonical source for metadata shouldn't be too hard. I tried something like this a while back, where I used a json file with metadata to write the readme (see readme.Rmd in github.com/dfalster/Falster_2005_JEcol_data). Now I know there is an actual preferred format for that metadata.
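A minimal sketch of that idea, rendering a readme from one canonical JSON metadata record. The function name `render_readme`, the field names (`title`, `version`, `license`, `contributors`), and the sample values are assumptions for illustration; they are not taken from the Falster_2005_JEcol_data repository or from datastorr.

```python
import json


def render_readme(meta):
    """Render a small Markdown readme from a metadata dictionary."""
    lines = ["# " + meta["title"], ""]
    lines.append("Version: " + meta["version"])
    lines.append("License: " + meta["license"])
    lines.append("")
    lines.append("## Contributors")
    lines.append("")
    # One bullet per contributor; an absent list yields an empty section.
    for person in meta.get("contributors", []):
        lines.append("- " + person)
    return "\n".join(lines) + "\n"


# Hypothetical metadata, standing in for the canonical json file.
metadata = json.loads("""
{
  "title": "Example dataset",
  "version": "0.1.0",
  "license": "CC0",
  "contributors": ["A. Author", "B. Builder"]
}
""")

readme = render_readme(metadata)
print(readme)
```

Because the readme is derived entirely from the JSON record, editing the record and re-running the script is the only step needed to keep the two in sync.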

richfitz commented 8 years ago

Yeah, this is not too much work now that I have the automatic uploading thing worked out. We'd just hook into the same set of routines.

I think I'd opt to put the json in with the releases themselves, and have the URIs in the release json resolve to the github release URIs. So for taxonlookup it would read:

{
  "name" : "traitecoevo/taxonlookup",
  "title" : "A dynamically-updating versioned taxonomic resource for vascular plants",
  "license" : "CC0",
  "sources" : [{
    "name": "The plant list",
    "web": "http://www.theplantlist.org"
  }],
  "author": "Will Cornwell <wcornwell@gmail.com>",
  "contributors": [
    "Will Cornwell <wcornwell@gmail.com>",
    "Rich FitzJohn <rich.fitzjohn@gmail.com>",
    "Matt Pennell <mwpennell@gmail.com>"
  ],
  "version": "1.0.2",
  "resources": [{
    "url": "https://github.com/traitecoevo/taxonlookup/releases/download/v1.0.2/plant_lookup.csv",
    "name": "plant_lookup",
    "format": "csv",
    "hash": "sha1:cf6bb45eed09973d599e97fa8a6b8234c084e52a"
  }]
}

As you can see, most of that can be pulled from the DESCRIPTION file, so that's easy enough.
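A rough sketch of how such a descriptor could be assembled from DESCRIPTION-style fields plus the release file. The helper names (`parse_description`, `build_descriptor`), the simplified `Key: value` parsing, the field mapping, and the sample DESCRIPTION text are all assumptions for illustration and not datastorr's actual routines. The hash is computed over placeholder bytes, so it will not match the one quoted above.

```python
import hashlib
import json
import os


def parse_description(text):
    """Parse simple 'Key: value' lines; indented lines continue the previous field."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and key is not None:
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            key = key.strip()
            fields[key] = value.strip()
    return fields


def build_descriptor(fields, repo, filename, data):
    """Build a data-package-style dictionary for one released file."""
    version = fields["Version"]
    stem, ext = os.path.splitext(filename)
    # Assumed URL layout: GitHub release assets under a tag named 'v<version>'.
    url = "https://github.com/{}/releases/download/v{}/{}".format(
        repo, version, filename)
    return {
        "name": repo,
        "title": fields["Title"],
        "license": fields["License"],
        "version": version,
        "resources": [{
            "url": url,
            "name": stem,
            "format": ext.lstrip("."),
            "hash": "sha1:" + hashlib.sha1(data).hexdigest(),
        }],
    }


# Hypothetical DESCRIPTION contents and placeholder file bytes.
description = """Package: taxonlookup
Title: A dynamically-updating versioned taxonomic resource for
    vascular plants
Version: 1.0.2
License: CC0
"""

descriptor = build_descriptor(
    parse_description(description),
    "traitecoevo/taxonlookup",
    "plant_lookup.csv",
    b"genus,family\n",
)
print(json.dumps(descriptor, indent=2))
```

Fields such as `sources` and `contributors` are left out here because they need a decision about where they would live in DESCRIPTION; they could be added to the dictionary the same way.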

dfalster commented 8 years ago

Looks good.


wcornwell commented 8 years ago

I agree, the specific meta-data for the columns might take a bit of organizing...

BTW, I like the new datastorr release feature. Worked the first time.

richfitz commented 8 years ago

The column-specific metadata is someone else's problem, I think. Not all the data stored this way will be tabular, in any case. So as long as there's a facility for including it (most trivially a json file somewhere in the repo that would get slurped in), that should be enough.
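One way the "slurp in a json file if it exists" facility could look, as a sketch only. The function name `slurp_optional_metadata`, the file name `columns.json`, and the default key `schema` are assumptions made for this example, not part of datastorr. When the file is absent (for instance with non-tabular data), the descriptor is returned unchanged.

```python
import json
import os
import tempfile


def slurp_optional_metadata(descriptor, path, key="schema"):
    """Return a copy of descriptor, adding the JSON at path under key if it exists."""
    result = dict(descriptor)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            result[key] = json.load(handle)
    return result


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical column metadata written by the data maintainer.
    path = os.path.join(tmp, "columns.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"fields": [{"name": "genus", "type": "string"}]}, handle)

    with_columns = slurp_optional_metadata({"name": "example"}, path)
    without_columns = slurp_optional_metadata(
        {"name": "example"}, os.path.join(tmp, "missing.json"))

print(json.dumps(with_columns, indent=2))
print(json.dumps(without_columns, indent=2))
```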