Discussion: Can/should we be able to import a dataspiced dataset from the web/elsewhere?

ropensci / dataspice

🌶️ Create lightweight schema.org descriptions of your datasets

https://docs.ropensci.org/dataspice

Discussion: Can/should we be able to import a dataspiced dataset from the web/elsewhere? #57

Open amoeba opened 6 years ago

amoeba commented 6 years ago

This comes from a good question in my dataspice demo today: If user X authors a dataspice page for their dataset, and another scientist, Y, wants to use it, it'd be cool if they just ran:

import_spice("https://amoeba.github.io/some-dataset")

And their computer downloaded something like some-dataset.zip which had the dataspice.json and the files described in access.csv attached to it somehow.

khondula commented 6 years ago

That seems cool! Would that depend on the reliability/persistence of 'contentUrl' or 'contentUrl' + 'fileName'?

Would there maybe be a way to generate a .bib as well, to suggest a citation?

cboettig commented 6 years ago

👏

I think it might be potentially more robust to have a function that just extracts the metadata and returns an R object which contains the download urls? e.g. something like

x <- import_spice()
read_csv(x$files[[1]])

(Some schema.org Dataset contentUrls do not link directly to a downloadable data file, but to a web page that contains the links.)
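A minimal sketch of the metadata-only idea, assuming the jsonlite package and a dataspice.json that follows the schema.org Dataset shape with a `distribution` array of DataDownload entries. The function name `import_spice_sketch` and the example.org URLs are hypothetical and are not part of dataspice; nothing is downloaded, the URLs are only collected.

```r
library(jsonlite)

# Hypothetical helper, not an existing dataspice function.
# `json` can be a JSON string, a local path, or a URL (fromJSON() accepts all three).
import_spice_sketch <- function(json) {
  meta <- fromJSON(json, simplifyVector = FALSE)

  dist <- meta$distribution
  # A lone DataDownload may be written as an object instead of an array;
  # a parsed object has names, a parsed array does not.
  if (!is.null(names(dist))) dist <- list(dist)

  # Collect the contentUrl values only; no data file is fetched here.
  files <- vapply(
    dist,
    function(d) if (is.null(d$contentUrl)) NA_character_ else d$contentUrl,
    character(1)
  )

  list(metadata = meta, files = files)
}

# Made-up metadata, for illustration only.
demo_json <- '{
  "@type": "Dataset",
  "name": "demo",
  "distribution": [
    {"@type": "DataDownload", "contentUrl": "https://example.org/a.csv"},
    {"@type": "DataDownload", "contentUrl": "https://example.org/b.csv"}
  ]
}'

x <- import_spice_sketch(demo_json)
x$files[[1]]
```

Since a contentUrl may point at a landing page rather than a file, a caller would still need to check each URL before handing it to something like read_csv().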

Could potentially make this behavior part of a read_spice() function; i.e. read_spice could work locally on a dataspice.json object or could extract dataspice.json from HTML content on the web.
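One way the "extract dataspice.json from HTML" half could look, as a rough base-R sketch. It assumes the page embeds its metadata in a `<script type="application/ld+json">` element, which is the usual way schema.org JSON-LD is published; `extract_jsonld` is a hypothetical name, and a regex is used here only to keep the example dependency-free (a real implementation would more likely use an HTML parser).

```r
# Hypothetical helper: pull the first JSON-LD block out of HTML text.
# `html` is a character vector, e.g. the result of readLines() on a page.
extract_jsonld <- function(html) {
  html <- paste(html, collapse = "\n")

  # (?is): case-insensitive, and "." also matches newlines.
  pattern <- "(?is)<script[^>]*type=[\"']application/ld\\+json[\"'][^>]*>(.*?)</script>"
  m <- regmatches(html, regexec(pattern, html, perl = TRUE))[[1]]

  # m[1] is the whole match, m[2] the captured script body.
  if (length(m) < 2) return(NULL)
  trimws(m[2])
}

# Made-up page content, for illustration only.
page <- c(
  "<html><head>",
  "<script type=\"application/ld+json\">",
  "{\"@type\": \"Dataset\", \"name\": \"demo\"}",
  "</script>",
  "</head><body></body></html>"
)

extract_jsonld(page)
```

The returned string could then be parsed the same way as a local dataspice.json, so a single read function could branch on whether it was given JSON or HTML.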

An R object could also contain the citation (perhaps as an R bibentry object, which R can already turn into either BibTeX or a text-based citation). i.e. simply x$citation; or we could have a methods-y interface like citation(x)
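A small sketch of the citation idea using `utils::bibentry()`, which ships with base R. The function name `spice_citation` and the sample metadata are hypothetical; the sketch assumes the parsed metadata carries the schema.org fields `name`, `creator` (a list of people, each with a `name`), `datePublished`, and `url`.

```r
# Hypothetical helper: build a citation from parsed schema.org Dataset metadata.
spice_citation <- function(meta) {
  # Assumes `creator` is a list of persons, each with a `name` field.
  authors <- vapply(meta$creator, function(p) p$name, character(1))

  utils::bibentry(
    bibtype = "Misc",
    title   = meta$name,
    author  = utils::as.person(authors),
    year    = substr(meta$datePublished, 1, 4),  # keep only the year part
    url     = meta$url
  )
}

# Made-up metadata, for illustration only.
meta <- list(
  name          = "Demo dataset",
  creator       = list(list(name = "Jane Doe")),
  datePublished = "2018-05-01",
  url           = "https://example.org/demo"
)

cit <- spice_citation(meta)
toBibtex(cit)                # BibTeX entry, e.g. for a .bib file
format(cit, style = "text")  # plain-text citation
```

Storing the result as x$citation would cover the simple case, and writing `toBibtex(cit)` to a file would give the .bib suggested earlier in the thread.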

amoeba commented 6 years ago

@khondula wrote:

That seems cool! Would that depend on the reliability/persistence of 'contentUrl' or 'contentUrl' + 'fileName'?

Yes, I see a huge need to resolve this stuff soon. @cboettig's idea below helps alleviate that: don't fetch the data at first, just the metadata, then give the user a way to fetch some or all of it.

returns an R object

Ooh nice! More robust yes.

Could potentially make this behavior part of a read_spice() function; i.e. read_spice could work locally on a dataspice.json object or could extract dataspice.json from HTML content on the web.

👍 and 👍 on all those ideas @cboettig