sckott opened this issue 10 years ago
@sckott we support code and data (either by hosting it on our registry or by pointing to it when it's hosted on another registry, such as GenBank). Our stack is in Node, so streaming works well and we can handle data big and small. For now, we store data on S3 and the metadata/indexing on CouchDB (via Cloudant). Sorry if that wasn't clear! We will definitely have to think about whether we really want to host very large data; very good point :) Ideally we would just store the metadata for such data and point to it wherever it lives.
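The idea in this reply, streaming large data instead of loading it whole and keeping only metadata plus a pointer to where the data lives, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the project's Node implementation; the helper `describe_data`, its record fields (`contentUrl`, `sizeBytes`, `sha256`), and the example.org URL are all hypothetical.

```python
import hashlib
import os
import tempfile


def describe_data(path, content_url, chunk_size=1 << 20):
    """Stream a (possibly very large) file and return a small metadata record.

    Only metadata and a pointer to where the data lives are kept; the data
    itself is never loaded into memory all at once.
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read fixed-size chunks so memory use is bounded by chunk_size.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    # Hypothetical field names, chosen only for this illustration.
    return {"contentUrl": content_url, "sizeBytes": size, "sha256": digest.hexdigest()}


# Usage: describe a small temporary file as if it lived at a remote URL.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ACGT" * 1000)
record = describe_data(tmp.name, "https://example.org/data/sequence.fasta")
os.remove(tmp.name)
print(record["sizeBytes"])  # 4000
```

Because only the record is stored, the same approach works whether the file is a few kilobytes or many gigabytes, and whether it sits on S3 or on an external registry.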
Thanks @tiffbogich. I'm not sure where the data is included, though. For example, in the example call
RJSONLD.export(lm(iris$Petal.Length~iris$Sepal.Length), path = "irisLM.jsonld")
the output jsonld file doesn't include the iris dataset. Is there an option to include it?
Hi Scott,
Thanks for your question. To complement what Tiff answered, in the context of RJSONLD specifically: this package targets objects that are created and live in R and have no standard way to be exported and shared on the web, such as analysis results. It can generate a JSON-LD object from analysis results because all of the semantics are already there; we just change the format and make things explicit/standard in cases where R relies on implicit/ad hoc descriptions (contrasts in ANOVAs, for example).
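As a concrete case of the implicit contrasts mentioned here: when R fits a factor such as iris$Species (for example in lm(Sepal.Length ~ Species, data = iris)), it uses treatment contrasts by default, and the coefficients are only named Speciesversicolor and Speciesvirginica, leaving the baseline level (setosa) implicit. The sketch below, in Python for illustration, spells that coding out explicitly. The helper `treatment_contrasts` and its output layout are hypothetical and are not RJSONLD's actual output format.

```python
def treatment_contrasts(levels):
    """Make R-style treatment (dummy) coding explicit.

    The first level is the baseline; every other level gets one column that
    is 1 for that level and 0 otherwise, mirroring R's default contr.treatment.
    """
    baseline, others = levels[0], levels[1:]
    # One row per level: which contrast columns are "switched on" for it.
    matrix = {lvl: [1 if lvl == other else 0 for other in others] for lvl in levels}
    # Name each column after the comparison it encodes, baseline included.
    columns = [f"{other} vs {baseline}" for other in others]
    return {"baseline": baseline, "columns": columns, "matrix": matrix}


coding = treatment_contrasts(["setosa", "versicolor", "virginica"])
print(coding["columns"])  # ['versicolor vs setosa', 'virginica vs setosa']
```

Recording the baseline and the column meanings alongside the coefficients is what lets a reader interpret them without knowing R's defaults.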
Data, on the other hand, can generally be exported as a CSV, which more generic tools (ldpm, for example) can handle. Also, a dataset per se lacks semantic information about what it represents, so RJSONLD would not do much more than RJSONIO. To solve that, ldpm has a wizard that asks the user for general meta-information. To make that process easier, we're also working on a graphical interface.
Regarding your second question, the JSON-LD does not contain the data. The way we see it, the irisLM.jsonld file would have an isBasedOnUrl entry pointing at the iris data. For this, the data needs to have a URL.
I could actually add an option to pass such URLs as arguments to the RJSONLD call. You simply need to give your data a URL, which is exactly what we're trying to do with ldpm and our website.
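To make the linking idea concrete, here is a rough sketch of a JSON-LD document that refers to its input data by URL instead of embedding it. It is a minimal Python illustration, assuming the schema.org context; the choice of `CreativeWork` as the type, the helper `model_jsonld`, and the example.org URL are hypothetical, and the model statistics that a real RJSONLD export would contain are omitted.

```python
import json


def model_jsonld(name, data_url):
    """Build a minimal JSON-LD description of an analysis result that links
    to its input data by URL rather than embedding the data."""
    return {
        "@context": "http://schema.org",
        "@type": "CreativeWork",   # hypothetical choice of type
        "name": name,
        "isBasedOnUrl": data_url,  # a pointer to the data, not the data itself
    }


doc = model_jsonld("irisLM", "https://example.org/datasets/iris.csv")
print(json.dumps(doc, indent=2))
```

The file stays small no matter how large the dataset is, and any consumer that needs the data can follow the URL.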
For metadata: any plans to handle arbitrary objects (it seems that mostly statistical modeling output objects are handled now)? For example, if a user has a data.frame that holds 10 columns and 100 rows, could RJSONLD let the user easily specify metadata for each column and for the dataset as a whole, to go into the jsonld output (along with the URL for the dataset itself, as you mentioned)? I guess the schema you refer to in your readme deals specifically with statistics, though, so perhaps metadata for datasets is out of scope?
On data: cool, sounds good to reference it with isBasedOnUrl.
For the moment, data frames / tabular data can be handled with a two-step process:
The focus in RJSONLD has so far been on generic objects that meet the following two criteria:
If you have ideas of such objects, suggestions (or pull requests) are welcome!
Hmm, GeoJSON (for spatial data in JSON format) comes to mind, though I guess the rgdal package already has writeOGR(...) to write out .geojson files.
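For reference, a GeoJSON document (RFC 7946) is itself plain JSON with a small fixed structure, which is part of why it is a natural candidate here. The Python sketch below builds a minimal `FeatureCollection` of points; it only illustrates the format and is not how rgdal's writeOGR works. The helper `points_to_geojson` and the sample coordinates and properties are hypothetical.

```python
import json


def points_to_geojson(points):
    """Convert (lon, lat, properties) tuples into a GeoJSON FeatureCollection.

    GeoJSON orders coordinates as [longitude, latitude].
    """
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": props,
        }
        for lon, lat, props in points
    ]
    return {"type": "FeatureCollection", "features": features}


collection = points_to_geojson([(-122.27, 37.87, {"site": "A"})])
print(json.dumps(collection))
```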
Curious whether you plan on supporting not just the analysis results but the data as well? Right now it seems this supports only analysis results. Maybe I'm missing the data part, or perhaps supporting data opens up a big can of worms, since you could be dealing with GB/TB of data?