Closed by nuest 6 years ago
Wouldn't serialization through an .RData or .rds file containing the data.frames from osem_measurements() / osem_boxes() achieve the same result?
senseboxes <- osem_boxes()
saveRDS(senseboxes, 'senseboxes.rds')
senseboxes_copy <- readRDS('senseboxes.rds')
I guess the question regarding reproducibility is where the data processing begins.
osem_measurements() only does data transformation, which I don't consider processing, so to me there is no need to expose/serialize an intermediate data format.
We can't capture the 'raw' data anyway, because technically 'processing' starts on the oSeM server already.
Or am I missing something?
RData is binary, and I worry about the day that R does not work anymore. A plain text format like JSON is much better in that case. It also makes it much easier for others to analyse the data in a tool of their choice (Python, JavaScript), which helps transparency.
I don't think serialization is the job of this package.
There are the functions osem_as_sensebox() and osem_as_measurements(), which apply all necessary classes and attributes.
With those functions it's a one-liner:
library(opensensemapr)
library(magrittr)
library(jsonlite)
library(readr)
all_boxes = osem_boxes()
plot(all_boxes)
# 1-to-1 mapping of data to json
toJSON(all_boxes) %>% write(file = './boxes.json')
fromJSON('boxes.json') %>% osem_as_sensebox() %>% plot()
Even better IMO is the serializeJSON() function from jsonlite:
# far more compact while maintaining attributes. still human-readable!
serializeJSON(all_boxes) %>% write(file = './boxes2.json')
read_file('boxes2.json') %>% unserializeJSON() %>% plot()
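To make the difference between the two approaches concrete, here is a small sketch that runs without network access. It uses a synthetic data.frame as a stand-in for the result of osem_boxes(); the column names and the "note" attribute are made up for illustration, and only the "sensebox" class name is meant to mirror the package.

```r
library(jsonlite)

# Stand-in for osem_boxes(): a data.frame with an extra S3 class and a
# custom attribute (columns and attribute are assumptions for this example).
boxes <- data.frame(name = c("box A", "box B"), lat = c(51.9, 52.5),
                    stringsAsFactors = FALSE)
class(boxes) <- c("sensebox", "data.frame")
attr(boxes, "note") <- "synthetic example"

# toJSON()/fromJSON(): a plain array of records, so the extra class and
# attribute have to be re-applied afterwards (e.g. via osem_as_sensebox()).
plain <- fromJSON(toJSON(boxes))
inherits(plain, "sensebox")

# serializeJSON()/unserializeJSON(): attributes are stored in the JSON itself
# and restored on reading.
restored <- unserializeJSON(serializeJSON(boxes))
inherits(restored, "sensebox")
attr(restored, "note")
```

The trade-off: the toJSON() output is the friendlier format for other tools, while the serializeJSON() output is R-specific in structure but round-trips without extra steps.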
I simply did not come up with the idea of using the osem_as_* functions, so maybe it's just a matter of documentation. Thanks for the pointer.
Indeed, I should add those to the docs :+1:
For the sake of completeness (or in case the openSenseMap API disappears some day), I would like to keep a local copy of the files, while keeping the changes needed for the functions to work on the local data minimal compared to using the data straight from the API.
Here's how I did it: https://github.com/nuest/sensebox-binder/commit/3840d44427e674a61f14e8371b40eff72e51b2ae
I had to manually re-add the class to utilize the included plotting functions. It would be nice if the parsing function (i.e. including the column types) could be exposed so that a user can do something like:
osem_measurements(..., saveToFile = "copy.csv")
osem_measurements(x = "copy.csv", ...)
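As a rough sketch of what such a CSV round trip could look like in user code today: the helpers save_measurements() and load_measurements() below are hypothetical and not part of opensensemapr. The column names (sensorId, value, createdAt) and the "osem_measurements" class name are assumptions, and the data is synthetic so the example runs offline. With the real package, osem_as_measurements() would be the way to re-apply the class.

```r
# Hypothetical helper: write measurements to CSV with ISO 8601 UTC timestamps.
save_measurements <- function(x, file) {
  x$createdAt <- format(x$createdAt, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  write.csv(as.data.frame(x), file, row.names = FALSE)
  invisible(file)
}

# Hypothetical helper: read the CSV back with explicit column types and
# re-add the class that the plotting functions dispatch on.
load_measurements <- function(file) {
  x <- read.csv(file, stringsAsFactors = FALSE,
                colClasses = c(sensorId = "character", value = "numeric",
                               createdAt = "character"))
  x$createdAt <- as.POSIXct(x$createdAt, format = "%Y-%m-%dT%H:%M:%SZ",
                            tz = "UTC")
  class(x) <- c("osem_measurements", class(x))
  x
}

# Synthetic stand-in for the result of osem_measurements().
m <- data.frame(
  sensorId = c("a1", "a1"),
  value = c(21.5, 22.0),
  createdAt = as.POSIXct(c("2018-05-01 12:00:00", "2018-05-01 12:10:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)
class(m) <- c("osem_measurements", "data.frame")

csv_file <- tempfile(fileext = ".csv")
save_measurements(m, csv_file)
m_copy <- load_measurements(csv_file)
```

Keeping the parsing in one place like this means the rest of an analysis script only has to swap the API call for load_measurements().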