sensebox / opensensmapR

R client for opensensemap.org
https://noerw.github.io/opensensmapR/inst/doc/osem-history

Allow storing of downloaded data and expose parsing function #13

Closed nuest closed 6 years ago

nuest commented 6 years ago

For the sake of completeness (or in case the openSenseMap API disappears some day), I would like to keep a local copy of the files, while keeping the changes required in functions that use the local data minimal compared to using the data straight from the API.

Here's how I did it: https://github.com/nuest/sensebox-binder/commit/3840d44427e674a61f14e8371b40eff72e51b2ae

I had to manually re-add the class to use the included plotting functions. It would be nice if the parsing function (i.e. including the column types) could be exposed so that a user can:

  1. let opensensmapR create a local file with the downloaded data, e.g. osem_measurements(..., saveToFile = "copy.csv")
  2. load measurements from that file later giving me exactly what I would have gotten from the API, e.g. osem_measurements(x = "copy.csv", ...)
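
To illustrate why the class has to be re-added by hand after a plain CSV round trip, here is a minimal base-R sketch. It does not call opensensmapR at all: the data.frame, its columns, and the class name `osem_measurements_demo` are stand-ins invented for this example, not the package's real structures.

```r
# Stand-in for a data.frame returned by the package; the columns and the
# class name "osem_measurements_demo" are assumptions for illustration only.
measurements <- data.frame(
  sensorId  = c("a1", "a2"),
  value     = c(21.5, 22.1),
  createdAt = as.POSIXct(c("2018-01-01 12:00:05", "2018-01-01 12:10:07"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)
class(measurements) <- c("osem_measurements_demo", "data.frame")

# Write a plain-text local copy.
csv_file <- tempfile(fileext = ".csv")
write.csv(measurements, csv_file, row.names = FALSE)

# Reading it back gives a plain data.frame: the extra class is gone and
# the timestamp column comes back as character.
restored <- read.csv(csv_file, stringsAsFactors = FALSE)
class_after_read <- class(restored)            # "data.frame"
time_class_after_read <- class(restored$createdAt)  # "character"

# Manual repair: restore the column type and the class.
restored$createdAt <- as.POSIXct(restored$createdAt, tz = "UTC")
class(restored) <- c("osem_measurements_demo", "data.frame")
```

An exposed parsing function would take over the two repair lines at the end, so that the caller does not need to know the column types or class names.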

noerw commented 6 years ago

Wouldn't serialization through an .RData or .rds file containing the data.frames from osem_measurements() / osem_boxes() achieve the same result?

library(opensensemapr)

senseboxes <- osem_boxes()
# binary serialization keeps classes and attributes
saveRDS(senseboxes, 'senseboxes.rds')
senseboxes_copy <- readRDS('senseboxes.rds')
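
The round trip can be checked without network access. The following sketch uses a stand-in data.frame instead of real `osem_boxes()` output; the class name `sensebox_demo` and the `fetched_at` attribute are made up for the example.

```r
# Stand-in object; class name and attribute are assumptions for illustration.
demo <- data.frame(id = c("box1", "box2"), lat = c(51.96, 52.52))
class(demo) <- c("sensebox_demo", "data.frame")
attr(demo, "fetched_at") <- as.POSIXct("2018-01-01 12:00:05", tz = "UTC")

# Serialize to a temporary .rds file and read it back.
rds_file <- tempfile(fileext = ".rds")
saveRDS(demo, rds_file)
demo_copy <- readRDS(rds_file)

# The copy matches the original, including class and custom attributes.
same <- identical(demo, demo_copy)
```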

I guess the question regarding reproducibility is where the data processing begins. osem_measurements() only does data transformation, which I don't consider processing, so to me there is no need to expose or serialize an intermediate data format. We can't capture the 'raw' data anyway, because technically 'processing' already starts on the oSeM server.

Or am I missing something?

nuest commented 6 years ago

RData is binary, and I worry about the day that R does not work anymore. A plain text format like JSON is much better in that case. It also makes it much easier for others to analyse the data in a tool of their choice (Python, JavaScript), which helps transparency.

noerw commented 6 years ago

I don't think serialization is the job of this package. There are the functions osem_as_sensebox() and osem_as_measurements(), which apply all necessary classes and attributes.

With those functions it's a one-liner:

library(opensensemapr)
library(magrittr)
library(jsonlite)
library(readr)

all_boxes = osem_boxes()
plot(all_boxes)

# 1-to-1 mapping of data to json
toJSON(all_boxes) %>% write(file = './boxes.json')
fromJSON('boxes.json') %>% osem_as_sensebox() %>% plot()

Even better IMO is the serializeJSON() function from jsonlite:

# far more compact while maintaining attributes. still human-readable!
serializeJSON(all_boxes) %>% write(file = './boxes2.json')
read_file('boxes2.json') %>% unserializeJSON() %>% plot()
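
The difference between the two approaches can be seen on a small stand-in object, without touching the API. This is only a sketch: the class name `sensebox_demo` is invented here, and real `osem_boxes()` output carries more columns and attributes.

```r
library(jsonlite)

# Stand-in object; the class name is an assumption for illustration.
demo <- data.frame(id = c("box1", "box2"), lat = c(51.96, 52.52),
                   stringsAsFactors = FALSE)
class(demo) <- c("sensebox_demo", "data.frame")

# toJSON()/fromJSON() keep the data but not the extra class, which is why
# the first example pipes the result through osem_as_sensebox().
plain <- fromJSON(toJSON(demo))
plain_class <- class(plain)   # "data.frame"

# serializeJSON()/unserializeJSON() also store the attributes, so the
# class survives and S3 methods such as plot() can dispatch again.
restored <- unserializeJSON(serializeJSON(demo))
restored_class <- class(restored)
```

The first form yields JSON that other tools can read directly as an array of records; the second embeds R-specific type and attribute metadata, which is convenient in R but less natural to consume elsewhere.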

nuest commented 6 years ago

I simply did not come up with the idea of using the osem_as_*() functions, so maybe it's just a matter of documentation. Thanks for the pointer.

noerw commented 6 years ago

Indeed, I should add those to the docs :+1: