sapfluxnet / sapfluxnetr

Working with SAPFLUXNET Project data

Trimming the fat in sfn_data objects #9

Closed MalditoBarbudo closed 5 years ago

MalditoBarbudo commented 6 years ago

As the sfn_data class currently stands, each object, especially the biggest ones, occupies a lot of memory. There are some things that can be done to reduce the memory footprint of the objects:

  1. Store sapf_data and sapf_flags internally as arrays. Methods will still return data frames when asked for the data or the flags, but this way the size of the object could be reduced considerably. The same goes for env_data and env_flags.

  2. Convert the S4 class to an R6 class. This should pay off in the end, as R6 objects are faster to access, especially in shiny apps. They also allow modify-in-place (no copy in memory, though one must be very careful with this), which I think could be used in the metrics processes to reduce the memory footprint of the whole process even further. This is independent of the first point.

I will test the first point first, as it does not change anything in the package functions, to check whether it is worth it. I would like to do the second point, but I don't know if it is easily doable. We'll see.
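For readers less familiar with R6, the property point 2 relies on can be seen in a minimal sketch. It assumes the R6 package is installed; `SfnDataR6` and `scale_sapf()` are hypothetical names for illustration, not the sapfluxnetr API. Assigning an R6 object to a new name copies only the reference, so a method that modifies a field is visible through every name. With S4, a slot assignment on a copy would leave the original untouched.

```r
library(R6)

# Hypothetical R6 stand-in for sfn_data (not the actual sapfluxnetr class)
SfnDataR6 <- R6Class("SfnDataR6",
  public = list(
    sapf_data = NULL,
    initialize = function(sapf_data) {
      self$sapf_data <- sapf_data
    },
    # Modifies the object in place and returns it invisibly for chaining
    scale_sapf = function(factor) {
      self$sapf_data <- self$sapf_data * factor
      invisible(self)
    }
  )
)

site_a <- SfnDataR6$new(data.frame(tree_1 = c(1, 2, 3)))
site_b <- site_a          # copies the reference, not the object
site_b$scale_sapf(2)
site_a$sapf_data$tree_1   # also changed: 2 4 6
```

This shared state is also why care is needed: any function that receives the object can change it for every other holder. R6 objects provide `$clone()` for cases where an independent copy is required.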

MalditoBarbudo commented 6 years ago

It seems that the first point is not valid: the size of sfn_data objects cannot be reduced by transforming the data into arrays. Nevertheless, the second point still stands, so I created the R6_play branch, where I will literally play with this kind of class to see what can be done.

Benefits/ideas to develop with R6 classes
  1. Different generation methods. Right now the objects are created at the QC3 level, then stored in the db folder and distributed. But the idea, in the future, is to have an SQL database. With R6 classes we can have alternative methods to retrieve the data:

    • one that calls the SQL server and populates the data from there. This virtually reduces the size of the object to 0, as the data will be retrieved from the SQL database
    • another that keeps the R objects locally, which allows offline work and also the creation of custom sfn_data objects (e.g. the user has new data they want to incorporate into the analysis)
  2. Updating the user's local database on the fly. The user downloads the data once. After that, when a new database version is available, the user can update the objects individually or collectively and save the RData files. This can come in handy to avoid large downloads. Thanks to the mutability of R6 classes, we can develop an update method that automatically fetches the new version from the SQL db, updates the object, and saves it in the specified folder. (That is, we can have a rolling-release database!!)

    • think of the advantages here for hot-fixing a site after db publication (e.g. ESP_VAL is missing something or some data is not correct, but we failed to fix it before db publication)
  3. ... (:cliche: the imagination is the limit :cliche:).
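Ideas 1 and 2 could share a single class whose data source is pluggable. Below is a minimal sketch, assuming the R6 package; `SfnSite`, `fetcher`, `get_sapf_data()`, `update()`, `local_fetcher` and `db_version` are all hypothetical names, and the data values are toy numbers. A local object would pass a function that reads its stored data, while a remote one would pass a function that queries the SQL server (not shown). Data are fetched lazily on first access and cached, and `update()` re-fetches them in place.

```r
library(R6)

# Hypothetical site class with a pluggable data source (not the sapfluxnetr API)
SfnSite <- R6Class("SfnSite",
  public = list(
    site_code = NULL,
    initialize = function(site_code, fetcher) {
      self$site_code <- site_code
      private$fetcher <- fetcher
    },
    # Lazily fetch the data on first access, then serve it from the cache
    get_sapf_data = function() {
      if (is.null(private$cache)) {
        private$cache <- private$fetcher(self$site_code)
      }
      private$cache
    },
    # Re-fetch in place, e.g. after a new database version is released
    update = function() {
      private$cache <- private$fetcher(self$site_code)
      invisible(self)
    }
  ),
  private = list(fetcher = NULL, cache = NULL)
)

# Toy local source; db_version simulates successive database releases
db_version <- 1
local_fetcher <- function(site_code) {
  data.frame(site = site_code, version = db_version, tree_1 = c(0.5, 0.7))
}

site <- SfnSite$new("ESP_VAL", local_fetcher)
site$get_sapf_data()$version  # 1 1
db_version <- 2               # a new release becomes available
site$get_sapf_data()$version  # still 1 1 (served from the cache)
site$update()
site$get_sapf_data()$version  # 2 2
```

Swapping `local_fetcher` for a function that queries the SQL server would be the only change needed for the remote variant. Saving the updated object to the user's folder is left out of this sketch.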

MalditoBarbudo commented 5 years ago

This is gonna be a summer project or something like that, but not now :disappointed:. I'm closing this.