For one thing, there is no point in trying to compete with R code against highly optimised C code in terms of performance. For another, my solution pivots the periods to long format before the filtering, whereas yours only pivots five lines of data. The way around that would be to filter periods and everything else differently, but that is quite messy, and in my opinion not worth the headache. It might be more useful to load snapshots into an SQL database and query that instead of reading files.
In order to process large data sets, like IIASA database snapshots, `read.quitte()` reads the provided files (other than Excel files) in chunks of `chunk_size` lines and applies `filter.function()` to each chunk. This allows for filtering the data piece by piece, without exceeding available memory. `filter.function` is a function taking one argument, a quitte data frame holding the read chunk, and is expected to return a data frame. Usually it should simply apply all the filters that would otherwise be applied after all the data is read in.

Suppose there is a file `big_IIASA_snapshot.csv`, from which only data for the REMIND and MESSAGE models between the years 2020 and 2050 is of interest. Normally, this data would be read in completely and filtered afterwards. If however `big_IIASA_snapshot.csv` is too large to be read in completely, it can be read in chunks, with the same filters passed as `filter.function`.

close #72
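The two cases above might be sketched as follows. This is only an illustration, not code from the PR: it assumes the standard quitte long-format columns `model` and `period` and dplyr-style filtering, and the chunk size of 200000 lines is an arbitrary example value.

```r
library(quitte)
library(dplyr)

# Normal processing: read the whole file, then filter.
data <- read.quitte('big_IIASA_snapshot.csv') %>%
    filter(model %in% c('REMIND', 'MESSAGE'),
           between(period, 2020, 2050))

# Chunked processing: the same filter is applied to each chunk of
# 200000 lines in turn, so the full file never has to fit in memory.
data <- read.quitte('big_IIASA_snapshot.csv',
                    chunk_size = 200000,
                    filter.function = function(d) {
                        d %>%
                            filter(model %in% c('REMIND', 'MESSAGE'),
                                   between(period, 2020, 2050))
                    })
```

Since `filter.function` receives an ordinary quitte data frame and returns one, any filtering that would normally happen after reading can simply be moved into this function unchanged.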