stateofindiasbirds / soib_2023

SoIB version 2 code
MIT License

Optimise storage and processing time during data subsampling #12

Closed: rikudoukarthik closed 5 days ago

rikudoukarthik commented 2 months ago

Currently, the data subsampling process writes each subsampled dataset to disk as a CSV. These datasets take up ~2 TB of storage, and both that storage and the computation time involved can be reduced drastically by switching to RData files.

From experience with the COVID analysis (the required changes were committed here), this switch can reduce file size to about 10% of the original, and computing time to 10-20% of the original.
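
As a rough illustration of the proposed switch (a minimal sketch, not the code from the linked commit), the snippet below writes the same data frame once as CSV and once as RData, then compares file sizes and checks that the object reads back unchanged. The data frame `data_sub`, its columns, and the temporary file paths are hypothetical stand-ins, not the actual SoIB subsampled data, and the ratio obtained here will differ from the figures quoted above.

```r
# Hypothetical stand-in for one subsampled dataset (not real SoIB data)
set.seed(1)
n <- 2e5
data_sub <- data.frame(
  group.id = sprintf("S%07d", sample(1e5, n, replace = TRUE)),
  COMMON.NAME = sample(c("House Crow", "Common Myna", "Rock Pigeon"),
                       n, replace = TRUE),
  year = sample(2000:2022, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Temporary output paths, used only for this comparison
csv_path <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")

# Current approach: plain-text CSV
write.csv(data_sub, csv_path, row.names = FALSE)

# Proposed approach: binary RData (save() applies gzip compression by default)
save(data_sub, file = rdata_path)

# Fraction of the CSV size taken up by the RData file
size_ratio <- file.size(rdata_path) / file.size(csv_path)

# load() restores the object under its saved name; a separate environment
# keeps it from overwriting the original in the workspace
restored <- new.env()
load(rdata_path, envir = restored)
roundtrip_ok <- identical(restored$data_sub, data_sub)
```

Because `load()` restores objects under the names they were saved with, downstream scripts that currently call `read.csv()` would need to refer to the saved object name rather than assign the result of the read.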