Currently, the data subsampling process writes each subsampled dataset to a CSV file. Both the storage these datasets take up (~2 TB) and the computation time needed to produce them can be reduced drastically by switching to RData files.
From experience with the COVID analysis (the required changes committed here), this switch can reduce file size to 10% of the original and computing time to 10-20% of the original.
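As a rough illustration of what the switch involves, here is a minimal sketch using base R only. The data frame `subsample` and the temporary file paths are hypothetical stand-ins, not names from the actual pipeline, and the real savings will depend on the data.

```r
# Hypothetical stand-in for one subsampled dataset (assumption, not pipeline data).
subsample <- data.frame(id = 1:1000, value = rnorm(1000))

csv_path   <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")

# Current approach: plain-text CSV.
write.csv(subsample, csv_path, row.names = FALSE)

# Proposed approach: binary RData file, compressed by default.
save(subsample, file = rdata_path)

# Compare the on-disk sizes in bytes.
sizes <- c(csv = file.size(csv_path), rdata = file.size(rdata_path))
print(sizes)

# load() restores the object under its original name; loading into a
# separate environment avoids overwriting objects in the workspace.
env <- new.env()
load(rdata_path, envir = env)
restored <- env$subsample
```

Downstream code that previously called `read.csv()` would need to call `load()` instead, so the change has to be made on both the writing and the reading side.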