Checking for incomplete designs in large simulations

philchalmers / SimDesign

Structure for organizing Monte Carlo simulations in R


The current way of checking for incomplete designs in distributed simulations is as follows:

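# Aggregate all result files in the working directory
Final <- SimDesign::aggregate_simulations(files = dir())
# Keep the conditions that fell short of the 10000 target replications
pick <- subset(Final, REPLICATIONS < 10000)
# Design factor(s) identifying those conditions (here, N)
subDesign <- subset(pick, select = N)
# Number of replications still needed for each incomplete condition
replications_missed <- 10000 - pick$REPLICATIONS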

This approach consumes a lot of memory, because the full set of simulation results has to be aggregated just to read the replication counts. It may not be feasible for very large simulations that produce several gigabytes of results. I propose a new function, check_missing_simulations(), which would load one result file at a time, record its replication count, and discard the simulation results before loading the next file.
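A rough sketch of what such a function might look like follows. It is not an existing SimDesign function. It assumes that each result file is an .rds file readable with readRDS(), that each file holds a data frame with a REPLICATIONS column, and that every file lists the same design conditions in the same row order. The `target` argument and the returned column names are made up for illustration.

```r
# Hypothetical sketch of the proposed function; not part of SimDesign.
# Assumptions: every file is an .rds data frame with a REPLICATIONS column,
# one row per design condition, and identical row order across files.
check_missing_simulations <- function(files, target = 10000) {
  stopifnot(length(files) > 0)
  counts <- NULL
  for (f in files) {
    res <- readRDS(f)              # load one result file
    reps <- res$REPLICATIONS       # keep only the replication counts
    rm(res)                        # discard the results before the next file
    if (is.null(counts)) {
      counts <- reps
    } else {
      stopifnot(length(reps) == length(counts))
      counts <- counts + reps      # accumulate per-condition totals
    }
  }
  data.frame(condition = seq_along(counts),
             completed = counts,
             missing   = pmax(target - counts, 0))
}
```

Something like check_missing_simulations(dir(pattern = "\\.rds$")) would then return one row per design condition. Only one file's results are held in memory at any time, which is the point of the proposal.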


Checking for incomplete designs in large simulations #36