philchalmers / SimDesign

Structure for organizing Monte Carlo simulations in R
http://philchalmers.github.io/SimDesign/

Checking for incomplete designs in large simulations #36

Closed. mronkko closed this issue 7 months ago

mronkko commented 7 months ago

The current way of checking for incomplete designs in distributed simulations is as follows:

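# Combine every result file in the working directory into one object
Final <- SimDesign::aggregate_simulations(files=dir())
# Rows that have not yet reached the 10000 target replications
pick <- subset(Final, REPLICATIONS < 10000)
# Keep only the N column of those incomplete rows
subDesign <- subset(pick, select=N)
# Replications still missing for each incomplete row
replications_missed <- 10000 - pick$REPLICATIONS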

This consumes a lot of memory and may not be feasible for very large simulations that produce several gigabytes of results. I propose a new function, check_missing_simulations(), that would load each file, store its replication count, and then discard the actual simulation results before loading the next result file.
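
A rough sketch of what such a function might look like is given here. It is not part of SimDesign. It assumes each result file was written with saveRDS() and holds a data frame with a REPLICATIONS column plus the design column(s); the arguments `target` and `design_cols` are hypothetical names chosen for illustration.

```r
# Hypothetical helper (not part of SimDesign): tally replications per design
# condition while keeping only one result file in memory at a time.
# Assumption: every file is an .rds data frame containing a REPLICATIONS
# column and the design columns named in `design_cols`.
check_missing_simulations <- function(files, target, design_cols) {
  counts <- NULL
  for (f in files) {
    res <- readRDS(f)
    # Keep only the design columns and the replication count
    keep <- as.data.frame(res)[, c(design_cols, "REPLICATIONS"), drop = FALSE]
    counts <- rbind(counts, keep)
    rm(res)  # discard the full results before reading the next file
  }
  # Sum the replications across files for each design condition
  totals <- aggregate(counts["REPLICATIONS"], by = counts[design_cols], FUN = sum)
  totals$MISSING <- target - totals$REPLICATIONS
  # Return only the conditions that are still short of the target
  totals[totals$MISSING > 0, , drop = FALSE]
}
```

Something like check_missing_simulations(dir(), target = 10000, design_cols = "N") would then list the incomplete conditions. Note that a condition with no result file at all would not show up in this sketch, since it only counts what it finds on disk.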

philchalmers commented 7 months ago

Reasonable request, though an extra function is unnecessary. You can now do this via aggregate_simulations(files, check.only=TRUE), which will return the problematic designs and their replication count.