huilisabrina / covid-19-simul

Efficient simulation and graphical modeling of Covid-19 spread

Cluster-Level Parallelization #6

Open smwu opened 4 years ago

smwu commented 4 years ago

Hi @huilisabrina @beancamille @intekhab8!

The initial implementation of cluster-level parallelization is currently underway and being tested by @huilisabrina!

Here is a brief description of the implementation:

For a single set of parameters, the bash wrapper function begins by splitting up the full graph dataset into disjoint clusters corresponding to the subgraphs for the various HIV studies. Then, a simulation of an epidemic is run for each cluster individually, using 'network_update_GF_monte_carlo.py'. Finally, a bash wrapper function combines the results across clusters into one final output file for this set of parameters.

The schema below describes the basic workflow.

[image: workflow schema]

monte_carlo_clusters.bash calls preprocess_network_clusters.py to process the network data into 2*C text files, where C is the number of clusters. More specifically, for each cluster, one vertex set file and one edge set file are written.
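To make the preprocessing step concrete, here is a minimal sketch of how a graph could be split into per-cluster vertex and edge files. This is not the code in preprocess_network_clusters.py: the function names, the input format (a dict from vertex id to cluster label plus a list of edge pairs), and the output file names are all assumptions for illustration.

```python
from collections import defaultdict
from pathlib import Path


def split_into_clusters(node_cluster, edges):
    """Group vertices and edges by cluster label.

    node_cluster: dict mapping vertex id -> cluster label (assumed input format)
    edges: iterable of (u, v) vertex pairs
    Edges whose endpoints lie in different clusters are dropped, on the
    assumption that the clusters are disjoint subgraphs.
    """
    vertices = defaultdict(set)
    cluster_edges = defaultdict(list)
    for v, c in node_cluster.items():
        vertices[c].add(v)
    for u, v in edges:
        if node_cluster[u] == node_cluster[v]:
            cluster_edges[node_cluster[u]].append((u, v))
    return {c: (sorted(vertices[c]), cluster_edges[c]) for c in vertices}


def write_cluster_files(clusters, out_dir):
    """Write one vertex file and one edge file per cluster (2*C files)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for c, (verts, edge_list) in clusters.items():
        # File names are hypothetical; the real script may use others.
        v_path = out_dir / f"cluster_{c}_vertices.txt"
        e_path = out_dir / f"cluster_{c}_edges.txt"
        v_path.write_text("".join(f"{v}\n" for v in verts))
        e_path.write_text("".join(f"{u} {v}\n" for u, v in edge_list))
        paths += [v_path, e_path]
    return paths
```

With C clusters, write_cluster_files returns 2*C paths, matching the file count described above.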

Once the vertex and edge sets are ready, monte_carlo_clusters.bash reads in the params_input.csv file line-by-line. Each line contains a specific set of input parameters for the simulation. For each set of input parameters, the bash script runs network_update_GF_monte_carlo.py C times, once for each cluster, and then combines the results using combine_cluster.py. In total, for each set of input parameters, the epidemic simulation function is called M*C times, where M is the number of Monte Carlo iterations and C is the number of clusters.
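The control flow above can be sketched in Python roughly as follows. This is only an illustration of the loop structure, not the bash wrapper itself: simulate_cluster is a stand-in for the real epidemic simulation, the CSV columns (beta, cluster_size) are made up, and summing per-cluster means is just one assumed way of combining results.

```python
import csv
import io
import random


def simulate_cluster(cluster_id, params, rng):
    """Stand-in for one epidemic simulation on one cluster (hypothetical)."""
    # Placeholder result: a random infected count in [0, cluster_size].
    return rng.randint(0, int(params["cluster_size"]))


def run_pipeline(params_csv, cluster_ids, n_iter, seed=0):
    """For each parameter row, run n_iter simulations per cluster and combine."""
    rng = random.Random(seed)
    results = []
    n_calls = 0
    for params in csv.DictReader(params_csv):  # one row per parameter set
        per_cluster = {}
        for c in cluster_ids:  # one batch of runs per cluster
            runs = [simulate_cluster(c, params, rng) for _ in range(n_iter)]
            n_calls += n_iter
            per_cluster[c] = sum(runs) / n_iter
        # Assumed combine step: add up the per-cluster mean infected counts.
        results.append(
            {"params": params, "total_mean_infected": sum(per_cluster.values())}
        )
    return results, n_calls


# Two parameter sets, C = 3 clusters, M = 4 Monte Carlo iterations.
params_csv = io.StringIO("beta,cluster_size\n0.1,50\n0.2,50\n")
results, n_calls = run_pipeline(params_csv, ["A", "B", "C"], n_iter=4)
print(n_calls)  # 2 parameter sets * M * C = 2 * 4 * 3 = 24
```

Since the per-cluster runs do not depend on each other in this sketch, they are natural candidates to be submitted as separate jobs.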

huilisabrina commented 4 years ago

Okay finally... I've updated the scripts so that they are functional now! I'm still testing the whole pipeline on the cluster. You can see all the changes I've pushed so far. Hope you don't mind me "reverting" some of your changes. Let me know if you have any questions!

Best, Hui