lhilbert / EnhancerShoeBox

Molecular Dynamics Simulations of Enhancer-Promoter Interactions in a Model Shoe Box
MIT License

Redundant arguments in run_single_shoebox.py script #3

Open ezingrebe opened 4 months ago

ezingrebe commented 4 months ago

An example to run a single gene-cluster simulation without actin is:

python run_single_shoebox.py -b 11 -r 1 -t 10 -o gene_cluster/ -m 1 -c Control -p 3 -a 30 -x 80 -z 0 -n 0.01

I think that the following two parameters passed to the script are redundant:

-r: repeat number
-t: total number of repeats for the parallel runs

since the script run_single_shoebox.py can run only a single, independent simulation. Passing arbitrary values for "r" and "t" here wouldn't influence the simulation output. To run multiple simulations in parallel, one has to run:

./run_parallel_shoebox.sh 1 100 11 Control

where the first two arguments, 1 and 100 (in this case), correspond to the "r" and "t" arguments, respectively.

lhilbert commented 2 months ago

The bash script run_parallel_shoebox.sh uses both arguments and passes them to the actual Python script to execute. I presume that the single-run Python script then uses the total number of simulations to manage its output correctly, given the overall number of simulations that are required. I suspect there is some sort of management or data housekeeping reason that this additional information needs to be passed to the single-job Python script. Additionally, it allows running only part of the overall set of parallel jobs, for example only jobs 4-8 out of 10. I think this should allow for reliable flexibility.
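To illustrate the "jobs 4-8 out of 10" case, here is a minimal sketch of how a launcher could build one command per repeat. It is not taken from run_parallel_shoebox.sh: the function name build_commands is hypothetical, the mapping of the third and fourth launcher arguments to -b and -c is an assumption based on the example command above, and the remaining flags (-o, -m, -p, -a, -x, -z, -n) are left out for brevity.

```python
import shlex


def build_commands(first, last, total, b_value="11", condition="Control"):
    """Return one command line per repeat in the inclusive range first..last.

    Hypothetical helper: only the flags -b, -r, -t and -c are filled in.
    """
    if not 1 <= first <= last <= total:
        raise ValueError("need 1 <= first <= last <= total")
    commands = []
    for repeat in range(first, last + 1):
        args = [
            "python", "run_single_shoebox.py",
            "-b", str(b_value),
            "-r", str(repeat),   # number of this repeat
            "-t", str(total),    # size of the full set, not of the subset
            "-c", condition,
        ]
        commands.append(shlex.join(args))
    return commands


# Only repeats 4 to 8 of a set of 10 are generated.
for command in build_commands(4, 8, 10):
    print(command)
```

The point of the sketch is that -t stays at 10 even though only five commands are produced, so each single run still knows the size of the full set.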

lhilbert commented 2 months ago

So, as expected, the argument r is converted into the variable run_numer on line 258 of the single-run script, and is subsequently used to define where exactly the results from a specific repeat should be saved. One example of this is on line 168, in the function starting on line 155.
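As a rough illustration of that idea (not the code from the script), a repeat number can be turned into a per-repeat output location like this. The function name repeat_output_dir and the repeat_0004 naming scheme are assumptions made for the example; the actual script may name its output differently.

```python
from pathlib import PurePosixPath


def repeat_output_dir(output_root, run_number):
    """Build a distinct output directory path for one repeat.

    Hypothetical naming scheme: <output_root>/repeat_<zero-padded number>.
    """
    return PurePosixPath(output_root) / f"repeat_{run_number:04d}"


print(repeat_output_dir("gene_cluster/", 4))
```

With a scheme like this, two repeats started with different -r values never write to the same location, which is why -r is not redundant even for a single run.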

The total number of simulations (total_runs) is used on line 796, but it seems to be there only "for convenience", as it is used solely for command-line output reporting how many simulations are to be completed overall. I would still keep that output; it is very useful in practice when running these simulations.
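For reference, the reporting role of -t could look roughly like the sketch below. It assumes integer -r and -t options parsed with argparse; the function names and the message wording are illustrative and not copied from run_single_shoebox.py.

```python
import argparse


def parse_repeat_args(argv):
    """Parse only the two options discussed here (illustrative subset)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", type=int, dest="run_number", required=True,
                        help="number of this repeat")
    parser.add_argument("-t", type=int, dest="total_runs", required=True,
                        help="total number of repeats in the full set")
    return parser.parse_args(argv)


def progress_message(args):
    """total_runs is used for reporting only; it does not affect the run."""
    return f"Running repeat {args.run_number} of {args.total_runs}"


args = parse_repeat_args(["-r", "4", "-t", "10"])
print(progress_message(args))
```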