radical-collaboration / hpc-workflows

NSF16514 EarthCube Project - Award Number:1639694

Launch multiple jobs in one script #120

Closed wjlei1990 closed 4 years ago

wjlei1990 commented 4 years ago

Hi, is it possible to launch multiple jobs at once using just one script?

We are currently planning to launch a huge batch of small simulations on Summit. Each task (one simulation) will use 1 node on Summit and take about 10 minutes. We are planning to launch 10,000 to 20,000 tasks in the near future, and I am planning to use 100 nodes.

We can't just put all the tasks into one job, since it may hit the walltime limit. If I can split the 10,000 tasks into multiple LSF jobs, what is the best way to do that? Can I have one master Python script that splits the tasks and launches multiple EnTK instances? Will multiprocessing work, meaning we assign one EnTK instance to each process?

andre-merzky commented 4 years ago

@wjlei1990: you can submit 20k tasks directly to EnTK or RP, and we take care of execution coordination. That is a number we can handle rather well. If runtime is a problem, please consider increasing the pilot size; you can go significantly higher than 100 concurrent tasks. When using 200 nodes, the runtime already goes down to approx. 2 hours, which the batch system should accept AFAIU, and you should be able to increase the pilot size further.
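
For sizing the pilot, a rough back-of-the-envelope estimate can help. The sketch below is plain Python and not part of EnTK or RP; the function names are hypothetical. It assumes every task takes the same time, uses one node, and runs in back-to-back waves with no scheduling or startup overhead, so real runtimes will be somewhat longer. With the figures from the question (10-minute tasks, 20,000 of them), these idealized assumptions give a longer runtime than the rough figure quoted above, so it is worth rerunning the numbers with your own task count and measured task duration.

```python
import math


def estimate_makespan_minutes(n_tasks, task_minutes, pilot_nodes, nodes_per_task=1):
    """Idealized runtime: tasks execute in full waves across the pilot."""
    concurrent = pilot_nodes // nodes_per_task  # tasks running at the same time
    if concurrent < 1:
        raise ValueError("pilot is too small to run a single task")
    waves = math.ceil(n_tasks / concurrent)     # number of back-to-back waves
    return waves * task_minutes


def nodes_for_walltime(n_tasks, task_minutes, walltime_minutes, nodes_per_task=1):
    """Smallest pilot (in nodes) that finishes all tasks within the walltime."""
    waves = walltime_minutes // task_minutes    # waves that fit in the walltime
    if waves < 1:
        raise ValueError("walltime is shorter than a single task")
    return math.ceil(n_tasks / waves) * nodes_per_task


if __name__ == "__main__":
    # Figures from the question: 20,000 one-node tasks of about 10 minutes each.
    print(estimate_makespan_minutes(20_000, 10, 200))  # 1000 (minutes)
    print(nodes_for_walltime(20_000, 10, 120))         # 1667 (nodes for a 2 h limit)
```

The same two functions can be used the other way around: pick the walltime the batch queue accepts, then read off the pilot size needed to fit all tasks into a single job.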