Open kcstringer opened 3 months ago
Hi @kcstringer,
It is not good practice to have a large number of jobs. Too many jobs make the scheduler busy and unresponsive, so admins usually prohibit users from doing this. Even if the admin allowed it, your submission would result in very long wait times for very short runtimes.
You can reduce the number of jobs with a loop in your R script: fold thousands of pairs (or more) into one job; each pair doesn't take long to finish.
Regards, Zhili
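The chunking Zhili describes can be sketched in bash. This is a hypothetical helper, not part of the scripts in this thread; the numbers are taken from the question (24496500 pairs split across at most 20000 array tasks), and the `parKD.R` start/end interface at the bottom is assumed:

```shell
# Sketch: compute the range of pair indices one array task should handle,
# so that 24496500 pairs fit into 20000 array tasks (1225 pairs per task).
pair_range() {
  local task_id=$1
  local total_pairs=24496500
  local n_tasks=20000
  # Ceiling division: pairs handled per task (1225 here).
  local per_task=$(( (total_pairs + n_tasks - 1) / n_tasks ))
  local start=$(( (task_id - 1) * per_task + 1 ))
  local end=$(( task_id * per_task ))
  (( end > total_pairs )) && end=$total_pairs
  if (( start > total_pairs )); then
    # The last couple of tasks can fall off the end of the pair list.
    echo "task $task_id: nothing to do"
    return 0
  fi
  echo "$start $end"
}

# In the per-task job script one would then call (hypothetical interface):
#   read START END < <(pair_range "$SLURM_ARRAY_TASK_ID")
#   Rscript parKD.R "$START" "$END"
```

With this layout a single 20000-task array covers all 24496500 pairs, and each task loops over its 1225 pairs inside one R session.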
I have a dataset of 7001 columns, with column 1 being an ID and each subsequent column representing one trait. I want to use R to calculate Kendall's rank correlation for each pair of trait columns and extract the correlation estimates and p-values. The total number of calculations is 7000*6999/2 = 24496500.
I want to parallelize the task such that each array task does the calculation for one of the 24496500 column pairs. However, I cannot specify -array=1-24496500, since 24496500 exceeds the QOSMaxSubmitJobPerUserLimit, which at my institution is 20000. So I wrote a for loop in bash to submit qsubshcom jobs in batches, with 20000 job arrays per batch and each batch set to run after the previous batch has finished.
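For orientation, a linear pair index in 1..24496500 can be mapped back to a column pair by walking the per-row pair counts. A minimal bash sketch (the function name and interface are mine, not from the scripts in this thread; indices 1..7000 here refer to the trait columns, i.e. data columns 2..7001):

```shell
# Sketch: map a linear pair index k (1..n*(n-1)/2) to the pair (i,j),
# 1 <= i < j <= n, in row-major order: (1,2),(1,3),...,(1,n),(2,3),...
index_to_pair() {
  local k=$1 n=$2 i=1
  # Row i contributes (n - i) pairs; walk rows until k falls inside one.
  while (( k > n - i )); do
    k=$(( k - (n - i) ))
    i=$(( i + 1 ))
  done
  echo "$i $(( i + k ))"
}
```

For example, with n=7000 the first index maps to the pair (1,2) and the last index, 24496500, maps to (6999,7000).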
Here is my `parKD.R` script:

Here is the `parKD.sh` script that does the calculation for each column pair:

Finally, here is the `run_parKD.sh` file that submits qsubshcom jobs in batches of 20000, as limited by my institution's QOSMaxSubmitJobPerUserLimit:

For `run_parKD.sh`, if I set `total_jobs=7` and `jobs_per_batch=2`, the jobs run the way I want: the first batch runs with 2 job arrays; when it completes, the second batch runs; when that completes, the third batch runs, and so on. However, when I set `total_jobs=24496500` and `jobs_per_batch=20000`, here are the first several lines I get in the console:

The trouble is that only the first batch is submitted successfully. Since the batch size of 20000 already reaches QOSMaxSubmitJobPerUserLimit, the second and every subsequent batch cannot be submitted. My `run_parKD.sh` fails here because it first submits all jobs for queueing and then executes them sequentially, so once batch 1 uses up QOSMaxSubmitJobPerUserLimit, no further batch can be queued. If I were to submit each new job array manually after the previous one finished, I would have to do this 1225 times, which is not feasible.
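One pattern that avoids pre-queueing every batch is to have each batch submit its successor only when it finishes, so the queue never holds more than one batch at a time. A sketch using plain SLURM `sbatch` options, since I am not certain of the corresponding qsubshcom flags; `submit_next.sh`, the batch sizes, and the dry-run printing are all illustrative:

```shell
# Sketch (not the author's actual script): a self-resubmitting batch driver.
# Each batch is one job array plus one "chaser" job that, once the whole
# array has finished, submits the next batch.
submit_batch() {
  local batch=$1
  local total_batches=1225         # ceil(24496500 / 19999)
  local jobs_per_batch=19999       # 19999 array tasks + 1 chaser job
                                   # = 20000, staying within the QOS limit
  if (( batch > total_batches )); then
    echo "all batches done"
    return 0
  fi
  # Dry run: print the sbatch commands instead of executing them. On a real
  # SLURM cluster one would capture the array job ID with `sbatch --parsable`
  # and feed it to --dependency=afterany so the chaser waits for the array.
  echo "sbatch --array=1-${jobs_per_batch} parKD.sh $batch"
  echo "sbatch --dependency=afterany:<array-jobid> submit_next.sh $((batch + 1))"
}
```

The `--array`, `--dependency`, and `--parsable` options are standard sbatch; the trade-off versus chunking pairs into fewer tasks is that this keeps the one-pair-per-task design but still runs the 1225 batches strictly one after another.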
My question is: how can I submit job arrays through qsubshcom such that all 24496500 calculations are submitted only once, given that QOSMaxSubmitJobPerUserLimit is only 20000?
Thank you.
Kieran