ucsf-wynton / wynton-tools


why/job: Look for SGE options in `job_args` #14

Open HenrikBengtsson opened 1 month ago

HenrikBengtsson commented 1 month ago

If qstat -j <job> shows:

job_args:                   -l,h_rt=00:30:00,mem_free=2G,gpu_mem=1G

then it suggests that qsub was called incorrectly. A reproducible example (from a 2024-10-09 Slack thread):

I'm trying to reproduce this, and what I think happened is that you specified -l ... after the job script, e.g.

qsub -cwd -j y my_script.sh -l h_rt=00:30:00,mem_free=2G,gpu_mem=1G

but you need to specify it before the job script, i.e.

qsub -cwd -j y -l h_rt=00:30:00,mem_free=2G,gpu_mem=1G my_script.sh

The reason is that SGE/qsub stops parsing command-line options as soon as it reaches the job script argument (my_script.sh). Everything after it is recorded (as job_args) and passed to the job script as-is, i.e. SGE runs your script as if you had called it manually as:

my_script.sh -l h_rt=00:30:00,mem_free=2G,gpu_mem=1G

So, that's why -l ... is ignored by SGE.
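Because the misplaced options arrive as ordinary script arguments (`-l` as `$1`, `h_rt=...` as `$2`), a job script could detect them itself. Below is a minimal sketch of such a guard, assuming the script never legitimately takes arguments named like qsub options. The function name `warn_if_sge_options` and the short list of option names are hypothetical, not part of SGE or wynton-tools.

```shell
#!/usr/bin/env bash
# Hypothetical guard for the top of a job script such as my_script.sh.
# Assumption: the script itself never expects arguments named like the
# qsub options below, so seeing one means it was meant for qsub.

warn_if_sge_options() {
  local arg found=0
  for arg in "$@"; do
    case "${arg}" in
      # Small, non-exhaustive sample of qsub option names (assumption)
      -l|-pe|-cwd)
        echo "WARNING: argument '${arg}' looks like a qsub option but was passed to the job script; put qsub options before the script name" >&2
        found=1
        ;;
    esac
  done
  # Returns 1 if a suspicious argument was seen, 0 otherwise
  return "${found}"
}

# At top level, "$@" holds the arguments SGE passed to the job script
warn_if_sge_options "$@"
```

The guard only warns and does not abort. With `qsub -cwd -j y my_script.sh -l ...`, stderr is merged into the job's output file, so the warning ends up next to the job's other output.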

HenrikBengtsson commented 1 month ago

Scanning the 8,115 jobs currently in the queue, with their job IDs stored in the Bash array jobs, using:

for job in "${jobs[@]}"; do
  job_args=$(qstat -j "${job}" | grep -E "^job_args" | sed -E 's/job_args:[[:blank:]]+//')
  if [[ -n ${job_args} ]]; then printf "%s: %s\n" "${job}" "${job_args}"; fi
done

reveals only a few such mistakes, e.g.

...
2162260: h_rt=00:30:00
...
3613495: MSN,-l,h_rt=50:00:00
...
3624537: l,h_rt=00:30:00,mem_free=2G,gpu_mem=1G
...
3624546: l,h_rt=00:30:00,mem_free=2G,gpu_mem=1G
3624547: -l,h_rt=00:30:00,mem_free=2G,gpu_mem=1G
3624548: -l,h_rt=00:30:00,mem_free=2G,gpu_mem=1G
3624555: -l,h_rt=00:30:00,mem_free=2G,gpu_mem=1G
...
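To flag entries like these automatically, as the issue title proposes for `why/job`, a simple heuristic could split a `job_args` value on commas and look for qsub option flags or resource requests. Below is a minimal sketch under stated assumptions. First, it assumes qstat joins the script arguments with commas, as the `-l,h_rt=...` entries above suggest. Second, it assumes the listed flag and resource names are a representative sample and not a complete list. The function name `looks_like_sge_options` is hypothetical, and this is not the wynton-tools implementation.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: does a job_args value (as printed by `qstat -j`)
# look like misplaced qsub options?
# Assumptions: arguments are comma-joined by qstat; the flag and resource
# names below are a sample only, so false negatives/positives are possible.

looks_like_sge_options() {
  local job_args="$1" token
  local -a tokens
  # Split the comma-separated job_args value into individual tokens
  IFS=',' read -r -a tokens <<< "${job_args}"
  for token in "${tokens[@]}"; do
    case "${token}" in
      # qsub option flags, e.g. '-l'
      -l|-pe|-cwd)
        return 0 ;;
      # Resource requests, e.g. 'h_rt=00:30:00' or 'mem_free=2G'
      h_rt=*|mem_free=*|gpu_mem=*|scratch=*)
        return 0 ;;
    esac
  done
  return 1
}

# Example: check a few job_args values
for value in "h_rt=00:30:00" "MSN,-l,h_rt=50:00:00" "input.csv,42"; do
  if looks_like_sge_options "${value}"; then
    printf '%s: looks like misplaced qsub options\n' "${value}"
  fi
done
```

Matching resource names such as `h_rt=` also catches entries where the leading `-l` was mangled, like the `l,h_rt=...` values above, without having to treat a bare `l` as suspicious.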