saforem2 / ezpz

Train across all your devices, ezpz 🍋
https://saforem2.github.io/ezpz/
MIT License

fix jobenv does not exist when passing single argument #19

Closed rayandrew closed 2 months ago

rayandrew commented 2 months ago

Hi Sam, I am Ray from the Stormer I/O Optimization project.

I am using the new version to limit the number of nodes a run uses when my allocation has more nodes than I need for my experiments (for example, to work around bad nodes).

I passed a truncated node list as the single argument:

run() {
   local nodes=$1
   local nodelist="${PROJECT_ROOT}/.tmp/ray-nodelist-${nodes}"

   # Keep only the first $nodes hosts from the PBS allocation
   head -n "${nodes}" "${PBS_NODEFILE}" > "${nodelist}"

   cat "${nodelist}"

   # Pass the truncated node list as the single argument
   source "${VENV_DIR}/lib/python3.11/site-packages/ezpz/bin/savejobenv" "${nodelist}"
   source "${VENV_DIR}/lib/python3.11/site-packages/ezpz/bin/getjobenv" "${nodelist}"

   ...
}

run 8

However, the script throws an error saying that jobenv_file does not exist, which causes the job to fail. I fixed it in this pull request by adding one line to the save_pbs_env function. Let me know if it works for you as well!
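
Separately from the one-line change in this pull request, the node-list step in my script could be made a bit more defensive. The sketch below is only an illustration on my side: `make_nodelist` is a hypothetical helper, not part of ezpz, and the `nodelist-N` output file name is an assumption. It checks that the source node file exists and that the count is a positive integer, creates the output directory, and prints the path of the truncated file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of ezpz): write the first N hosts of a
# node file into <outdir>/nodelist-N and print the path of the new file.
make_nodelist() {
    local nodefile="$1"
    local count="$2"
    local outdir="$3"

    # Stop early if the source node file is missing.
    if [ ! -f "$nodefile" ]; then
        echo "node file not found: $nodefile" >&2
        return 1
    fi

    # Accept only positive integers without a leading zero.
    case "$count" in
        ''|*[!0-9]*|0*)
            echo "node count must be a positive integer, got: $count" >&2
            return 1
            ;;
    esac

    # Create the output directory so the redirect below has somewhere to write.
    mkdir -p "$outdir" || return 1

    local out="$outdir/nodelist-$count"
    head -n "$count" "$nodefile" > "$out" || return 1
    echo "$out"
}
```

In my script this would be called as `nodelist=$(make_nodelist "${PBS_NODEFILE}" 8 "${PROJECT_ROOT}/.tmp") || exit 1`, and `"${nodelist}"` would then be passed to savejobenv and getjobenv as before.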

Thanks!