lsawade opened this issue 3 years ago
Ok I can run things on Traverse using this setup. But there are some things I have learnt: ... For some reason, I cannot request a pool of GPUs and take from it.
I am not sure I appreciate the distinction - isn't 'this setup' also using GPUs from a pool of requested GPUs?
Given the first statement ("I can run things on Traverse using this setup"), it sounds like we should encode just this in RP to get you running on Traverse, correct?
Well, I'm not quite sure. It seems to me that if I request `#SBATCH --gpus-per-task=1`, I already prescribe how many GPUs a task uses, which worries me. Maybe it's a misunderstanding on my end...
This batch script here does not use that directive. The sbatch only needs to provision the right number of nodes - the `per_task` parameters should not matter (even if you need to specify them in your case for some reason), as we overwrite them in the `srun` directives anyway?
Exactly! But this does not seem to work!
```
#SBATCH -n 4
#SBATCH --gpus-per-task=1

srun -n 4 --gpus-per-task=1 a.o
```

works;

```
#SBATCH -n 4
#SBATCH --gpus=4

srun -n 4 --gpus-per-task=1 a.o
```

does not work!
Unless I'm making a dumb mistake ...
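To see what each variant actually grants, a small probe job might help. This is just a sketch (the output file name is my own; it assumes Slurm exports `SLURM_PROCID` and a per-task `CUDA_VISIBLE_DEVICES` when GPU binding is active):

```shell
#!/bin/bash
#SBATCH -n 4
#SBATCH --gpus-per-task=1
#SBATCH --output=gpu_probe.txt

# Print which GPUs each of the 4 tasks can actually see.
srun -n 4 --gpus-per-task=1 bash -c \
    'echo "task $SLURM_PROCID on $(hostname): CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"'
```

Swapping the header lines for the failing variant and diffing the two `gpu_probe.txt` files would show whether the tasks are denied GPUs or simply never bound to them.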
Sorry, I have not worked on this further yet.
Hi @lsawade - I still can't make sense of it and wasn't able to reproduce it on other Slurm clusters :-( But either way, please do give the RS branch `fix/traverse` (https://github.com/radical-cybertools/radical.saga/pull/840) a try. It now hardcodes the `#SBATCH --gpus-per-task=1` for Traverse.
Hi @andre-merzky - So, I was getting errors in the submission, and I finally had a chance to go through the log. I found the error: the submitted sbatch script can't work like this:
```
#SBATCH --ntasks=32
#SBATCH --ntasks-per-node=32
#SBATCH --gpus-per-task=1
#SBATCH -J "pilot.0000"
#SBATCH -D "/scratch/gpfs/lsawade/radical.pilot.sandbox/re.session.traverse.princeton.edu.lsawade.019013.0000/pilot.0000/"
#SBATCH --output "bootstrap_0.out"
#SBATCH --error "bootstrap_0.err"
#SBATCH --partition "test"
#SBATCH --time 00:20:00
```
In this case, you are asking for 32 GPUs on a single node. I have no solution for this, because the alternative, requesting 4 tasks, seems stupid. And the research computing staff seemed to be immovable in terms of the Slurm settings on Traverse.
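The mismatch is simple arithmetic; a throwaway sketch (the 4-GPUs-per-node figure matches the `--gres=gpu:4` Traverse nodes accept):

```shell
# Why the header above over-asks: per-node GPU demand vs. what a node has.
ntasks_per_node=32
gpus_per_task=1
gpus_per_node=4          # Traverse nodes each carry 4 GPUs

demand=$(( ntasks_per_node * gpus_per_task ))
echo "need $demand GPUs per node, have $gpus_per_node"
# -> need 32 GPUs per node, have 4
```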
We discussed this topic on this week's devel call. At this point we are inclined not to support Traverse: the Slurm configuration on Traverse contradicts the Slurm documentation, and also how other Slurm deployments work. To support Traverse, we would basically have to break support on other Slurm resources.
We can in principle create a separate `slurm_traverse` launch method and pilot launcher in RP to accommodate the machine. That, however, is a fair amount of effort. Not insurmountable, but still quite some work. Let's discuss on the HPC-Workflows call how to handle this. Maybe there is also a chance to iterate with the admins (although we wanted to stay out of the business of dealing with system admins directly :-/ )
We will have to write an executor specific to Traverse. This will require allocating specific resources and we will report back once we do some internal discussion. RADICAL remains available to discuss the configuration of new machines, in case it will be useful/needed. Meanwhile, Lucas is using Summit while waiting for Traverse to become viable with EnTK.
@andre-merzky
Today I was working on something completely separate, but -- again -- I had issues with Traverse even for an embarrassingly parallel submission. It turned out that there seems to be an issue with how hardware threads are assigned.
If I just ask for `--ntasks=5`, I will not get 5 physical cores from the Power9 CPU, but rather 4 hardware threads from one core and 1 hardware thread from another.
So, the CPU pool on Traverse by default has size 128. I have to use the following to truly access 5 physical cores:
```
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=4
#SBATCH --ntasks-per-core=1
```
I will check whether this has an impact on how we are assigning the tasks during submission.
Just an additional example to build understanding:
This

```
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=4
#SBATCH --ntasks-per-core=1
```

is OK.
This

```
#SBATCH --nodes=1
#SBATCH --ntasks=33
#SBATCH --cpus-per-task=4
#SBATCH --ntasks-per-core=1
```

is not OK (33 tasks x 4 CPUs = 132 hardware threads, which exceeds the node's pool of 128).
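The pattern follows directly from the 128-thread pool; a quick sanity check, with `fits_on_node` being a hypothetical helper name of my own:

```shell
# Does ntasks * cpus-per-task fit into one Traverse node's pool of
# 128 hardware threads (32 Power9 cores x SMT4)?
fits_on_node() {
    local ntasks=$1 cpus_per_task=$2 pool=128
    if [ $(( ntasks * cpus_per_task )) -le "$pool" ]; then
        echo "OK"
    else
        echo "not OK"
    fi
}

fits_on_node 32 4    # 32 * 4 = 128 -> OK
fits_on_node 33 4    # 33 * 4 = 132 -> not OK
```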
I have confirmed my suspicions, and I have finally found a resource and task description that definitely works. Test scripts are located in traverse-slurm-repo, but I will summarize below:
The `sbatch` header:
```
#!/bin/bash
#SBATCH -t00:05:00
#SBATCH -N 2
#SBATCH -n 64
#SBATCH --cpus-per-task=4
#SBATCH --ntasks-per-core=1
#SBATCH --output=mixed_gpu.txt
#SBATCH --reservation=test
#SBATCH --gres=gpu:4
```
So, in the `sbatch` header, I'm explicitly asking for 64 tasks across two nodes (32 per node), where each task has access to 4 CPUs. In Slurm language, Power9 hardware threads are apparently equal to CPUs; hence, each physical core has to be assigned 4 CPUs. Then, I also specify that each core is only assigned a single task. Finally, instead of implicitly specifying some notion of GPU need, I simply tell Slurm I want the 4 GPUs in each node with `--gres=gpu:4`.
If you want to provide the hostfile, you will have to decorate the `srun` command as follows:
```
# Define hostfile
export SLURM_HOSTFILE=<some_hostfile with <N> entries>

# Run command
srun --ntasks=<N> --gpus-per-task=1 --cpus-per-task=4 --ntasks-per-core=1 --distribution=arbitrary <my-executable>
```
dropping the `--gpus-per-task` if none are needed. Otherwise, if you want to let Slurm handle the resource allocation, the following works as well:
```
srun --ntasks=$1 --gpus-per-task=1 --cpus-per-task=4 --ntasks-per-core=1 <my-executable>
```
again, dropping the `--gpus-per-task` if none are needed.
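For completeness, the hostfile for `--distribution=arbitrary` is just one hostname per task. A sketch of building one (node names are hypothetical; on a real allocation they would come from `scontrol show hostnames "$SLURM_JOB_NODELIST"`):

```shell
# Build a round-robin hostfile: one line per task, cycling over the nodes.
nodes=(traverse-k01g1 traverse-k01g2)   # hypothetical node names
ntasks=6

: > hostfile.txt
for ((i = 0; i < ntasks; i++)); do
    echo "${nodes[i % ${#nodes[@]}]}" >> hostfile.txt
done

export SLURM_HOSTFILE="$PWD/hostfile.txt"
```

A block distribution (all of node 1 first, then node 2) would work just as well; `--distribution=arbitrary` simply replays the file order.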
From past experience, I think this should be relatively easy to put into EnTK?
@lsawade - thanks for your patience! In radical-saga and radical-pilot, you should now find two branches named `fix/issue_138_hpcwf`. They hopefully implement the right special cases for Traverse to work as expected. Would you please give them a try? Thank you!
Will give it a whirl!
@andre-merzky, I find the branch in pilot but not in saga? Should I just use `fix/traverse` for saga?
@lsawade : Apologies, I missed a push for the branch... It should be there now in RS also.
Hey @lsawade - did you have the chance to look into this again?
Sorry, @andre-merzky , I thought I had updated the issue before I started driving on Friday...
So, the issue persists. An error is still thrown when `--cpus_per_task` is used, due to the underscores.
```
python               : /home/lsawade/.conda/envs/conda-entk/bin/python3
pythonpath           :
version              : 3.7.12
virtualenv           : conda-entk

radical.entk         : 1.14.0
radical.gtod         : 1.13.0
radical.pilot        : 1.13.0-v1.13.0-149-g211a82593@fix-issue_138_hpcwf
radical.saga         : 1.13.0-v1.13.0-1-g7a950d53@fix-issue_138_hpcwf
radical.utils        : 1.14.0
```

```
$ cat re.session.traverse.princeton.edu.lsawade.019111.0001/radical.log | grep -B10 ERROR | head -20
```
@lsawade Hi Lucas, can you please give it another try? That was a typo in the option setup and it has been fixed in that branch; the stack would then look like this:
```
% radical-stack

python               : /Users/mtitov/.miniconda3/envs/test_rp/bin/python3
pythonpath           :
version              : 3.7.12
virtualenv           : test_rp

radical.entk         : 1.14.0
radical.gtod         : 1.13.0
radical.pilot        : 1.14.0-v1.14.0-119-ga6886ca58@fix-issue_138_hpcwf
radical.saga         : 1.13.0-v1.13.0-9-g1875aa88@fix-issue_138_hpcwf
radical.utils        : 1.14.0
```
@lsawade : ping :-)
Hi,
I don't know whether this is related to #135. It is weird, because I got everything running on a single node, but as soon as I use more than one, EnTK seems to hang. I checked the submission script and it looks fine to me; so did the node list.
The workflow already hangs in the submission of the first task, which is a single core, single thread task.
Stack
Client zip: client.session.zip
Session zip: sandbox.session.zip