sdsc-hpc-training-org / reverse-proxy


Issue with job reservation #14

Closed arnodelorme closed 3 years ago

arnodelorme commented 3 years ago

This is using the head at #b17f03f from Feb 17th, 2021

ssh expanse.login.sdsc.edu
git clone https://github.com/sdsc-hpc-training-org/reverse-proxy
module load anaconda3
cd reverse-proxy
./start-jupyter -p debug -d $HOME -A csd403 -t 30 -s notebook

Issue 1: it selected compute even though I specified debug
Issue 2: the job was not visible in the queue
Issue 3: it was never reserved

[arno@login02 reverse-proxy]$ ./start-jupyter -p debug -d $HOME -A csd403 -t 30 -s notebook
Your notebook is here:
    https://swoop-crested-easiness.expanse-user-content.sdsc.edu?token=0e3083190598d60faeb3bbd150f083f5
If you encounter any issues, please email help@xsede.org and mention the Reverse Proxy Service.
Your job id is 1336982
You may occasionally run the command 'squeue -j 1336982' to check the status of your job
[arno@login02 reverse-proxy]$
[arno@login02 reverse-proxy]$ squeue | grep arno
[arno@login02 reverse-proxy]$ squeue | grep 1336982

JamesMcDougallJr commented 3 years ago

Interesting. Here's what I get when I run the same command:

(base) [jamesmcd@login02 reverse-proxy]$ ./start-jupyter -p debug -d $HOME -A ddp363 -t 30 -s notebook
Your notebook is here:
    https://animating-cried-skyward.expanse-user-content.sdsc.edu?token=3271e184fad1916f37cff1538042870a
If you encounter any issues, please email help@xsede.org and mention the Reverse Proxy Service.
Your job id is 1368573
You may occasionally run the command 'squeue -j 1368573' to check the status of your job
(base) [jamesmcd@login02 reverse-proxy]$ sq
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
           1368573     debug notebook jamesmcd  R       0:02      1 exp-9-55 
(base) [jamesmcd@login02 reverse-proxy]$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

Can you provide the output of squeue -u $USER?

mkandes commented 3 years ago

This specific job failed immediately.

[mkandes@login01 ~]$ sacct -j 1336982 --format=User,JobID,Jobname%60,partition,state,time,submit,start,end,elapsed,reqMem,allocgres,MaxVMSize,MaxRSS,nnodes,ncpus,reqGRES,nodelist%240qlssq
User  JobID         JobName      Partition  State   Timelimit  Submit               Start                End                  Elapsed   ReqMem  AllocGRES  MaxVMSize  MaxRSS  NNodes  NCPUS  ReqGRES  NodeList
----  ------------  -----------  ---------  ------  ---------  -------------------  -------------------  -------------------  --------  ------  ---------  ---------  ------  ------  -----  -------  --------
arno  1336982       notebook.sh  debug      FAILED  00:30:00   2021-02-19T13:55:58  2021-02-19T13:55:58  2021-02-19T13:56:00  00:00:02  1Gc                                   1       128             exp-9-55
      1336982.bat+  batch                   FAILED             2021-02-19T13:55:58  2021-02-19T13:55:58  2021-02-19T13:56:00  00:00:02  1Gc                149688K    4440K   1       128             exp-9-55
[mkandes@login01 ~]$
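
By default squeue lists only pending and running jobs, so a job that fails within a couple of seconds, as this one did, will usually not appear there at all; the accounting records read by sacct are where it shows up. A minimal sketch of a helper that checks squeue first and falls back to sacct is below. The function names `job_state` and `parse_sacct_state` are hypothetical and not part of reverse-proxy, and the sketch assumes job accounting is enabled on the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of reverse-proxy): report a job's state
# whether or not the job is still in the queue.

# Read `sacct -n -P -o JobID,State` output (pipe-delimited, no header)
# from stdin and print the state of the main job record only,
# skipping step records such as "<jobid>.batch".
parse_sacct_state() {
    local jobid="$1"
    awk -F'|' -v id="$jobid" '$1 == id { print $2; exit }'
}

# Print the state of a job: try the live queue first, then accounting.
job_state() {
    local jobid="$1" state
    # -h drops the header; %T prints the full state name (e.g. RUNNING).
    state=$(squeue -h -j "$jobid" -o '%T' 2>/dev/null) || state=""
    if [ -z "$state" ]; then
        # The job is no longer queued: ask the accounting database instead.
        state=$(sacct -n -P -j "$jobid" -o JobID,State 2>/dev/null \
                | parse_sacct_state "$jobid") || state=""
    fi
    echo "${state:-UNKNOWN}"
}
```

Running `job_state 1336982` on a login node would then print the recorded state of the job even after it has left the queue, or `UNKNOWN` if neither command returns a record.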

mkandes commented 3 years ago

@arnodelorme - Did you happen to receive a standard output and/or error file from Slurm here? If so, can you let me know where I can find it on Expanse?

arnodelorme commented 3 years ago

I have tried again using the same commands and it works now. Maybe something was fixed?