Closed tgebo closed 3 years ago
Looking at similar log files, it looks like this is only happening on Slurm jobs. Can you confirm that's the case?
Yes, it's only happening on Slurm, which leads me to believe the output dir is not writable for 'others'. However, before launching the jobs I changed the permissions on the output dir so that I should be able to write to it.
drwxrwxrwx. 56 root root 16K May 18 16:59 output_call-gSV
Could it be because the sub-directories the outputs go into are not writable? If so, would I need to first launch the job on SGE and, once the sub-dirs are created, change those permissions?
drwxr-xr-x. 2 root root 16K May 18 17:39 delly-0.8.7
drwxr-xr-x. 2 root root 16K May 18 17:39 bcftools-1.12
drwxr-xr-x. 2 root root 16K May 18 17:40 vcftools-0.1.16
drwxr-xr-x. 2 root root 16K May 18 17:40 rtgtools-3.12
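One way to check the sub-directory question without reading `ls` output by eye is to ask `find` for every sub-directory that lacks the write bit for 'others'. This is a minimal sketch, not part of the pipeline: the function name `list_not_world_writable` is hypothetical, and it assumes a `find` that supports `-mindepth` (GNU or BSD).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every sub-directory under "$1" that does NOT
# have the write bit set for 'others' (e.g. drwxr-xr-x).
list_not_world_writable() {
    local dir="$1"
    # "-perm -o=w" matches entries whose other-write bit is set; "!" negates it.
    # "-mindepth 1" skips the top-level directory itself.
    find "$dir" -mindepth 1 -type d ! -perm -o=w
}

# Example call (path taken from this thread; adjust as needed):
# list_not_world_writable /hot/projects/diseases/prostate-cancer/tag/zlotta/output_call-gSV
```

With the listing above, this would be expected to print the four tool directories, since each is `drwxr-xr-x`.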
@tyamaguchi-ucla Have you seen anything like this before?
@tgebo it looks like you don't have write permission in the parent directory. Your user IDs on SGE and Slurm are different, although both usernames are tgebo.
SGE
[tyamaguchi@ip-0A125211 tag]$ ls -lth
total 16K
drwxrwxr-x. 8 tgebo tgebo 16K May 17 02:14 zlotta
SGE
[tyamaguchi@ip-0A125211 tag]$ id -u tgebo
20009
Slurm
[tyamaguchi@ip-0A125212 tag]$ id -u tgebo
1724262646
Therefore, tgebo (1724262646) is not allowed to write anything under /hot/projects/diseases/prostate-cancer/tag
You may want to consider using group permission from SGE and individual permission from Slurm to avoid this kind of permission issue. I will also ask OHIA once again to complete adding group permissions to Slurm Dev.
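The UID mismatch can be made concrete by working out which permission class (owner, group, or other) a given numeric UID falls into for a directory. The sketch below is illustrative only: `perm_class_for` is a hypothetical name, and it assumes GNU `stat` (the `-c` format option), as found on typical Linux cluster nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which permission class applies to a user.
#   $1 = directory
#   $2 = numeric uid to check
#   $3 = space-separated numeric gids that uid belongs to
perm_class_for() {
    local dir="$1" uid="$2" gids="$3"
    local owner group g
    owner=$(stat -c '%u' "$dir")   # numeric owner of the directory (GNU stat)
    group=$(stat -c '%g' "$dir")   # numeric group of the directory
    if [ "$uid" = "$owner" ]; then
        echo owner
        return
    fi
    for g in $gids; do
        if [ "$g" = "$group" ]; then
            echo group
            return
        fi
    done
    echo other
}

# Example call on the Slurm node, as the user running the pipeline:
# perm_class_for /hot/projects/diseases/prostate-cancer/tag/zlotta "$(id -u)" "$(id -G)"
```

If the directory is owned by UID 20009 and the Slurm-side UID 1724262646 is not in the directory's group, the result is `other`, and under `drwxrwxr-x` that class has no write bit.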
@tyamaguchi-ucla I'm somewhat thrown off by this because the outputs from my upstream pipelines (alignDNA, callgSNP, regenotype) are all under the /data/projects/diseases/prostate-cancer/tag directory, which Slurm has been able to write to.
It also looks like both 'users' (SGE and Slurm) were able to write logs in the same output dir:
drwxrwxrwx. 3 tgebo domain users 16K May 17 04:39 20210517-043928
drwxrwxrwx. 3 root root 16K May 17 04:39 20210517-043935
Can you point me to the call-gSV method config?
/data/users/tgebo/pipelines/pipeline-call-gSV/pipeline/config/methods.config
FYI, I'm running job ID 38756 for TAG-011 (running about 3 hrs 15 min so far) to see if it works for my user account using the original config files.
Thanks, Tim.
@tgebo it looks like the method config was recently updated:
[tyamaguchi@ip-0A125211 zlotta]$ ls -lth /data/users/tgebo/pipelines/pipeline-call-gSV/pipeline/config/methods.config
-rwxrwxrwx. 1 tgebo tgebo 3.4K May 17 02:01 /data/users/tgebo/pipelines/pipeline-call-gSV/pipeline/config/methods.config
Did you see the same issue with an old config or just this new config file?
Also, can you try this command on Slurm, in the directory below?
touch test.txt
/hot/projects/diseases/prostate-cancer/tag/zlotta/output_call-gSNP
@tyamaguchi-ucla Yes, I updated that directory to make sure everything is from v2.2 before starting any runs.
[tgebo@ip-0A125212 pipeline]$ cd /hot/projects/diseases/prostate-cancer/tag/zlotta/output_call-gSNP
[tgebo@ip-0A125212 output_call-gSNP]$ touch test.txt
touch: cannot touch ‘test.txt’: Permission denied
Ok, so you didn't see the issue with v2.1 or below, right?
[tgebo@ip-0A125212 pipeline]$ cd /hot/projects/diseases/prostate-cancer/tag/zlotta/output_call-gSNP
[tgebo@ip-0A125212 output_call-gSNP]$ touch test.txt
touch: cannot touch ‘test.txt’: Permission denied
Yeah, this is the expected behavior. Can you point me to the method configs for align-DNA, call-gSNP and regenotype-gSNP?
Actually I was only using SGE back when I ran this and it was on v2.1.
alignDNA: I ran this back when the docker version was still in use so no methods.config
call-gSNP:/data/users/tgebo/pipelines/pipeline-call-gSNP/methods.config
regenotype-gSNP: /data/users/tgebo/pipelines/pipeline-regenotype-gSNP/pipeline/config/single-node/methods.config
It looks like the methods.config files were also recently updated. With that setting, you run pipelines as tgebo (1724262646) and inherit all the groups associated with that user ID on Slurm. (Also, docker might not be able to check permissions in parent directories, but I'm not sure.)
It's hard to debug as it looks like some of the logs were overwritten last night but the permissions are not consistent in the directories at all.
We really should implement group permissions but for now there are a few things we can do.
For example:
1) Use your AD account user and group IDs (Slurm uid and gid) throughout the directories with 755. I think you can still run pipelines on SGE, as they always run as root there.
2) Use group permission from SGE and individual permission from Slurm with 775.
There are pros and cons to the options but you may want to think hard about the current permission settings (in terms of both non-docker and docker) and modify the permissions. Let me know if you want to have a sync on this.
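As a rough illustration of what a 775 group-based setup could look like, the sketch below hands a directory tree to a shared group and sets 775 on every directory in it. This is one possible reading of option 2, not a confirmed procedure from this thread: the function name `share_with_group` is hypothetical, the group must be one the calling user is allowed to assign, and GNU `stat` is assumed for the verification step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: give a group write access to a directory tree.
#   $1 = top-level directory
#   $2 = group name or numeric gid (assumed to exist on both clusters)
share_with_group() {
    local dir="$1" group="$2"
    # Assign the group to everything under the directory.
    chgrp -R "$group" "$dir"
    # 775 on directories only: rwx for owner and group, r-x for others.
    find "$dir" -type d -exec chmod 775 {} +
}

# Example call (the group name here is a placeholder, not taken from this thread):
# share_with_group /hot/projects/diseases/prostate-cancer/tag/zlotta some-shared-group
# stat -c '%a %U %G %n' /hot/projects/diseases/prostate-cancer/tag/zlotta
```

Directories created later (for example by a job running as root) will not pick up these settings automatically. Setting the setgid bit on directories (`chmod g+s`) is a common companion for that, but whether it suits this setup is not established in this thread.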
@timothyjsanders please let me know if you find this is a call-gSV specific issue.
I'm allowing permission to the entire /data/projects/diseases/prostate-cancer/tag directory to try to avoid this kind of issue from now on. Thanks guys!
I wouldn't recommend 777 tho.
I did 775 and all my jobs today failed with the same reason again. 777 works though.
It looks like you didn't try 2) above? I can go over how you can implement it at 1:1.
@tgebo Are you still having this issue or is it ok to close?
Just following up on this. @tgebo Are you still having this issue? We'll close it if there's no response by Friday.
@tyamaguchi-ucla oops missed that last comment but yes, everything worked out. Okay to close.
Describe the issue
Some runs report that they failed to publish the output files from /scratch, even though the job is reported as completed/succeeded. The log files for the delly tool suggest that all the stages were performed, though.
WARN: Failed to publish file: /scratch/ab/411efff264a58b8984083e565a9ed6/DELLY-0.8.7_SV__TAG-003.bcf.sha512; to: /hot/projects/diseases/prostate-cancer/tag/zlotta/output_call-gSV/delly-0.8.7/DELLY-0.8.7_SV__TAG-003.bcf.sha512 [copy] -- See log file for details
/hot/resources/zlotta/scripts/call-gSV/TAG-*.config
/hot/users/tgebo/pipelines/pipeline-call-gSV/pipeline
/hot/users/tgebo/pipelines/pipeline-call-gSV/pipeline/TAG-*.log
/hot/projects/diseases/prostate-cancer/tag/zlotta/output_call-gSV/202105*/log/*/TAG-*.log.command.log
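To see which destinations were affected across many runs, the warning lines can be pulled out of the logs listed above. This is a small sketch that assumes the warning format matches the single example quoted in this issue (`WARN: Failed to publish file: <src>; to: <dest> [copy] -- ...`); the function name `list_failed_publish_targets` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the unique destination paths from
# "Failed to publish file" warnings found in the given log files.
list_failed_publish_targets() {
    # Capture the text between "; to: " and the bracketed publish mode
    # (e.g. "[copy]"), then de-duplicate.
    sed -n 's/.*WARN: Failed to publish file: .*; to: \(.*\) \[[^]]*\] -- .*/\1/p' "$@" | sort -u
}

# Example call using one of the log globs from this issue:
# list_failed_publish_targets /hot/users/tgebo/pipelines/pipeline-call-gSV/pipeline/TAG-*.log
```

If every reported destination sits under the same output sub-directory, that points at the permissions of that directory rather than at the tool itself.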
To Reproduce Steps to reproduce the behavior:
Expected behavior
The job reports that it completed/succeeded and the output files are located in the output dir.