uclahs-cds / pipeline-call-gSV

Nextflow pipeline to call germline structural variants and copy number variants using DELLY and Manta
https://uclahs-cds.github.io/pipeline-call-gSV/
GNU General Public License v2.0

Failed to publish output file from /scratch #25

Closed tgebo closed 3 years ago

tgebo commented 3 years ago

Describe the issue: Some runs report that they failed to publish the output files from /scratch, even though the job is reported as completed/succeeded. The DELLY log files suggest that all the stages were performed, though.

WARN: Failed to publish file: /scratch/ab/411efff264a58b8984083e565a9ed6/DELLY-0.8.7_SV__TAG-003.bcf.sha512; to: /hot/projects/diseases/prostate-cancer/tag/zlotta/output_call-gSV/delly-0.8.7/DELLY-0.8.7_SV__TAG-003.bcf.sha512 [copy] -- See log file for details

Completed at: 18-May-2021 07:20:32
Duration    : 7h 10m 48s
CPU hours   : 386.4
Succeeded   : 12
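
Because the run summary still reports success, the warning above is easy to miss. The following is a minimal sketch for flagging it, assuming the Nextflow console output has been saved to a text file and that the warning keeps the wording quoted in this issue. `find_publish_failures` is a hypothetical helper name, not part of Nextflow or this pipeline.

```python
import re

# Pattern modeled on the warning quoted in this issue; other Nextflow
# versions may word the message differently (assumption).
PUBLISH_WARNING = re.compile(
    r"WARN: Failed to publish file: (?P<source>\S+); to: (?P<target>\S+)"
)


def find_publish_failures(log_text):
    """Return (source, target) path pairs for each publish warning found."""
    return [
        (match.group("source"), match.group("target"))
        for match in PUBLISH_WARNING.finditer(log_text)
    ]
```

A submission wrapper could read the saved output, call this function, and treat a non-empty result as a failed run even when the summary says "Succeeded".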

To Reproduce Steps to reproduce the behavior:

  1. Go to working dir
  2. Submit call-gSV.nf using submission script and specifying sample config
  3. See error in log, no output in output dir

Expected behavior: The job reports that it completed/succeeded and the output files are located in the output dir.

timothyjsanders commented 3 years ago

Looking at similar log files, it looks like this is only happening on Slurm jobs, can you confirm if that's the case?

tgebo commented 3 years ago

Yes, it's only happening on Slurm, which leads me to believe the output dir is not writable for 'others'. However, prior to launching the jobs I changed the permissions on the output dir so that I should be able to write to it:

drwxrwxrwx. 56 root root 16K May 18 16:59 output_call-gSV

Could it be because the sub-directories that the outputs go into are not writable? If so, would I need to first launch the job on SGE, and once the sub-dirs are created, change those permissions?

drwxr-xr-x. 2 root  root         16K May 18 17:39 delly-0.8.7
drwxr-xr-x. 2 root  root         16K May 18 17:39 bcftools-1.12
drwxr-xr-x. 2 root  root         16K May 18 17:40 vcftools-0.1.16
drwxr-xr-x. 2 root  root         16K May 18 17:40 rtgtools-3.12

timothyjsanders commented 3 years ago

@tyamaguchi-ucla Have you seen anything like this before?

tyamaguchi-ucla commented 3 years ago

@tgebo it looks like you don't have write permission in the parent directory. Your user IDs on SGE and Slurm are different although they are both tgebo.

SGE
[tyamaguchi@ip-0A125211 tag]$ ls -lth
total 16K
drwxrwxr-x. 8 tgebo tgebo 16K May 17 02:14 zlotta

SGE
[tyamaguchi@ip-0A125211 tag]$ id -u tgebo
20009

Slurm
[tyamaguchi@ip-0A125212 tag]$ id -u tgebo
1724262646

Therefore, tgebo (1724262646) is not allowed to write anything under /hot/projects/diseases/prostate-cancer/tag

You may want to consider using group permission from SGE and individual permission from Slurm to avoid this kind of permission issue. I will also ask OHIA once again to complete adding group permissions to Slurm Dev.
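
The reasoning above can be sketched as a small owner/group/other check. This is a simplified illustration, not how the kernel or Docker actually decides: it ignores root, ACLs, and SELinux. The UIDs are taken from the `id -u tgebo` output, and the sketch assumes the owner shown as `tgebo` in the listing is the SGE UID. The GIDs are made-up placeholders because the thread does not show them, and `can_write` is a hypothetical helper.

```python
import stat


def can_write(mode, owner_uid, owner_gid, uid, gids):
    """Apply the classic owner/group/other check for write access.

    Simplified: ignores root, ACLs, and SELinux, all of which can change
    the real outcome.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# UIDs come from the `id -u tgebo` output above; the GIDs are made-up
# placeholders because the thread does not show them.
SGE_UID = 20009
SLURM_UID = 1724262646
OWNER_GID = 5000        # hypothetical
SLURM_GIDS = {6000}     # hypothetical, assumed not to include OWNER_GID

ZLOTTA_MODE = 0o775     # drwxrwxr-x, as listed for zlotta

print(can_write(ZLOTTA_MODE, SGE_UID, OWNER_GID, SGE_UID, {OWNER_GID}))     # True
print(can_write(ZLOTTA_MODE, SGE_UID, OWNER_GID, SLURM_UID, SLURM_GIDS))    # False
```

Under these assumptions the Slurm UID falls into the "other" class, which has no write bit in 775, even though both accounts display as tgebo.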

tgebo commented 3 years ago

@tyamaguchi-ucla I'm somewhat thrown off by this because my outputs from upstream pipelines (alignDNA, callgSNP, regenotype) are all under the /data/projects/diseases/prostate-cancer/tag directory, which Slurm has been able to write to.

It also looks like both 'users' (SGE and Slurm) were able to write logs in the same output dir:

drwxrwxrwx. 3 tgebo domain users 16K May 17 04:39 20210517-043928
drwxrwxrwx. 3 root root 16K May 17 04:39 20210517-043935

tyamaguchi-ucla commented 3 years ago

Can you point me to the call-gSV method config?

tgebo commented 3 years ago

/data/users/tgebo/pipelines/pipeline-call-gSV/pipeline/config/methods.config

timothyjsanders commented 3 years ago

FYI, I'm running job ID 38756 for TAG-011 (running about 3 hrs 15 min so far) to see if it works for my user account using the original config files.

tyamaguchi-ucla commented 3 years ago

Thanks, Tim.

@tgebo it looks like the method config was recently updated:

[tyamaguchi@ip-0A125211 zlotta]$ ls -lth /data/users/tgebo/pipelines/pipeline-call-gSV/pipeline/config/methods.config
-rwxrwxrwx. 1 tgebo tgebo 3.4K May 17 02:01 /data/users/tgebo/pipelines/pipeline-call-gSV/pipeline/config/methods.config

Did you see the same issue with an old config or just this new config file?

Also, can you try running touch test.txt on Slurm in this directory? /hot/projects/diseases/prostate-cancer/tag/zlotta/output_call-gSNP
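
The same `touch`-style check can be scripted so it can run before a pipeline is submitted. This is a minimal sketch; `probe_write` is a hypothetical helper, and the result only means something when the script runs as the same user, on the same node type, as the pipeline itself.

```python
import tempfile


def probe_write(directory):
    """Return True if a temporary file can be created in directory."""
    try:
        # The file is deleted automatically when the context exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers PermissionError ("Permission denied") and missing paths.
        return False
```

Unlike `touch test.txt`, this leaves no file behind when the write succeeds.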

tgebo commented 3 years ago

@tyamaguchi-ucla Yes, I updated that directory to make sure everything is from v2.2 before starting any runs.

[tgebo@ip-0A125212 pipeline]$ cd /hot/projects/diseases/prostate-cancer/tag/zlotta/output_call-gSNP
[tgebo@ip-0A125212 output_call-gSNP]$ touch test.txt
touch: cannot touch ‘test.txt’: Permission denied

tyamaguchi-ucla commented 3 years ago

Ok, so you didn't see the issue with v2.1 or below, right?

[tgebo@ip-0A125212 pipeline]$ cd /hot/projects/diseases/prostate-cancer/tag/zlotta/output_call-gSNP
[tgebo@ip-0A125212 output_call-gSNP]$ touch test.txt
touch: cannot touch ‘test.txt’: Permission denied

Yeah, this is the expected behavior. Can you point me to the method configs for align-DNA, call-gSNP and regenotype-gSNP?

tgebo commented 3 years ago

Actually I was only using SGE back when I ran this and it was on v2.1.

alignDNA: I ran this back when the docker version was still in use so no methods.config

call-gSNP: /data/users/tgebo/pipelines/pipeline-call-gSNP/methods.config

regenotype-gSNP: /data/users/tgebo/pipelines/pipeline-regenotype-gSNP/pipeline/config/single-node/methods.config

tyamaguchi-ucla commented 3 years ago

It looks like these methods.config files were also recently updated. With the current settings, you run pipelines as tgebo (1724262646) and inherit all the groups associated with that user ID on Slurm. (Also, Docker might not be able to check permissions in parent directories, but I'm not sure.)

It's hard to debug because it looks like some of the logs were overwritten last night, but the permissions are not at all consistent across the directories.

We really should implement group permissions but for now there are a few things we can do.

For example:

1) Use your AD account user and group IDs (Slurm uid and gid) throughout the directories with 755. I think you can still run pipelines on SGE, as they always run as root on SGE.

2) Use group permission from SGE and individual permission from Slurm with 775.

There are pros and cons to the options but you may want to think hard about the current permission settings (in terms of both non-docker and docker) and modify the permissions. Let me know if you want to have a sync on this.
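
For reference, the modes named in these options differ only in which classes get the write bit. A small sketch using Python's standard `stat` constants (`write_classes` is a hypothetical helper name):

```python
import stat


def write_classes(mode):
    """Return which permission classes have the write bit set in mode."""
    bits = (
        ("owner", stat.S_IWUSR),
        ("group", stat.S_IWGRP),
        ("other", stat.S_IWOTH),
    )
    return [name for name, bit in bits if mode & bit]


for mode in (0o755, 0o775, 0o777):
    print(oct(mode), write_classes(mode))
# 0o755 ['owner']
# 0o775 ['owner', 'group']
# 0o777 ['owner', 'group', 'other']
```

So 775 only helps an account that is the owner or belongs to the owning group, which is why consistent user and group IDs matter for either option.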

@timothyjsanders please let me know if you find this is a call-gSV specific issue.

tgebo commented 3 years ago

I'm allowing permission to the entire /data/projects/diseases/prostate-cancer/tag directory to try to avoid this kind of issue from now on. Thanks guys!
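
Before changing modes across a whole project tree, it may help to list which directories currently lack a given write bit, since sub-directories created by earlier runs can differ from their parent. A minimal sketch, assuming the script can read every directory under the root; `dirs_missing_write` is a hypothetical helper, and directories that `os.walk` cannot list are silently skipped.

```python
import os
import stat


def dirs_missing_write(root, bit=stat.S_IWGRP):
    """List directories under root (inclusive) whose mode lacks the given write bit."""
    missing = []
    for current, _subdirs, _files in os.walk(root):
        mode = stat.S_IMODE(os.stat(current).st_mode)
        if not mode & bit:
            missing.append(current)
    return sorted(missing)
```

Passing `stat.S_IWOTH` instead of the default reports directories that are not writable by "other".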

tyamaguchi-ucla commented 3 years ago

I'm allowing permission to the entire /data/projects/diseases/prostate-cancer/tag directory to try to avoid this kind of issue from now on. Thanks guys!

I wouldn't recommend 777 tho.

tgebo commented 3 years ago

I did 775 and all my jobs today failed with the same reason again. 777 works though.

tyamaguchi-ucla commented 3 years ago

I did 775 and all my jobs today failed with the same reason again. 777 works though.

It looks like you didn't try 2) above? I can go over how you can implement it at 1:1.

tyamaguchi-ucla commented 3 years ago

@tgebo Are you still having this issue or is it ok to close?

tyamaguchi-ucla commented 3 years ago

Just following up on this. @tgebo Are you still having this issue? We'll close it if there's no response by Friday.

tgebo commented 3 years ago

@tyamaguchi-ucla oops missed that last comment but yes, everything worked out. Okay to close.