maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io
MIT License

Sample file in WGBS does not produce the appropriate groups #593

Closed sunta3iouxos closed 4 years ago

sunta3iouxos commented 4 years ago

It appears that names and conditions are all mixed up. Continuing from issue 589, the sample sheet is:

name    condition
109242  109242  wt_B73
109243  109243  wt_B73
109244  109244  wt_B73
109245  109245  mut_pht1_6
109246  109246  mut_pht1_6
109247  109247  mut_pht1_6

While running metilene I found the following, which is troubling:

the following ids belong to group A (n=1):
0: column: 0, name:109242_wt_B73
the following ids belong to group B (n=1):
0: column: 1, name:109243_wt_B73

Group A should have been wt_B73 and group B mut_pht1_6.

The same happens for dmrseq:

params = list(NULL, 'dmrseq_BU16_minCoverage5', '/home/hthiele0/snakemake/BU16.txt', c('109242', '109243'), 300, 10, 0.1, 5, 0.1, "blacklist" = NULL, "odir" = 'dmrseq_BU16_minCoverage5', "sampleSheet" = '/home/hthiele0/snakemake/BU16.txt', "groups" = c('109242', '109243'), "maxDist" = 300, "minCpGs" = 10, "minMethDiff" = 0.1, "minCoverage" = 5, "FDR" = 0.1),

So I think the outputs do not correspond to the correct comparisons.

Just to add, for metilene, from the header of

metilene_BU16_minCoverage5/metilene.IN.txt

chr pos 109242_wt_B73 109243_wt_B73 109244_wt_B73 109245_mut_pht1_6 109246_mut_pht1_6 109247_mut_pht1_6

and then: from header of

metilene_BU16_minCoverage5/DMRs.txt:

chrom start end q-value mean methylation difference nCpGs p (MWU) p (2D KS) mean_109242 mean_109243
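For readers following along, the grouping the reporter expects can be sketched as below. This is only an illustration, not snakePipes code: it assumes a two-column, tab-separated sheet (name, condition) and the helper name `groups_by_condition` is hypothetical. The sample values are the ones quoted above.

```python
import csv
import io
from collections import OrderedDict

# Assumed layout: tab-separated, header "name<TAB>condition".
# Sample names and conditions are taken from the sample sheet in the thread.
SAMPLE_SHEET = (
    "name\tcondition\n"
    "109242\twt_B73\n"
    "109243\twt_B73\n"
    "109244\twt_B73\n"
    "109245\tmut_pht1_6\n"
    "109246\tmut_pht1_6\n"
    "109247\tmut_pht1_6\n"
)


def groups_by_condition(sheet_text):
    """Map each condition to its sample names, in order of first appearance."""
    groups = OrderedDict()
    for row in csv.DictReader(io.StringIO(sheet_text), delimiter="\t"):
        groups.setdefault(row["condition"], []).append(row["name"])
    return groups


groups = groups_by_condition(SAMPLE_SHEET)
for condition, names in groups.items():
    print("{} (n={}): {}".format(condition, len(names), ", ".join(names)))
```

Under these assumptions each group has n=3, whereas the metilene log above reports two groups of n=1 built from the first two sample names.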

katsikora commented 4 years ago

I can confirm that this is still an issue with 2.0.0, even with using "Treatment" and "Control" as condition levels.

katsikora commented 4 years ago

There's a chance that we will have a bugfix release still today.

sunta3iouxos commented 4 years ago

(You were fast; I was still editing the previous post.) Any idea when to expect a fix? (That has been answered.) Just a reminder that dmrseq is also affected. About the update: do I just run conda update snakePipes, and will this affect resuming (I have never tested it)? A final note: since metilene finished successfully, what do I need to delete so that the processes I need are rerun? They are: metileneReport, prepForMetilene, run_metilene.

Best

sunta3iouxos commented 4 years ago

Good morning, any news?

Best

katsikora commented 4 years ago

Hi,

I have added the fixes to develop and tagged a 2.0.2 release. There is an issue with conda build at the moment, so it's not yet available via conda install.

sunta3iouxos commented 4 years ago

No worries. If you do not mind, could you please tell me:

A. how to update snakePipes

B. what to remove from the finished processes so that the following processes are rerun: metileneReport, prepForMetilene, run_metilene

katsikora commented 4 years ago

The release is now available on conda. I've updated the information on updating snakePipes with conda on https://snakepipes.readthedocs.io/en/stable/ .

To rerun specific rules, remove or rename the corresponding folders. In this case the metilene* folder.
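Renaming rather than deleting keeps the old results around for comparison. A minimal sketch of that step follows. The function name `set_aside`, the `.old` suffix and the default patterns are illustrative choices, not part of snakePipes.

```python
import os
from pathlib import Path


def set_aside(outdir, patterns=("metilene*", "dmrseq*"), suffix=".old"):
    """Rename matching result folders in outdir so their rules run again.

    Returns the new folder names. Folders that already carry the suffix,
    or whose renamed target already exists, are left untouched.
    """
    renamed = []
    for pattern in patterns:
        # sorted() materialises the matches before anything is renamed.
        for folder in sorted(Path(outdir).glob(pattern)):
            if not folder.is_dir() or folder.name.endswith(suffix):
                continue
            target = folder.with_name(folder.name + suffix)
            if os.path.lexists(str(target)):
                continue
            folder.rename(target)
            renamed.append(target.name)
    return renamed
```

Calling `set_aside("/path/to/output")` on the workflow output folder (a placeholder path) would move, for example, metilene_BU16_minCoverage5 to metilene_BU16_minCoverage5.old before the workflow is started again.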

sunta3iouxos commented 4 years ago

I got the following error after deleting the metilene and dmrseq folders:

Sample sheet found and header is ok!
Traceback (most recent call last):
  File "/home/hthiele0/snakemake/miniconda3/envs/snakePipes/lib/python3.6/site-packages/snakemake/__init__.py", line 647, in snakemake
    keepincomplete=keep_incomplete,
  File "/home/hthiele0/snakemake/miniconda3/envs/snakePipes/lib/python3.6/site-packages/snakemake/workflow.py", line 587, in execute
    or delete_temp_output,
  File "/home/hthiele0/snakemake/miniconda3/envs/snakePipes/lib/python3.6/site-packages/snakemake/persistence.py", line 68, in __init__
    os.makedirs(d, exist_ok=True)
  File "/home/hthiele0/snakemake/miniconda3/envs/snakePipes/lib/python3.6/os.py", line 210, in makedirs
    makedirs(head, mode, exist_ok)
  File "/home/hthiele0/snakemake/miniconda3/envs/snakePipes/lib/python3.6/os.py", line 210, in makedirs
    makedirs(head, mode, exist_ok)
  File "/home/hthiele0/snakemake/miniconda3/envs/snakePipes/lib/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/package'
Error: snakemake returned an error code of 1, so processing is incomplete!
katsikora commented 4 years ago

You might have to run snakePipes config first and update any paths as relevant to your system.

snakePipes info config [-h] [--snakemakeOptions SNAKEMAKEOPTIONS]
                       [--organismsDir ORGANISMSDIR]
                       [--clusterConfig CLUSTERCONFIG] [--tempDir TEMPDIR]
                       [--noToolsVersion] [--smtpServer SMTPSERVER]
                       [--smtpPort SMTPPORT] [--onlySSL]
                       [--emailSender EMAILSENDER]
                       [--smtpUsername SMTPUSERNAME]
                       [--smtpPassword SMTPPASSWORD]

optional arguments:
  -h, --help
        show this help message and exit
  --snakemakeOptions SNAKEMAKEOPTIONS
        Update the options given to snakeMake. You MUST include --use-conda and an appropriate --conda-prefix if you change this! (Default: --use-conda --conda-prefix /package/anaconda3/envs )
  --organismsDir ORGANISMSDIR
        The directory where global organism YAML files are to be stored. Both absolute and relative paths are supported. In the latter case the path is then relative to the snakePipes installation directory. (Default: shared/organisms)
  --clusterConfig CLUSTERCONFIG
        The YAML file containing the snakeMake cluster command and global memory settings. Both absolute and relative paths are supported. In the latter case the path is then relative to the snakePipes installation directory. (Default: shared/cluster.yaml)
  --tempDir TEMPDIR
        A custom directory where temporary files should be written. This is ideally locally attached to your cluster nodes. (Default: /data/extended/)
  --noToolsVersion
        By default, tool versions are printed to a workflow-specific file. Specifying this disables that behavior.

Email/SMTP options:
  These options are only used if/when --emailAddress is used.

  --smtpServer SMTPSERVER
        SMTP server address. (Default: None)
  --smtpPort SMTPPORT
        The port on the SMTP server to use. A value of 0 will use the default SMTP port. (Default: 0)
  --onlySSL
        If specified, only use SSL-enabled connections.
  --emailSender EMAILSENDER
        The email address used to send emails. (Default: None)
  --smtpUsername SMTPUSERNAME
        For SMTP servers requiring a login, the username to use. (Default: None)
  --smtpPassword SMTPPASSWORD
        For SMTP servers requiring a login, the password to use. Note that this is stored in clear text! (Default: None)
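The traceback above fails while creating '/package', which is the first component of the default --conda-prefix shown in the help text, so the default prefix is the likely culprit (an inference from the two outputs, not a confirmed diagnosis). A quick way to check whether a candidate prefix could be created by the current user is sketched below. Both helper names are hypothetical.

```python
import os


def first_existing_ancestor(path):
    """Walk upwards from path until a directory that exists is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def can_create(path):
    """True if the current user may create path (or it already is writable).

    os.makedirs needs write and search permission on the deepest ancestor
    that already exists; that is what is tested here.
    """
    return os.access(first_existing_ancestor(path), os.W_OK | os.X_OK)
```

For instance, `can_create("/package/anaconda3/envs")` would be expected to return False for a regular user on a machine without a writable /package, in which case a prefix inside the user's own space can be set through --snakemakeOptions as described above.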

katsikora commented 4 years ago

I think this is currently missing from the documentation; we'll add it. Details are under https://snakepipes.readthedocs.io/en/latest/content/setting_up.html , but I will add a short mention on the landing page, too.

katsikora commented 4 years ago

Btw, you can also use the --fromBAM option of the workflow to make sure the alignment step will not be re-executed. Pass your folder with (filtered) bam files to this argument, and specify a new output folder (optionally). The methylation extraction step will be repeated if you go for a new output folder.

  --fromBAM
        If specified, the input is taking from BAM files containing alignments rather than fastq files. See also --bamExt.
  --bamExt BAMEXT
        If --fromBAM is specified, this is the expected file extension. Removing it yields sample names. Default: '.bam'

sunta3iouxos commented 4 years ago

Thank you, I will try and let you know. So after an update I need to rerun the config command?

sunta3iouxos commented 4 years ago

Since the server is under maintenance, I can only check whether the correct commands are set.

A. In order for snakePipes to work again I had to run: snakePipes createEnvs

For some reason snakePipes config gives no output.

B. After that, resuming failed.

C. I noticed that with the fastq files 100 processes were set, while with --fromBAM I got 129 processes. The pipeline does not distinguish the sorted from the unsorted bam files:

/scratch/fastq/BU16/analyzed2/bwameth/109242.markdup.bam
/scratch/fastq/BU16/analyzed2/bwameth/109242.markdup.sorted.bam 

command:

WGBS -i /scratch/fastq/BU16/analyzed2/bwameth/ -o /scratch/fastq/BU16/analyzed2/ -c snakemake/miniconda3/envs/snakePipes/lib/python3.6/site-packages/snakePipes/workflows/WGBS/defaults.yaml --clusterConfigFile /scratch/ccg-ngs/analyzed2/WGBS.cluster_config.original.yaml --keepTemp --DAG --trim --plotFormat pdf --fromBAM --sampleSheet snakemake/BU16.txt ZeamaysB73RefGenv4

where /scratch/fastq/BU16/analyzed2/bwameth/ is the path to the bam files

---- This analysis has been done using snakePipes version 2.0.2 ----
Sample sheet found and header is ok!
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 12
Job counts:
        count   jobs
        1       DepthOfCov
        1       DepthOfCovGenome
        6       FASTQ1
        6       FASTQ2
        1       all
        6       bedGraphToBigWig
        6       bwameth
        6       calcCHHbias
        6       calc_Mbias
        6       conversionRate
        1       dmrseq
        6       fastp
        6       get_flagstat
        6       indexMarkDupes
        6       index_bam
        6       markDupes
        6       methyl_extract
        1       metileneReport
        1       multiQC
        6       origFASTQ1
        6       origFASTQ2
        1       prepForMetilene
        1       produceReport
        1       run_metilene
        99
---- This analysis has been done using snakePipes version 2.0.2 ----
Sample sheet found and header is ok!
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 12
Job counts:
        count   jobs
        1       DepthOfCov
        1       DepthOfCovGenome
        1       all
        12      bedGraphToBigWig
        12      calcCHHbias
        12      calc_Mbias
        12      conversionRate
        1       dmrseq
        12      get_flagstat
        12      indexMarkDupes
        12      index_bam
        12      link_bam
        12      markDupes
        12      methyl_extract
        1       metileneReport
        1       multiQC
        1       prepForMetilene
        1       produceReport
        1       run_metilene
        129
katsikora commented 4 years ago

I see, so any file with the extension specified by --bamExt (.bam by default) will be used. It looks like 12 bam files were found in total, I guess sorted and unsorted per sample. If you pass --bamExt '.sorted.bam', this should pick up only the sorted bam files. Otherwise, you might need, e.g., to link the set of bam files you want to analyze to some other folder and pass that to --fromBAM.
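The second suggestion, linking only the wanted bam files into a separate folder, could look roughly like this. It is a sketch under assumptions: `link_bams` is a hypothetical helper, and the destination folder is whatever new input folder you choose.

```python
import os
from pathlib import Path


def link_bams(src_dir, dst_dir, suffix=".sorted.bam"):
    """Symlink every file in src_dir ending in suffix into dst_dir.

    Returns the linked file names. Existing entries in dst_dir are kept.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    linked = []
    for bam in sorted(src.glob("*" + suffix)):
        target = dst / bam.name
        # lexists also catches dangling links left by an earlier attempt.
        if not os.path.lexists(str(target)):
            os.symlink(str(bam.resolve()), str(target))
        linked.append(target.name)
    return linked
```

In the commands shown in this thread the folder itself is given with -i and --fromBAM is used as a flag, so the new folder would take the place of the -i argument.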

sunta3iouxos commented 4 years ago

There are some other things that could be avoided:

        1       DepthOfCov
        1       DepthOfCovGenome
        1       all
        12      bedGraphToBigWig
        12      calcCHHbias
        12      calc_Mbias
        12      conversionRate
        1       dmrseq
        12      get_flagstat
        12      indexMarkDupes
        12      index_bam
        12      link_bam
        12      markDupes

All of those have already been calculated correctly, since they are per sample, but the pipeline still failed to detect that those analyses have already finished.

sunta3iouxos commented 4 years ago

I tried to move the setup to another server and got the following error:

(/scratch2/Theo/snakePipes) [hthiele0@cheops0 ~]$ WGBS -i /scratch2/hthiele0/fastq/BU16/analyzed2/bwameth/ \
--fromBAM --bamExt sorted.bam \
-o /scratch2/hthiele0/fastq/BU16/analyzed2/ \
-c /scratch2/Theo/snakePipes/lib/python3.6/site-packages/snakePipes/workflows/WGBS/defaults.yaml \
--clusterConfigFile /scratch2/hthiele0/fastq/BU16/analyzed2/WGBS.cluster_config.original.yaml --keepTemp --DAG --trim --plotFormat pdf \
--sampleSheet /scratch2/Theo/snakemake/BU16.bak2 ZeamaysB73RefGenv4

and the error:

Sample sheet found and header is ok!

---- This analysis has been done using snakePipes version 2.0.2 ----
Sample sheet found and header is ok!
Building DAG of jobs...
MissingInputException in line 69 of /scratch2/Theo/snakePipes/lib/python3.6/site-packages/snakePipes/shared/rules/WGBS.snakefile:
Missing input files for rule index_bam:
bwameth/109242..sorted.bam

Looking at /scratch2/Theo/snakePipes/lib/python3.6/site-packages/snakePipes/shared/rules/WGBS.snakefile:

rule index_bam:
    input:
        "bwameth/{sample}.sorted.bam"
    output:
        temp("bwameth/{sample}.sorted.bam.bai")
    log:
        err="bwameth/logs/{sample}.index_bam.err",
        out="bwameth/logs/{sample}.index_bam.out"
    conda: CONDA_SHARED_ENV
    shell: """
        samtools index "{input}" >{log.out} 2>{log.err}
        """

Any idea about this one?

katsikora commented 4 years ago

Hi Theo,

I think your extension is missing a leading dot '.': --bamExt .sorted.bam

Best,

Katarzyna
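The double dot in the missing input file follows directly from the rule quoted in the help text ("Removing it yields sample names"). The sketch below only mimics that rule; it is not the snakePipes implementation, and the file name 109242.sorted.bam is inferred from the error message rather than stated in the thread.

```python
def sample_from_bam(filename, bam_ext):
    """Strip bam_ext from the end of filename to obtain the sample name."""
    if not filename.endswith(bam_ext):
        raise ValueError("{!r} does not end in {!r}".format(filename, bam_ext))
    return filename[: -len(bam_ext)]


# Input pattern of rule index_bam, as quoted from WGBS.snakefile above.
RULE_INPUT = "bwameth/{sample}.sorted.bam"

# Without the leading dot the sample name keeps a trailing '.'.
without_dot = RULE_INPUT.format(sample=sample_from_bam("109242.sorted.bam", "sorted.bam"))
with_dot = RULE_INPUT.format(sample=sample_from_bam("109242.sorted.bam", ".sorted.bam"))

print(without_dot)  # bwameth/109242..sorted.bam
print(with_dot)     # bwameth/109242.sorted.bam
```

The first result matches the path reported in the MissingInputException; the second is the file that presumably exists on disk.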

katsikora commented 4 years ago

I think the issues in this thread are now handled, I'm closing the issue. Feel free to re-open if needed.