maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io

error in chip-seq using Genrich as mapper #945

Closed sunta3iouxos closed 7 months ago

sunta3iouxos commented 11 months ago

I stumbled upon this error:

Sample sheet found and header is ok!

---- This analysis has been done using snakePipes version 2.7.3 ----
Sample sheet found and header is ok!
Building DAG of jobs...
InputFunctionException in line 164 of /home/tgeorgom/mambaforge/envs/snakePipes/lib/python3.11/site-packages/snakePipes/shared/rules/ChIP_peak_calling_spikein.snakefile:
Error:
  TypeError: can only concatenate str (not "bool") to str
Wildcards:
  group=DMSO_POLII
Traceback:
  File "/home/tgeorgom/mambaforge/envs/snakePipes/lib/python3.11/site-packages/snakePipes/shared/rules/ChIP_peak_calling_spikein.snakefile", line 167, in <lambda>
  File "/home/tgeorgom/mambaforge/envs/snakePipes/lib/python3.11/site-packages/snakePipes/shared/rules/ChIP_peak_calling_spikein.snakefile", line 167, in <listcomp>

 Spikein genome detected - at least one spikeIn chromosome found with extention _spikein .

Error: snakemake returned an error code of 1, so processing is incomplete!

the command is:

ChIP-seq -d /mnt/c/AP04/ -j 8 --local --useSpikeInForNorm --getSizeFactorsFrom genome --peakCaller Genrich --sampleSheet /mnt/c/AP04/PolII.tsv --windowSize 500 --plotFormat "pdf" mm10_gencodeM19_spikesTEST /mnt/c/AP04/PolII_ChIPtype.yalm

the samplelist is:

name    condition
A006850324_209957_S1_L000       DMSO_POLII
A006850324_209960_S2_L000       DMSO_POLII
A006850324_209962_S3_L000       DMSO_POLII
A006850324_209964_S4_L000       AA5_POLII
A006850324_209966_S5_L000       AA5_POLII
A006850324_209968_S6_L000       AA5_POLII

and the YAML file for the peak type is:

chip_dict:
  A006850324_209957_S1_L000:
    broad: False
  A006850324_209960_S2_L000:
    broad: False
  A006850324_209962_S3_L000:
    broad: False
  A006850324_209964_S4_L000:
    broad: False
  A006850324_209966_S5_L000:
    broad: Fasle
  A006850324_209968_S6_L000:
    broad: False
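
A side note on the listing above: a YAML loader reads an unquoted, misspelt value such as `Fasle` as a plain string, not as a boolean. The following is a minimal sketch of a check for that, assuming the file has already been loaded into a Python dict; the function name `non_boolean_flags` and the variable `loaded` are hypothetical and not part of snakePipes. Whether this has any bearing on the reported error is not established in this thread.

```python
def non_boolean_flags(chip_dict, key="broad"):
    """Return {sample: value} for entries whose flag is missing or not a real boolean."""
    return {
        sample: settings.get(key)
        for sample, settings in chip_dict.items()
        if not isinstance(settings.get(key), bool)
    }


# What a YAML loader would hand back for two of the entries above:
# "False" becomes the boolean False, the misspelt "Fasle" stays a string.
loaded = {
    "A006850324_209964_S4_L000": {"broad": False},
    "A006850324_209966_S5_L000": {"broad": "Fasle"},
}

print(non_boolean_flags(loaded))
```

Any sample reported by the check has a value that downstream code expecting `True` or `False` may handle differently than intended.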

the -d folder contains the following:

Bowtie2
ChIP-seq.cluster_config.yaml
ChIP-seq.config.yaml
ChIP-seq_organism.yaml
ChIP-seq_run-4.log
DNA-mapping.cluster_config.yaml
DNA-mapping.config.yaml
DNA-mapping_organism.yaml
DNA-mapping_run-4.log
DNA-mapping_tools.txt
FASTQ_fastp
FastQC
FastQC_trimmed
PolII.tsv
PolII_ChIPtype.yalm
PolII_ChIPtype_all.yalm
Sambamba
Ser5PolII.tsv
bamCoverage
chip_samples.yaml
chip_seq_sample_config.PREDICTED.yaml
cluster_logs
deepTools_qc
fastq
filter_rules
filtered_bam
multiQC

katsikora commented 11 months ago

Hi,

I see how this error would arise in the Genrich peak-calling rule if you have no input samples as controls. This is a bug and can be fixed: Genrich should be able to call peaks even if the input control is missing.

Best wishes,

Katarzyna
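
The failure mode described above can be reproduced in isolation. This is a minimal sketch, not the actual snakePipes rule: it assumes the input function builds a control BAM path by string concatenation, and that a sample without an input control carries the boolean `False` instead of a sample name. The names `chip_dict_example`, `control_bam_unsafe`, and `control_bam_safe`, as well as the path layout, are hypothetical.

```python
# Hypothetical per-sample settings; "control: False" stands for "no input control".
chip_dict_example = {
    "sample_with_input": {"control": "input_1", "broad": False},
    "sample_without_input": {"control": False, "broad": False},
}


def control_bam_unsafe(sample):
    """Concatenate blindly; fails when the control value is a bool."""
    return "filtered_bam/" + chip_dict_example[sample]["control"] + ".filtered.bam"


def control_bam_safe(sample):
    """Return a control path only when a control sample name is set."""
    control = chip_dict_example[sample]["control"]
    if isinstance(control, str) and control:
        return ["filtered_bam/" + control + ".filtered.bam"]
    return []  # no control: the rule simply gets no control input


try:
    control_bam_unsafe("sample_without_input")
except TypeError as err:
    print(err)  # same class of error as in the report

print(control_bam_safe("sample_without_input"))
print(control_bam_safe("sample_with_input"))
```

Returning an empty list for samples without a control is one possible design; it lets an input function yield no control files instead of raising while the DAG is built.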

sunta3iouxos commented 11 months ago

Also, Genrich provides q-values in the manner of the IDR approach by taking the biological replicates into account. Is it possible to add this as well? The replicates could be taken from the sample sheet.

katsikora commented 11 months ago

Also, Genrich provides q-values in the manner of the IDR approach by taking the biological replicates into account. Is it possible to add this as well? The replicates could be taken from the sample sheet.

This we already have implemented.

sunta3iouxos commented 11 months ago

This we already have implemented.

So it did not work for me due to the lack of input. Could you please direct me to the relevant information so that I could properly use this option? Thank you

katsikora commented 11 months ago

I've pushed a fix to the develop branch. Do you want to try to install the development snakePipes version and try it out? It should work without input now, as expected for cutN'x experiments.

sunta3iouxos commented 11 months ago

I've pushed a fix to the develop branch. Do you want to try to install the development snakePipes version and try it out? It should work without input now, as expected for cutN'x experiments.

I would like to, but unfortunately I installed snakePipes via conda. Is there a way to update the pipeline to the development version using conda/mamba? If not, is there a way to do this manually?

Thank you!

katsikora commented 11 months ago

git checkout -b develop https://github.com/maxplanck-ie/snakepipes.git snakepipes_develop_folder
conda env create -n snakepipes_develop
mamba install -n snakepipes_develop pip
cd snakepipes_develop_folder
pip install --upgrade .
snakepipes config (with your standard options)

Then you should be good to go: run conda activate snakepipes_develop and start the workflows.

Best,

Katarzyna

sunta3iouxos commented 11 months ago

got an issue:

git checkout -b develop https://github.com/maxplanck-ie/snakepipes.git snakepipes_develop_folder
fatal: not a git repository (or any of the parent directories): .git

I assume that I can download the folder directly from GitHub.

katsikora commented 11 months ago

ah sorry, it should perhaps read git clone instead of git checkout

sunta3iouxos commented 11 months ago

unfortunately I got an error:

  Building wheel for datrie (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for datrie (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [74 lines of output]

and then:

      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for datrie
  Building wheel for stopit (setup.py) ... done
  Created wheel for stopit: filename=stopit-1.1.2-py3-none-any.whl size=11939 sha256=6ef399146fe9c7647d6f6f00f4dde98429a83ec5bccc33ff98ba820909be2d6d
  Stored in directory: /home/tgeorgom/.cache/pip/wheels/10/10/63/c3c98c9859d2aa59553536cc2ea005d3c9c39e214ab4fd614c
Successfully built snakePipes connection-pool stopit
Failed to build datrie
ERROR: Could not build wheels for datrie, which is required to install pyproject.toml-based projects

sunta3iouxos commented 11 months ago

In addition, shouldn't I first activate snakepipes_develop and then run pip install --upgrade . ? Otherwise, where will pip install the update? I assume in the base environment.

katsikora commented 11 months ago

Alright, thanks for reporting that one, I've seen it before. It's related to some dependency builds failing for python 3.12. I'll have to cap the python version until this is fixed.

For now, you can run conda create -n snakepipes_develop python=3.11 or mamba install -n snakepipes_develop python=3.11 before running pip install.
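
Before running pip install, the interpreter version in the active environment can be checked against the cap mentioned here. This is a minimal sketch; the function name `below_cap` and the default cap of `(3, 12)` are illustrative assumptions taken from this comment, not a snakePipes setting.

```python
import sys


def below_cap(version_info=sys.version_info, cap=(3, 12)):
    """True when the interpreter's (major, minor) is strictly below the cap."""
    return tuple(version_info[:2]) < cap


if __name__ == "__main__":
    if below_cap():
        print("Python is below 3.12; proceeding with pip install is reasonable.")
    else:
        print("Python is 3.12 or newer; consider recreating the env with python=3.11.")
```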

katsikora commented 11 months ago

In addition, shouldn't I first activate snakepipes_develop and then run pip install --upgrade . ? Otherwise, where will pip install the update? I assume in the base environment.

That's right, it's better if you conda activate snakepipes_develop first.
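
pip installs into the environment of the Python interpreter it runs under, so one way to remove the ambiguity is to call pip through that interpreter. A minimal sketch, where the helper name `pip_upgrade_command` is hypothetical:

```python
import sys


def pip_upgrade_command(source_dir="."):
    """Build a pip command bound to the interpreter that runs this script."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", source_dir]


# sys.prefix is the root of the active environment; if it does not point at
# the expected environment directory, the wrong environment is active.
print("environment root:", sys.prefix)
print("command:", " ".join(pip_upgrade_command()))
```

Run from inside the activated environment, the printed root should be the snakepipes_develop directory rather than the base installation.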

sunta3iouxos commented 11 months ago

snakepipes config (with your standard options)

How do I transfer my settings to the new development environment? Or how do I link the old settings to the new one? Thank you!

sunta3iouxos commented 11 months ago

I've run into some errors. Here is what I did:

git clone -b develop https://github.com/maxplanck-ie/snakepipes.git snakepipes_develop_folder
cd snakepipes_develop_folder/
mamba create -n snakePipes_devel python=3.11 pip
mamba activate snakePipes_devel
cd ../snakepipes_develop_folder
pip install --upgrade .

Then I copied some files from my stable environment:

cp -r /scratch/tgeorgom/mamba/snakePipes/lib/python3.11/site-packages/snakePipes/shared/*  /scratch/tgeorgom/mamba/snakePipes_devel/lib/python3.11/site-packages/snakePipes/shared/
cp /scratch/tgeorgom/mamba/snakePipes/lib/python3.11/site-packages/snakePipes/shared/organisms/* /scratch/tgeorgom/mamba/snakePipes_devel/lib/python3.11/site-packages/snakePipes/shared/organisms/
cp -r /scratch/tgeorgom/mamba/snakePipes/lib/python3.11/site-packages/snakePipes/workflows/* /scratch/tgeorgom/mamba/snakePipes_devel/lib/python3.11/site-packages/snakePipes/workflows/

first config:

snakePipes config gives the correct parameters 

--- Final Updated Config ---------------------------------------------------------------------
config file: /scratch/tgeorgom/mamba/snakePipes_devel/lib/python3.11/site-packages/snakePipes/shared/defaults.yaml
clusterConfig: shared/cluster.yaml
condaEnvDir: None
configMode: manual
emailSender: None
max_thread: 4
oldConfig: None
onlySSL: False
organismsDir: shared/organisms
smtpPassword: None
smtpPort: 0
smtpServer: None
smtpUsername: None
snakemakeOptions:  --use-conda --conda-prefix /scratch/tgeorgom/mamba/snakePipes/envs
tempDir: /scratch/tgeorgom/temp/
toolsVersion: True
--------------------------------------------------------------------------------

snakePipes createEnvs returns an error:

snakePipes createEnvs
Traceback (most recent call last):
  File "/scratch/tgeorgom/mamba/snakePipes_devel/bin/snakePipes", line 441, in <module>
    main(sys.argv[1:])
  File "/scratch/tgeorgom/mamba/snakePipes_devel/bin/snakePipes", line 435, in main
    createCondaEnvs(args)
  File "/scratch/tgeorgom/mamba/snakePipes_devel/bin/snakePipes", line 322, in createCondaEnvs
    md5hash.update(condaDirUse.encode())
                   ^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'encode'

katsikora commented 11 months ago

Hi,

Indeed, the behaviour of snakePipes config and snakePipes createEnvs has changed: snakePipes config now accepts the argument --condaEnvDir, which sets the directory in which snakePipes createEnvs creates the environments. Also, the snakemakeOptions --use-conda and --conda-prefix have been removed from the config and are now hard-coded in the snakemake command.

Briefly, running snakePipes config --condaEnvDir /scratch/tgeorgom/mamba/snakePipes/envs and then snakePipes createEnvs should be sufficient to be able to run the workflows.

Best wishes,

Katarzyna
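
The AttributeError in the traceback above is what Python raises when `.encode()` is called on `None`, which matches `condaEnvDir: None` in the printed config. This is a minimal sketch of a guard that turns that into a clearer message. It is not the snakePipes implementation: the function name `env_dir_hash` and its second argument are assumptions, and only the `md5hash.update(...encode())` pattern is taken from the traceback.

```python
import hashlib


def env_dir_hash(conda_env_dir, env_yaml_text):
    """Hash an environment directory together with an env definition."""
    if conda_env_dir is None:
        # Mirrors the reported situation: condaEnvDir was None in the config.
        raise ValueError(
            "condaEnvDir is not set; run 'snakePipes config --condaEnvDir <dir>' first"
        )
    md5hash = hashlib.md5()
    md5hash.update(conda_env_dir.encode())
    md5hash.update(env_yaml_text.encode())
    return md5hash.hexdigest()


try:
    env_dir_hash(None, "name: example")
except ValueError as err:
    print(err)
```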

sunta3iouxos commented 11 months ago

The initialization worked, but when I started the analysis I got the following error:


---- This analysis has been done using snakePipes version 2.7.3 ----
Sample sheet found and header is ok!
Building DAG of jobs...
InputFunctionException in line 164 of /scratch/tgeorgom/mamba/snakePipes_devel/lib/python3.11/site-packages/snakePipes/shared/rules/ChIP_peak_calling_spikein.snakefile:
Error:
  TypeError: can only concatenate str (not "bool") to str
Wildcards:
  group=DMSO_POLII
Traceback:
  File "/scratch/tgeorgom/mamba/snakePipes_devel/lib/python3.11/site-packages/snakePipes/shared/rules/ChIP_peak_calling_spikein.snakefile", line 167, in <lambda>
  File "/scratch/tgeorgom/mamba/snakePipes_devel/lib/python3.11/site-packages/snakePipes/shared/rules/ChIP_peak_calling_spikein.snakefile", line 167, in <listcomp>

 Spikein genome detected - at least one spikeIn chromosome found with extention _spikein .

The command I used is:

ChIP-seq -d /scratch/tgeorgom/AP04/ --useSpikeInForNorm --getSizeFactorsFrom genome --peakCaller Genrich --peakCallerOptions "-y -a 1 -e chrM,chrY -q 0.01" --sampleSheet /scratch/tgeorgom/AP04/PolII.tsv --windowSize 500 --plotFormat "pdf"  mm10_gencodeM19_spikesTEST /scratch/tgeorgom/AP04/PolII_ChIPtype_all.yalm

katsikora commented 11 months ago

Hmm, this looks like the error you originally reported.

katsikora commented 11 months ago

cp -r /scratch/tgeorgom/mamba/snakePipes/lib/python3.11/site-packages/snakePipes/shared/*  /scratch/tgeorgom/mamba/snakePipes_devel/lib/python3.11/site-packages/snakePipes/shared/

I think this might have overwritten the changes I made to the Genrich rule in the develop branch.
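
One way to check this suspicion is to compare the rule files of the two installations byte by byte. This is a minimal sketch using the standard library; the function name `identical_files` and the `*.snakefile` pattern are illustrative assumptions, and the directories to pass in would be the `shared/rules` folders of the stable and development environments.

```python
import filecmp
from pathlib import Path


def identical_files(stable_dir, devel_dir, pattern="*.snakefile"):
    """Names of files matching pattern that are byte-identical in both dirs."""
    stable_dir, devel_dir = Path(stable_dir), Path(devel_dir)
    same = []
    for stable_file in sorted(stable_dir.glob(pattern)):
        devel_file = devel_dir / stable_file.name
        # shallow=False forces a content comparison instead of trusting os.stat
        if devel_file.is_file() and filecmp.cmp(stable_file, devel_file, shallow=False):
            same.append(stable_file.name)
    return same
```

Rules that did not change between versions are expected to be identical. If ChIP_peak_calling_spikein.snakefile, the file named in the traceback, shows up in the returned list, the development copy most likely carries the stable content again.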

katsikora commented 7 months ago

The fix is now part of snakePipes 2.8.0 and 2.8.1.