snakemake-workflows / rna-seq-star-deseq2

RNA-seq workflow using STAR and DESeq2
MIT License
327 stars, 203 forks

Failed to run with cutadapt #72

Closed: cchapus closed this issue 7 months ago

cchapus commented 8 months ago

Hello,

When I try to trim my FASTQ files after setting trimming to "True" in the config file, I get an error.

Traceback (most recent call last):
  File "/XXXXX/workflow/.snakemake/scripts/tmp1j2yk7de.wrapper.py", line 27, in <module>
    shell(
  File "/XXXXX/mambaforge/envs/snakemake_7.25.0/lib/python3.11/site-packages/snakemake/shell.py", line 300, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;  cutadapt --cores 8 AGATCGGAAGAGCACACGTCTGAACTCCAGTCA  -o results/trimmed/HIPPVG_CD4V_lane1_R1.fastq.gz -p results/trimmed/HIPPVG_CD4V_lane1_R2.fastq.gz pipe/cutadapt/HIPPVG_CD4V/lane1.fq1.fastq.gz pipe/cutadapt/HIPPVG_CD4V/lane1.fq2.fastq.gz > results/trimmed/HIPPVG_CD4V_lane1.paired.qc.txt  2> logs/cutadapt/HIPPVG_CD4V_lane1.log' returned non-zero exit status 2.

I'm trying to trace the error back, and I've looked at the job list.

[Fri Feb  9 20:09:00 2024]
    rule cutadapt_pe:
        input: pipe/cutadapt/HIPPVG_CD4V/lane1.fq1.fastq.gz, pipe/cutadapt/HIPPVG_CD4V/lane1.fq2.fastq.gz
        output: results/trimmed/HIPPVG_CD4V_lane1_R1.fastq.gz, results/trimmed/HIPPVG_CD4V_lane1_R2.fastq.gz, results/trimmed/HIPPVG_CD4V_lane1.paired.qc.txt
        log: logs/cutadapt/HIPPVG_CD4V_lane1.log
        jobid: 501
        reason: Missing output files: results/trimmed/HIPPVG_CD4V_lane1_R2.fastq.gz, results/trimmed/HIPPVG_CD4V_lane1_R1.fastq.gz; Input files updated by another job: pipe/cutadapt/HIPPVG_CD4V/lane1.fq1.fastq.gz, pipe/cutadapt/HIPPVG_CD4V/lane1.fq2.fastq.gz
        wildcards: sample=HIPPVG_CD4V, unit=lane1
        threads: 8
        resources: tmpdir=/tmp

    [Fri Feb  9 20:09:00 2024]
    rule cutadapt_pipe:
        input: XXXXXXXX/HIPPVG_CD4V.cleaned.R1.fastq.gz
        output: pipe/cutadapt/HIPPVG_CD4V/lane1.fq1.fastq.gz (pipe)
        log: logs/pipe-fastqs/catadapt/HIPPVG_CD4V_lane1.fq1.fastq.gz.log
        jobid: 502
        reason: Missing output files: pipe/cutadapt/HIPPVG_CD4V/lane1.fq1.fastq.gz
        wildcards: sample=HIPPVG_CD4V, unit=lane1, fq=fq1, ext=fastq.gz
        threads: 0
        resources: tmpdir=/tmp

    [Fri Feb  9 20:09:00 2024]
    rule cutadapt_pipe:
        input: XXXXXXXX/HIPPVG_CD4V.cleaned.R2.fastq.gz
        output: pipe/cutadapt/HIPPVG_CD4V/lane1.fq2.fastq.gz (pipe)
        log: logs/pipe-fastqs/catadapt/HIPPVG_CD4V_lane1.fq2.fastq.gz.log
        jobid: 503
        reason: Missing output files: pipe/cutadapt/HIPPVG_CD4V/lane1.fq2.fastq.gz
        wildcards: sample=HIPPVG_CD4V, unit=lane1, fq=fq2, ext=fastq.gz
        threads: 0
        resources: tmpdir=/tmp

It seems the error can be traced back to the rule cutadapt_pipe. The rule should copy fq1 and fq2 to a temporary folder, pipe/cutadapt/{sample}, but the copies failed.

The shell command is cat {input} > {output} 2> {log}, but if the output has to go into a nonexistent folder (pipe/cutadapt/{sample}), it cannot work.

I couldn't find a line in the workflow that creates these folders. I tried changing the shell command to "mkdir -p pipe/cutadapt/{wildcards.sample} && cat {input} > {output} 2> {log}", but it still doesn't work, and my Snakemake knowledge is not good enough to go further.

My workaround is to run cutadapt manually on my 256 files beforehand and use the workflow without trimming, but I would rather have a working solution.

Regards

dlaehnemann commented 7 months ago

Sorry for the slow response. From looking at the cutadapt command, it seems like you only provided the adapter sequence, but not the cutadapt command line argument specifying what kind of adapter this is. You specify:

cutadapt --cores 8 AGATCGGAAGAGCACACGTCTGAACTCCAGTCA  [...]

Instead, this should be something like:

cutadapt --cores 8 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA  [...]

So the adapters column in your units.tsv needs to contain -a, or whichever cutadapt command line argument is correct for the type of adapter you have, in front of the sequence. Also see the link in the adapter trimming configuration instructions (step 3 of the usage help in the snakemake workflow catalog): https://snakemake.github.io/snakemake-workflow-catalog/?usage=snakemake-workflows%2Frna-seq-star-deseq2
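As a rough way to catch this before starting a run, here is a minimal sketch of a check on a single adapters value. The helper name adapters_value_ok is hypothetical and not part of the workflow. The flag list covers cutadapt's adapter-type options (-a/-g/-b for read 1, -A/-G/-B for read 2 in paired-end mode, plus the long forms --adapter, --front and --anywhere). The sketch assumes the flag is written as its own token or as --flag=SEQ, so an attached form like -aSEQ would be reported as missing.

```python
import shlex

# cutadapt options that declare the adapter type.
# Lowercase short options apply to read 1, uppercase ones to read 2.
ADAPTER_FLAGS = {
    "-a", "-g", "-b",
    "-A", "-G", "-B",
    "--adapter", "--front", "--anywhere",
}


def adapters_value_ok(value: str) -> bool:
    """Return True if the value starts with a cutadapt adapter option.

    Hypothetical helper for illustration only. An empty value returns
    False because there is no adapter option in it.
    """
    tokens = shlex.split(value)
    if not tokens:
        return False
    # Accept both "--adapter SEQ" and "--adapter=SEQ".
    first = tokens[0].split("=", 1)[0]
    return first in ADAPTER_FLAGS
```

With the value from this issue, adapters_value_ok("AGATCGGAAGAGCACACGTCTGAACTCCAGTCA") returns False, while adapters_value_ok("-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA") returns True.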

If it is something different or this somehow doesn't resolve your issue, feel free to reopen or create a new issue.