harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

Unintended whitespace in fastq paths. #141

Status: Closed (ChabbyTMD closed 10 months ago)

ChabbyTMD commented 10 months ago

Hi Team,

I'm using snpArcher on a resequencing dataset from NCBI SRA. For testing purposes, I downloaded the SRA run table for 2 samples and made the necessary modifications as per the documentation. When I run the pipeline, I get the following error, in which whitespace is inserted into the file paths of the read files.

Building DAG of jobs...
File path ' results/data/fastq/ GCF_900626175.2 / SAMN19471810 / SRR14708202 _1.fastq.gz ' starts with whitespace. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
File path ' results/data/fastq/ GCF_900626175.2 / SAMN19471810 / SRR14708202 _1.fastq.gz ' ends with whitespace. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
File path ' results/data/fastq/ GCF_900626175.2 / SAMN19471810 / SRR14708202 _2.fastq.gz ' starts with whitespace. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
File path ' results/data/fastq/ GCF_900626175.2 / SAMN19471810 / SRR14708202 _2.fastq.gz ' ends with whitespace. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.
MissingInputException in rule fastp in file /home/trevor/Desktop/snpArcher/workflow/rules/fastq.smk, line 34:
Missing input files for rule fastp:
    output: results/GCF_900626175.2/filtered_fastqs/SAMN19471810/SRR14708202_1.fastq.gz, results/GCF_900626175.2/filtered_fastqs/SAMN19471810/SRR14708202_2.fastq.gz, results/GCF_900626175.2/summary_stats/SAMN19471810/SRR14708202.fastp.out
    wildcards: refGenome=GCF_900626175.2, sample=SAMN19471810, run=SRR14708202
    affected files:
        results/data/fastq/ GCF_900626175.2 / SAMN19471810 / SRR14708202 _1.fastq.gz
        results/data/fastq/ GCF_900626175.2 / SAMN19471810 / SRR14708202 _2.fastq.gz

I'm not sure why this is happening.

cademirch commented 10 months ago

Hi @ChabbyTMD, this seems to be a known issue with Python 3.12 and Snakemake. There is an open issue in Snakemake's repo discussing it. The solution here is to downgrade to Python 3.11; see https://github.com/snakemake/snakemake/issues/2480#issuecomment-1765902814
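
To confirm which interpreter an environment is actually using before rebuilding it, a quick check along these lines may help. This is only a minimal sketch: the function name `python_may_trigger_whitespace_bug` is hypothetical and not part of snpArcher or Snakemake, and the 3.12 cutoff is taken solely from the report in this thread. A Snakemake release that contains an upstream fix may work on newer Python versions.

```python
import sys


def python_may_trigger_whitespace_bug(version_info=sys.version_info):
    """Hypothetical helper (not part of snpArcher or Snakemake).

    Returns True when the interpreter is Python 3.12 or newer, the
    versions this thread associates with whitespace being inserted
    into file paths. The cutoff is an assumption based on this report.
    """
    # Compare only the (major, minor) pair, e.g. (3, 12).
    return tuple(version_info[:2]) >= (3, 12)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if python_may_trigger_whitespace_bug():
        print(f"Python {major}.{minor} detected: consider an environment with Python 3.11.")
    else:
        print(f"Python {major}.{minor} detected: not in the range reported as affected.")
```

Run it with the `python` from the activated environment that launches Snakemake, so the check reflects the interpreter the workflow really uses.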

ChabbyTMD commented 10 months ago

Hi @cademirch, I'll try this and get back to you. Thank you.

cademirch commented 10 months ago

@ChabbyTMD, did this work for you?

ChabbyTMD commented 10 months ago

Hi @cademirch, yes it did. I had to rebuild the snparcher environment with the following command, and then it worked:

conda create -c conda-forge -c bioconda -n snparcher snakemake python=3.11

cademirch commented 10 months ago

Great, glad it worked!