harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License
63 stars 30 forks source link

Error in rule index_reference #140

Closed ffertrindade closed 7 months ago

ffertrindade commented 7 months ago

Hi there!

I'm facing an issue while running snpArcher, but I cannot determine the exact source of the problem.

Here is the error that shows where the pipeline stopped:

Error in rule index_reference: jobid: 4 input: results/my_ref/data/genome/my_ref.fna output: results/my_ref/data/genome/my_ref.fna.sa, results/my_ref/data/genome/my_ref.fna.pac, results/my_ref/data/genome/my_ref.fna.bwt, results/my_ref/data/genome/my_ref.fna.ann, results/my_ref/data/genome/my_ref.fna.amb, results/my_ref/data/genome/my_ref.fna.fai, results/my_ref/data/genome/my_ref.dict log: logs/my_ref/indexref/log.txt (check log file(s) for error details) conda-env: /scratch/proj/snpArcher/.snakemake/conda/2148483cdd1a94b3909c4d1d2a1b87b3 shell:

    bwa index results/my_ref/data/genome/my_ref.fna 2> logs/my_ref/index_ref/log.txt
    samtools faidx results/my_ref/data/genome/my_ref.fna --output results/my_ref/data/genome/my_ref.fna.fai >> logs/my_ref/index_ref/log.txt
    samtools dict results/my_ref/data/genome/my_ref.fna -o results/my_ref/data/genome/my_ref.dict >> logs/my_ref/index_ref/log.txt 2>&1

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job index_reference since they might be corrupted: results/my_ref/data/genome/my_ref.fna.sa, results/my_ref/data/genome/my_ref.fna.pac, results/my_ref/data/genome/my_ref.fna.bwt, results/my_ref/data/genome/my_ref.fna.ann, results/my_ref/data/genome/my_ref.fna.amb [Wed Nov 1 23:02:23 2023] Finished job 61. 18 of 68 steps (26%) done Removing temporary output results/my_ref/genmap/my_ref.genmap.bedgraph. Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2023-11-01T211947.870896.snakemake.log

And here is the end of logs/my_ref/index_ref/log.txt file:

[BWTIncConstructFromPacked] 440 iterations done. 3821549696 characters processed. [BWTIncConstructFromPacked] 450 iterations done. 3843733744 characters processed. [bwa_index] 1306.39 seconds elapse. [bwa_index] Update BWT... 9.61 sec [bwa_index] Pack forward-only FASTA... 7.67 sec [bwa_index] Construct SA from BWT and Occ... 493.67 sec [main] Version: 0.7.12-r1039 [main] CMD: bwa index results/my_ref/data/genome/my_ref.fna [main] Real time: 1842.313 sec; CPU: 1830.568 sec

I'm attaching the logs and config files here, just in case they prove to be useful. I'm attempting to analyze data for 8 individuals with ~10x coverage, a genome size about 2Gb, and a local reference genome file. 2023-11-01T211947.870896.snakemake.log config.txt log.txt resources.txt samples.csv

For running, inside a pbs job script, I just activate the snakemake env and ran: snakemake --snakefile workflow/Snakefile --cores 32

For installation, since I already had a snakemake env, I simply cloned snpArcher, tested it using the ecoli dataset, and everything went well.

Would you know what the problem could be?

Thanks! Fer

tsackton commented 7 months ago

Can you try again, adding the --use-conda option to snakemake? My guess is that you are running into an issue with a program not being available in your local environment. snpArcher has rule-specific conda environments to avoid various dependency conflicts for this reason, but you need to tell snakemake to use these environments with the --use-conda option.

Let us know if that helps. If not, we can debug further.

cademirch commented 7 months ago

Typically when I forget --use-conda, I'll see a command not found in the log. Seems like if this is the issue, then not all stderr is making it to the logs for index ref.

tsackton commented 7 months ago

@cademirch it looks like in:

samtools faidx results/my_ref/data/genome/my_ref.fna --output results/my_ref/data/genome/my_ref.fna.fai >> logs/my_ref/index_ref/log.txt

We don't save stderr to the log, probably should fix that. My guess is that @ffertrindade doesn't have samtools installed in their path but does have bwa.

ffertrindade commented 7 months ago

Hey guys

Thanks for the feedback!

@tsackton I'm running now adding what you suggest and It's working! It's already at mapping step.

After what you mentioned, I thought I'd check the job stdout (instead of the pipeline logs) and found the following

faidx: invalid option -- '-'

Usage: samtools faidx <file.fa|file.fa.gz> [ [...]]

So, the thing was that I probably have installed a different version of samtools.

Anyway, I think this is solved now. Thank you!!

cademirch commented 7 months ago

Great!