harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

Error with qc_admixture rule #122

Closed ArsenaultResearch closed 8 months ago

ArsenaultResearch commented 9 months ago

Hi all, I have been trying to get the qc_admixture rule to successfully run but keep running into the same error (included below). I assume based on the error message that I need to adjust the chromosome names in some way. Do you all have any recommendations on how I can solve this error? Any assistance you can provide would be very appreciated. Thanks, Sam

My chromosome names look like so:

contig_17684unscaffolded PGA_scaffold2__141_contigslength_26305080

Error message:

    [Tue Sep 5 16:24:25 2023]
    rule qc_admixture:
        input: results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bed, results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bim, results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.fam
        output: results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.3.Q, results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.2.Q
        jobid: 0
        reason: Missing output files: results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.2.Q, results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.3.Q
        wildcards: refGenome=Lerg_assemblyV1, prefix=Lerg_assemblyV1_E
        resources: mem_mb=8000, mem_mib=7630, disk_mb=1000, disk_mib=954, tmpdir=/tmp

    Activating conda environment: .snakemake/conda/5aabced42964e6ba7d428c98db59b6af_
    ADMIXTURE Version 1.3.0
    Copyright 2008-2015
    David Alexander, Suyash Shringarpure, John Novembre, Ken Lange

    Please cite our paper!
    Information at www.genetics.ucla.edu/software/admixture

    Random seed: 43
    Point estimation method: Block relaxation algorithm
    Convergence acceleration algorithm: QuasiNewton, 3 secant conditions
    Point estimation will terminate when objective function delta < 0.0001
    Estimation of standard errors disabled; will compute point estimates only.
    Invalid chromosome code! Use integers.
    [Tue Sep 5 16:24:25 2023]
    Error in rule qc_admixture:
        jobid: 0
        input: results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bed, results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bim, results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.fam
        output: results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.3.Q, results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.2.Q
        conda-env: /n/holyscratch01/triblelab/Users/sarsenault/snpArcher/.snakemake/conda/5aabced42964e6ba7d428c98db59b6af
        shell:

    mv results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bim results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bim.orig
    paste <(cut -f 1 results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bim.orig | sed 's/[^0-9]//g') <(cut -f 2,3,4,5,6 results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bim.orig) > results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bim

    admixture results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bed 2
    admixture results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bed 3

    mv "Lerg_assemblyV1_E".2.* results/Lerg_assemblyV1/QC
    mv "Lerg_assemblyV1_E".3.* results/Lerg_assemblyV1/QC

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

    Shutting down, this might take some time.
    Exiting because a job execution failed. Look above for error message

erikenbody commented 9 months ago

Yeah, that does seem to be the problem. We try to deal with it by deleting every non-digit character from the contig names (with the sed command you can see there), but that doesn't seem to have handled your contig names, which are a bit complex. Can you run something like this to show the contig names in the name-repaired bim file you have?

    cat results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bim | cut -f 1 | sort | uniq

Let me know if this works.

Erik
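To make the name repair concrete, here is a minimal sketch that applies the same `sed` substitution from the rule to the two contig names in the original post (taken as they appear there). The helper name `digits_only` is hypothetical and only used for illustration.

```shell
# Same substitution the rule applies to column 1 of the .bim file:
# delete every character that is not a digit.
# "digits_only" is a hypothetical helper name, not part of snpArcher.
digits_only() {
    printf '%s\n' "$1" | sed 's/[^0-9]//g'
}

digits_only 'contig_17684unscaffolded'                    # prints 17684
digits_only 'PGA_scaffold2__141_contigslength_26305080'   # prints 214126305080
```

Both results are purely numeric, but the second is a 12-digit number, larger than the maximum 32-bit signed integer (2147483647). Whether ADMIXTURE rejects codes of that size is an assumption that this thread does not confirm; it may be worth checking alongside the output of the command above.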


ArsenaultResearch commented 9 months ago

Output is attached. All seem to be numeric and I wasn't able to find any letters with grep.

snpArcher_contigNames.txt

tsackton commented 9 months ago

This one is a puzzle.

Can you check to make sure you have no duplicate contig names post-processing? Theoretically I guess that might lead to the same error as non-numeric chromosome names. Google is not very informative on this point, unfortunately.
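One possible way to run that check, as a sketch: column 1 of a .bim file repeats on every variant row, so the comparison has to start from the distinct original names in the `.bim.orig` file that the rule leaves behind. The function name `find_collisions` is hypothetical, and the path is the one shown in the log above.

```shell
# Print each numeric code that more than one distinct original
# contig name collapses to after the digits-only substitution.
# "find_collisions" is a hypothetical helper name.
find_collisions() {
    cut -f 1 "$1" | sort -u | sed 's/[^0-9]//g' | sort | uniq -d
}

# Path taken from the log above; skipped if the file is not present.
orig_bim=results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bim.orig
if [ -f "$orig_bim" ]; then
    find_collisions "$orig_bim"
fi
```

No output means every original name maps to its own code. A blank line in the output would mean that several names contain no digits at all and collapse to an empty code.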

Also, perhaps it is worth trying to run admixture outside of snakemake, just to confirm the same error message.
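A rough sketch of such a manual run follows, preceded by a quick check that column 1 of the repaired .bim file holds only plain integers. The helper name `count_bad_codes` is hypothetical; the paths and the `admixture` call are copied from the log above, and the sketch assumes `admixture` is already on the PATH (for example, from the conda environment listed in the log).

```shell
# Count rows whose chromosome code (column 1) is empty or not a plain integer.
# "count_bad_codes" is a hypothetical helper name.
count_bad_codes() {
    awk -F '\t' '$1 !~ /^[0-9]+$/ { n++ } END { print n + 0 }' "$1"
}

# Paths taken from the log above; nothing runs if the file is missing.
bim=results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bim
if [ -f "$bim" ]; then
    echo "rows with non-integer codes: $(count_bad_codes "$bim")"
    # Same command the rule runs, executed directly instead of through snakemake.
    if command -v admixture >/dev/null 2>&1; then
        admixture results/Lerg_assemblyV1/QC/Lerg_assemblyV1_E.bed 2
    fi
fi
```

If the count is zero and ADMIXTURE still reports an invalid chromosome code, the cause is more likely the values of the codes than stray non-numeric characters.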

Finally, you can hack things to remove the admixture plot if necessary - in the QC module Snakefile, comment out the admixture files in the qc_plots rule, and in the qc_dashboard_interactive.Rmd script comment out the admixture section at the bottom. That should let you get a QC plot without the admixture results.

cademirch commented 9 months ago

@ArsenaultResearch, were you able to resolve this?

ArsenaultResearch commented 8 months ago

Hi, apologies for the delay. I was able to hack out the admixture bit as Tim suggested and the rest ran correctly. Thanks for the help on this!