harvardinformatics / degenotate

MIT License

MK test KeyError: '*' #52

Open LyAndraLujan opened 1 month ago

LyAndraLujan commented 1 month ago

When I run an MK test on the corBra test data it runs fine, but when I run it on my own data I get KeyError: '*'. I'm running multiple chromosomes separately, and the error occurs in most but not all of them. My command is:

python degenotate.py -a /projects/singhlab/lyandral/Crows/Ref/genomic1.gtf -g /projects/singhlab/lyandral/Crows/Ref/Corvus_moneduloides.1.fasta -v /projects/singhlab/lyandral/Crows/VCF/Chrom1/NoMiss.vcf.gz -u /projects/singhlab/lyandral/Crows/Others.txt -e /projects/singhlab/lyandral/Crows/Carrion.txt -o test --overwrite -sfs

and I get

# 08.13.2024 13:59:03 Caclulating degeneracy per transcript
Processed 0 / 3779 transcripts...
Traceback (most recent call last):
  File "/gpfs/projects/singhlab/lyandral/Crows/degenotate-main/degenotate.py", line 94, in <module>
    globs = degen.processCodons(globs)
  File "/gpfs/projects/singhlab/lyandral/Crows/degenotate-main/degenotate_lib/degen.py", line 369, in processCodons
    mk_codons, globs = VCF.getVariants(globs, transcript, transcript_region, codons, extra_leading_nt, extra_trailing_nt)
  File "/gpfs/projects/singhlab/lyandral/Crows/degenotate-main/degenotate_lib/vcf.py", line 114, in getVariants
    alt_nts = [ globs['complement'][base] for base in alt_nts ];
  File "/gpfs/projects/singhlab/lyandral/Crows/degenotate-main/degenotate_lib/vcf.py", line 114, in <listcomp>
    alt_nts = [ globs['complement'][base] for base in alt_nts ];
KeyError: '*'

Do you have any idea what could be causing this?

I'm running version 1.2.4 from bioconda.

tsackton commented 4 weeks ago

Hmm. Based on the error message, I guess you have some positions in your VCF where you have a * as the ALT allele. degenotate is trying to figure out what the complement of the base is, but obviously * has no complement. The simplest solution is to do a little pre-filtering on your VCF to remove anything that is not a biallelic SNP.
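
For anyone who lands here with the same error, below is a rough sketch of what that pre-filtering could look like in plain Python. It is not part of degenotate; the function names (`is_biallelic_snp`, `filter_vcf_lines`, `filter_vcf_file`) are made up for illustration. It assumes a standard tab-delimited VCF with REF in column 4 and ALT in column 5, and it keeps only records where both are a single A, C, G, or T. That drops multiallelic sites, indels, and the `*` allele, which the VCF specification reserves for an allele that is missing because of an overlapping deletion.

```python
import gzip

VALID_BASES = {"A", "C", "G", "T"}


def is_biallelic_snp(record_line):
    """Return True if REF and ALT are each exactly one of A, C, G, T."""
    fields = record_line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return False  # malformed record: no REF/ALT columns
    ref, alt = fields[3].upper(), fields[4].upper()
    # A multiallelic ALT such as "G,T", an indel such as "AT", or the
    # "*" allele is not a single base, so it fails the membership test.
    return ref in VALID_BASES and alt in VALID_BASES


def filter_vcf_lines(lines):
    """Yield header lines unchanged plus records that are biallelic SNPs."""
    for line in lines:
        if line.startswith("#") or is_biallelic_snp(line):
            yield line


def filter_vcf_file(in_path, out_path):
    """Hypothetical wrapper: read a plain or gzipped VCF, write plain text."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(filter_vcf_lines(src))
```

Calling something like `filter_vcf_file("NoMiss.vcf.gz", "NoMiss.biallelic_snps.vcf")` would write an uncompressed VCF. If your downstream tool expects a bgzipped and tabix-indexed file, the output would need to be compressed and indexed again afterwards. Note that this checks only the alleles listed in each record, not which alleles are actually present in the genotypes.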