parklab / NGSCheckMate

Software program for checking sample matching for NGS data
MIT License

ZeroDivisionError: float division by zero #13

Closed nskbe closed 6 years ago

nskbe commented 6 years ago

Hi, I'm trying to run a test on a .vcf file generated with GATK from one sample of RNA-seq data, comparing it against itself, with a .bed file containing the SNPs from that vcf file, so everything should match. However, I'm getting the error below. The input files are here: https://www.dropbox.com/s/w7g3kkgbk0g708l/dat.tgz?dl=0 . I'm using only the first 10,000 lines, but I get the same error with the entire data set. Thanks for your help!

$ python /home/user/programs/NGSCheckMate/ncm.py -V -d data/ -O out -bed snps.BED
Generate Data Set from data/ using this bed file : snps.BED
Traceback (most recent call last):
  File "/home/user/programs/NGSCheckMate/ncm.py", line 1460, in <module>
    createDataSetFromDir(base_dir,bedFile)
  File "/home/user/programs/NGSCheckMate/ncm.py", line 220, in createDataSetFromDir
    real_depth[file] = depth[file] / float(real_count[file])
ZeroDivisionError: float division by zero

sejooning commented 6 years ago

Hi nskbe, thank you for using NGSCheckMate.

This problem is caused by the genomic location info in the bed file. Our SNP_GRCh37_hg19_wChr.bed file contains the tab-separated columns chromosome, location1, location2, rsID, and so on, and the genomic location reported in the vcf output of samtools 0.1.19 mpileup and bcftools corresponds to location2. Our algorithm therefore uses location2 as the identifier. You will get the proper result if you modify the bed file so that location2 matches the genomic location in the vcf file.

Best, Sejoon Lee
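
The fix described above can be sketched in Python. This is only an illustration, not code from NGSCheckMate: the function names (`vcf_sites`, `count_matches`, `bed_from_vcf`) are hypothetical, and setting location1 to POS - 1 is an assumption based on the usual 0-based BED start. The reply above only requires that location2 equal the position in the vcf file.

```python
def vcf_sites(vcf_lines):
    """Map (chromosome, POS) to the ID column for every VCF data line."""
    sites = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        sites[(fields[0], int(fields[1]))] = fields[2]
    return sites


def count_matches(vcf_lines, bed_lines):
    """Count BED rows whose third column (location2) equals a VCF POS."""
    sites = vcf_sites(vcf_lines)
    matches = 0
    for line in bed_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if (fields[0], int(fields[2])) in sites:
            matches += 1
    return matches


def bed_from_vcf(vcf_lines):
    """Yield BED rows with location2 = POS (location1 = POS - 1 is an assumption)."""
    for (chrom, pos), snp_id in sorted(vcf_sites(vcf_lines).items()):
        yield f"{chrom}\t{pos - 1}\t{pos}\t{snp_id}"
```

If `count_matches` returns 0 for a given vcf/bed pair, no site is shared under this convention, which is consistent with the zero count behind the division error in the traceback.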

nskbe commented 6 years ago

I ran ncm.py on a few RNA-seq .bam files and it ran fine with the SNP list that came with NGSCheckMate. Then I added a .vcf file containing genotypes obtained with an array, and I get the error below again:

Traceback (most recent call last):
  File "/home/user/programs/NGSCheckMate/ncm.py", line 1460, in <module>
    createDataSetFromDir(base_dir,bedFile)
  File "/home/user/programs/NGSCheckMate/ncm.py", line 231, in createDataSetFromDir
    real_depth[file] = depth[file] / float(real_count[file])
ZeroDivisionError: float division by zero

I'm running: python ~/programs/NGSCheckMate/ncm.py -V -d input/ -O output/ -bed ~/programs/NGSCheckMate/SNP/SNP_GRCh37_hg19_wChr.bed

The input files are here: https://www.dropbox.com/s/g5umthxj24v8evu/input.tgz?dl=0

Thanks for your help!

nskbe commented 6 years ago

I figured out the problem. NGSCheckMate relies on the DP4 field, which reports the read depth for each allele, and that makes sense. Array genotyping data doesn't have this field, since it's not NGS data.
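
A quick way to check a vcf file for this before running ncm.py is sketched below. It is not part of NGSCheckMate: `has_dp4`, `count_dp4_records`, and `mean_depth` are hypothetical helpers. The sketch assumes DP4 appears as a `DP4=` entry in the INFO column (column 8), as in samtools/bcftools output. `mean_depth` only illustrates guarding a division like the one in the traceback; it does not reproduce the program's logic.

```python
def has_dp4(info_field):
    """Return True if a VCF INFO string carries a DP4 entry."""
    return any(item.startswith("DP4=") for item in info_field.split(";"))


def count_dp4_records(vcf_lines):
    """Return (records_with_dp4, total_records) over VCF data lines."""
    with_dp4 = total = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        total += 1
        # INFO is the eighth tab-separated column of a VCF record.
        if len(fields) > 7 and has_dp4(fields[7]):
            with_dp4 += 1
    return with_dp4, total


def mean_depth(depth_sum, site_count):
    """Average depth, or None when no usable sites were counted."""
    return depth_sum / float(site_count) if site_count else None
```

A file for which `count_dp4_records` reports zero records with DP4, such as array genotypes, would leave nothing to average, which matches the explanation above.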