zhengxwen / SNPRelate

R package: parallel computing toolset for relatedness and principal component analysis of SNP data (Development version only)
http://www.bioconductor.org/packages/SNPRelate

snpgdsSampMissRate #25

Closed: jingjin0322 closed this issue 7 years ago

jingjin0322 commented 7 years ago

Hi Dr. Zheng,

I am trying to use SNPRelate to analyze a VCF file generated from 54 individual samples. I have a very basic question about snpgdsSampMissRate: I am not sure how to read its output, which is shown below. What do the numbers mean? Do they mean that, for each individual sample, about 80% of the SNPs are missing? Does 1.00000 indicate that no SNPs were found in that sample at all?

snpgdsSampMissRate(genofile)
 [1] 0.7869513 0.7650544 0.8032866 0.7854701 0.7854561 0.7549653 0.7646911 0.7773047 0.7721205 0.8047911
[11] 0.7551796 0.7414341 0.7636244 0.7696751 0.7599353 0.7811894 1.0000000 0.7381223 0.7761682 0.7792191
[21] 0.7593065 0.7682451 0.8487200 0.7591761 0.7984890 0.7769833 0.7807795 0.7491662 0.7746264 0.7869560
[31] 0.7820465 0.7477362 0.7477409 0.7145346 0.7285037 0.9998789 0.7690276 0.6544474 0.6240777 0.6914127
[41] 0.6777323 0.6582156 0.6678296 0.6934295 0.6564503 0.6835687 0.6815611 0.6720124 0.6959681 0.6741224
[51] 0.6609871 0.6858930 1.0000000 0.7364408

Hope you can help me out. I'll really appreciate it!

Thanks, Jing

zhengxwen commented 7 years ago

snpgdsSampMissRate() returns the fraction of missing SNP genotypes for each sample. A value of 1.0 means that no SNP genotype was called for that sample.
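
To make that reading concrete, here is a minimal base R sketch that applies it to the values posted above. The variable names (`miss`, `flagged`) and the 0.999 cutoff are assumptions for illustration only; the cutoff is used instead of an exact `== 1` test because printed values are rounded.

```r
# Missing fractions copied from the output posted in this thread (54 samples).
miss <- c(
  0.7869513, 0.7650544, 0.8032866, 0.7854701, 0.7854561,
  0.7549653, 0.7646911, 0.7773047, 0.7721205, 0.8047911,
  0.7551796, 0.7414341, 0.7636244, 0.7696751, 0.7599353,
  0.7811894, 1.0000000, 0.7381223, 0.7761682, 0.7792191,
  0.7593065, 0.7682451, 0.8487200, 0.7591761, 0.7984890,
  0.7769833, 0.7807795, 0.7491662, 0.7746264, 0.7869560,
  0.7820465, 0.7477362, 0.7477409, 0.7145346, 0.7285037,
  0.9998789, 0.7690276, 0.6544474, 0.6240777, 0.6914127,
  0.6777323, 0.6582156, 0.6678296, 0.6934295, 0.6564503,
  0.6835687, 0.6815611, 0.6720124, 0.6959681, 0.6741224,
  0.6609871, 0.6858930, 1.0000000, 0.7364408
)

# Positions of samples with (almost) no genotype calls.
# The 0.999 cutoff is a hypothetical tolerance, not a recommendation.
flagged <- which(miss >= 0.999)
print(flagged)
# 17 36 53

# Range of the missing fraction among the remaining samples.
print(round(range(miss[-flagged]), 2))
# 0.62 0.85
```

On a real GDS file, the same filter could be applied directly to the vector returned by `snpgdsSampMissRate(genofile)`.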

jingjin0322 commented 7 years ago

Hi Dr. Zheng,

Thanks for your kind reply!

Is it normal to see such a high missing rate for each sample? When comparing the similarity between samples, what missing-rate threshold would you recommend?

Thanks again, Jing

zhengxwen commented 7 years ago

It is unusual to see such a high missing rate. You might check the original VCF files with vcftools.
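
Alongside checking the VCF with vcftools, missingness can also be inspected from within R. The following is a minimal sketch, not the author's recommended procedure: it uses the example GDS file shipped with SNPRelate as a stand-in for the poster's data, and the `max.miss` cutoff of 0.05 is a hypothetical value chosen only for illustration, since the thread does not state a threshold.

```r
library(gdsfmt)
library(SNPRelate)

# Example data shipped with SNPRelate, standing in for the poster's file.
genofile <- snpgdsOpen(snpgdsExampleFileName())

# Per-sample missing fraction, in the order of the stored sample IDs.
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))
samp.miss <- snpgdsSampMissRate(genofile)

# Hypothetical cutoff, chosen only for illustration.
max.miss <- 0.05
keep <- sample.id[samp.miss <= max.miss]

# Per-SNP missing fraction, computed on the retained samples only.
snp.miss <- snpgdsSNPRateFreq(genofile, sample.id = keep)$MissingRate

# Compare the two distributions to see where the missingness comes from.
print(summary(samp.miss))
print(summary(snp.miss))

snpgdsClose(genofile)
```

If most samples show similarly high missing fractions, as in the output posted in this thread, the cause is more likely in how the VCF was produced than in a few poor samples, which is why inspecting the original file is the suggested first step.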