zhengxwen / SNPRelate

R package: parallel computing toolset for relatedness and principal component analysis of SNP data (Development version only)
http://www.bioconductor.org/packages/SNPRelate

what method of clustering do you use? #16

Closed orchid00 closed 8 years ago

orchid00 commented 8 years ago

According to this: http://www.inside-r.org/packages/cran/SNPRelate/docs/snpgdsHCluster the method is "complete"

but when I check my cluster object it says "average"

cluster <- snpgdsHCluster(dist, need.mat=TRUE, hang=-1)

attributes(cluster_$hclust)
$names
[1] "merge"       "height"      "order"       "labels"      "method"
[6] "call"        "dist.method"

(cluster_$hclust$method)
[1] "average"

(cluster_27$hclust$dist.method)
NULL

This is important for my analysis; would you please confirm? Thanks! The latest PDF manual, https://www.bioconductor.org/packages/release/bioc/manuals/SNPRelate/man/SNPRelate.pdf, has no details.

I would also like to know what kind of distance metric is used for snpgdsDiss.

zhengxwen commented 8 years ago

This is a typo in the documentation. snpgdsHCluster() calls hclust() with method="average".
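
One way to check the linkage on any fitted object is to read the `method` field that base R's hclust() stores in its result. The sketch below uses a made-up 4x4 distance matrix (the sample names and values are hypothetical) and does not call SNPRelate. It also suggests one possible reason for `dist.method` being NULL: hclust() copies that field from the "method" attribute of its input, which as.dist() applied to a plain matrix does not set. Whether snpgdsHCluster() prepares its input this way is an assumption here, not something confirmed in this thread.

```r
# Toy symmetric distance matrix; names and values are made up for illustration.
ids <- c("S1", "S2", "S3", "S4")
m <- matrix(c(
  0.0, 0.2, 0.8, 0.9,
  0.2, 0.0, 0.7, 0.8,
  0.8, 0.7, 0.0, 0.3,
  0.9, 0.8, 0.3, 0.0
), nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Average linkage (UPGMA) on a dist object built from a plain matrix.
hc_avg <- hclust(as.dist(m), method = "average")
print(hc_avg$method)       # "average"
print(hc_avg$dist.method)  # NULL: as.dist() on a matrix records no metric name

# For comparison: hclust() defaults to complete linkage, and dist() records its metric.
hc_default <- hclust(dist(m))
print(hc_default$method)       # "complete"
print(hc_default$dist.method)  # "euclidean"
```

The comparison matters because leaving out the `method` argument in a direct hclust() call gives complete linkage, so the two trees are not interchangeable.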

snpgdsDiss() returns a distance metric based on the estimated Fst at the individual level: $1 - \beta_{ij}$, where $\beta_{ij}$ is described in the paper below. You might cite this paper for snpgdsDiss(): Weir BS, Zheng X. SNPs and SNVs in Forensic Science. Forensic Science International: Genetics Supplement Series. 2015.
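
To make the $1 - \beta_{ij}$ transformation concrete, here is a minimal base R sketch. The `beta` matrix is entirely hypothetical (made-up values for four made-up samples) and is not produced by snpgdsDiss(). The sketch only illustrates turning a pairwise similarity-like estimate into a dissimilarity and passing it to average-linkage clustering.

```r
# Hypothetical pairwise beta estimates for four samples (made-up numbers).
ids <- c("S1", "S2", "S3", "S4")
beta <- matrix(c(
  1.00, 0.40, 0.05, 0.02,
  0.40, 1.00, 0.03, 0.04,
  0.05, 0.03, 1.00, 0.35,
  0.02, 0.04, 0.35, 1.00
), nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Dissimilarity as 1 - beta_ij: larger beta means a smaller distance.
diss <- 1 - beta

# Average-linkage clustering on the dissimilarity, then cut into two groups.
hc <- hclust(as.dist(diss), method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

With these toy values, S1 and S2 fall into one group and S3 and S4 into the other, because their within-pair dissimilarities (0.60 and 0.65) are far smaller than any between-pair value.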

orchid00 commented 8 years ago

Thank you! great package btw :+1: