Closed MarcGose closed 2 years ago
Hi, most likely the "pop_code" values you get from "pop_code <- read.gdsn(index.gdsn(genofile, path="sample.annot/pop_file"))" are not unique pop factors. Please check pop_code first and see what it actually contains. Use your own real pop labels instead of the values returned by that call, and make sure that knn is smaller than the minimum number of individuals in any pop.
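A quick way to run both checks, as a minimal sketch in base R. The labels below are made-up illustration data standing in for your own pop_code, and check_knn is a hypothetical helper, not a KLFDAPC function:

```r
# Made-up labels standing in for the values read from "sample.annot/pop_file".
pop_code <- c("A", "A", "A", "B", "B", "B", "C", "C")

# Count individuals per population label to see what the labels really are.
pop_sizes <- table(pop_code)
print(pop_sizes)

# Hypothetical helper: TRUE when knn is smaller than the smallest population.
check_knn <- function(labels, knn) {
  knn < min(table(labels))
}

check_knn(pop_code, knn = 1)  # smallest pop here has 2 individuals
check_knn(pop_code, knn = 2)
```

If the table shows unexpected labels (for example one level per sample, or a whole data frame column name), the labels are the problem rather than knn.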
Cheers,
Xinghu
Thanks Xinghu, that was it!
Just another short question: given that requirement for the knn parameter, is there any way to incorporate a population that is represented by only one individual?
The answer is yes. One of the advantages of KLFDAPC is that it can preserve multimodal structures within pops, so you can assign a single individual to a higher-level metapop. For example, if you think the individual can be merged into a very close pop that is distinct from the other pops, label it with that pop when running klfdapc. This avoids removing the single individual. After you get the klfdapc features, you can then plot them using the true individual labels. This is one of the highlights of this method.
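The relabelling could look roughly like this, as a base R sketch under assumptions: true_pop, merge_map and fit_pop are hypothetical names, the labels are made up, and which pop counts as "very close" is your own judgement:

```r
# Made-up true labels; "D" is represented by a single individual.
true_pop <- c("A", "A", "A", "B", "B", "B", "D")

# Hypothetical mapping: assign the singleton "D" to the pop judged closest ("B").
merge_map <- c(D = "B")

# Labels used for fitting: singletons replaced by their metapop.
fit_pop <- true_pop
singleton <- fit_pop %in% names(merge_map)
fit_pop[singleton] <- unname(merge_map[fit_pop[singleton]])
fit_pop <- factor(fit_pop, levels = unique(fit_pop))

# Every fitting label now has more than one individual.
table(fit_pop)
```

fit_pop would then go in as the labels when running klfdapc, while true_pop is kept aside for colouring the plot of the resulting features.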
Cheers,
Xinghu
Thank you so much for your help Xinghu! Excited to play more with this method soon.
Cheers,
Marc
Hello,
I wanted to try out klfdapc on my SNP dataset, following the tutorial on this GitHub. Everything seems to work fine until the klfdapc step, where I get the error "Error in matrix(1, N, M) : non-numeric matrix extent".
I tried assigning random labels as in the SARS-CoV-2 tutorial and this worked, so I reckon it must be something with my population codes, but I can't seem to figure out what it is. I would greatly appreciate any help with this.
This is my code:
popsex <- read.table("pop_file.info")
pop_file <- popsex$V1
samp.annot <- data.frame(pop_file)
snpgdsVCF2GDS(vcf.fn = "C:/Users/MarcG/OneDrive/Desktop/VCF/WSD_GLs.vcf", out.fn = "WSD_GLs_GDS")
(genofile <- snpgdsOpen("WSD_GLs_GDS", readonly = FALSE))
read.gdsn(index.gdsn(genofile, "sample.id"))
read.gdsn(index.gdsn(genofile, "snp.rs.id"))
read.gdsn(index.gdsn(genofile, "genotype"))
add.gdsn(genofile, "sample.annot", samp.annot)
pop_code <- read.gdsn(index.gdsn(genofile, "sample.annot"))
pop_code <- read.gdsn(index.gdsn(genofile, path="sample.annot/pop_file"))
pop_code=factor(pop_code,levels=unique(pop_code))
pcadata <- SNPRelate::snpgdsPCA(genofile, autosome.only = FALSE)
snpgdsClose(genofile)
normalize <- function(x) { return ((x - min(x)) / (max(x) - min(x))) }
pcanorm=apply(pcadata$eigenvect[,1:20], 2, normalize)
kmat <- kmatrixGauss(pcanorm,sigma=5)
klfdapc=KLFDA(kmat, pop_code, r=3, knn = 2)
Error in matrix(1, N, M) : non-numeric matrix extent