zarquon42b / Morpho

R-package providing a toolset for (3D-based) Geometric Morphometrics

Classify.bgPCA() does not give correct results #18

Closed IrisMenendez closed 3 years ago

IrisMenendez commented 4 years ago

Hi, first, thanks for implementing between-group PCA in your package. I'm using it to classify fossil specimens into dietary categories based on the tooth shape of extant specimens. I have all shapes (extant and fossil) superimposed in a gpagen object, GPA.all, and a factor with the diet of the extant species, data_inf$DietCVA. Then I use this code:

bgPCA <- groupPCA(GPA.all$coords[,,1:1101], groups= data_inf$DietCVA, cv=T)
class_bgpca <- classify(bgPCA, newdata= two.d.array(GPA.all$coords[,,1102:1222]), cv=T)

The result is that all the specimens are classified as insectivores, which is not plausible because that is the least probable category (also see figure: pink filled dots are fossil specimens plotted in the groupPCA space using predict()).

> class_bgpca
specimens have been classified as:

 Insectivore 
        121 

[Figure: fossil specimens (pink filled dots) plotted in the groupPCA space]

This does not change when I specify the prior probabilities. I don't know if it's my fault or a bug in the function. Maybe I'm using the wrong object for the newdata?

Thank you, Iris

zarquon42b commented 4 years ago

Hi Iris, is it intentional that you use other indices for predicting? Also, two.d.array arranges the coordinates differently from Morpho. Try vecx instead and check back:

class_bgpca <- classify(bgPCA, newdata= vecx(GPA.all$coords[,,1102:1222]), cv=T)

However, I just adapted the code so this will be handled automatically. If you install the latest snapshot from GitHub using devtools, you can simply run:

class_bgpca <- classify(bgPCA, newdata= GPA.all$coords[,,1102:1222], cv=T)
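To illustrate why the coordinate arrangement matters, here is a minimal base-R sketch on a toy array. It does not call either package; the two helper expressions are only assumed to mimic the column layouts of Morpho's vecx (default settings) and geomorph's two.d.array, and the names by_col and by_row are made up for this example.

```r
# Toy landmark array: 3 landmarks (p) x 2 dimensions (k) x 2 specimens (n).
p <- 3; k <- 2; n <- 2
A <- array(seq_len(p * k * n), dim = c(p, k, n))

# Column-wise vectorization per specimen: x1, x2, x3, y1, y2, y3.
# Assumption: this mirrors the default layout of Morpho::vecx().
by_col <- t(apply(A, 3, function(m) as.vector(m)))

# Row-wise vectorization per specimen: x1, y1, x2, y2, x3, y3.
# Assumption: this mirrors the layout of geomorph::two.d.array().
by_row <- t(apply(A, 3, function(m) as.vector(t(m))))

by_col[1, ]  # 1 2 3 4 5 6
by_row[1, ]  # 1 4 2 5 3 6

# Same values per specimen, but in different columns, so a model fitted
# on one layout receives scrambled variables if fed the other.
identical(by_col, by_row)  # FALSE
```

Both matrices have one row per specimen and the same dimensions, so passing the wrong layout raises no error; the variables are just silently mismatched.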

Best

Stefan

IrisMenendez commented 4 years ago

Thanks, Stefan! It worked perfectly. I will keep that in mind for the future.

I'm not sure what you mean by using different indices for predicting; I think I used the same ones. The code was:

predict_bgpca <- predict(bgPCA, GPA.all$coords[,,1102:1222])

#plotting
f_s <- ggplot(data=as.data.frame(bgPCA$Scores),aes(x=V1,y=V2))
f_s + 
  geom_point(aes(col=data_inf$DietCVA), alpha = 4/10, shape=21, stroke=1, size=2) + 
  scale_color_manual(values=col.diet)+
  geom_point(data=as.data.frame(predict_bgpca), aes(colour=class_bgpca2$class), shape=16, size=2)+
  ggtitle("landmarks bgPCA")+
  theme_classic()

Thanks a lot! Iris

zarquon42b commented 4 years ago

Sorry, my bad, everything is all right. I just realized that those weren't landmark indices but specimen indices (I was a bit tired when I wrote the reply).
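For readers who trip over the same distinction, a small sketch with simulated data: in a landmarks x dimensions x specimens array (the layout of the coords element returned by gpagen), the third index selects specimens and the first selects landmarks. The array and the names coords, fossils and first_lms are invented for this illustration.

```r
# Simulated array: 5 landmarks x 3 dimensions x 10 specimens.
set.seed(1)
coords <- array(rnorm(5 * 3 * 10), dim = c(5, 3, 10))

# Third index: picks specimens 8 to 10, keeping all landmarks.
fossils <- coords[, , 8:10]
dim(fossils)    # 5 3 3

# First index: picks landmarks 1 and 2, keeping all specimens.
first_lms <- coords[1:2, , ]
dim(first_lms)  # 2 3 10
```

So GPA.all$coords[,,1102:1222] in the code above is a subset of specimens, with every landmark retained.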