sneumann / CAMERA

This is the git repository matching the Bioconductor package CAMERA: Collection of annotation related methods for mass spectrometry data

groupCorr cor_exp_th = 0.90 separate two > 0.9995 #50

Open cbonnefoy opened 4 years ago

cbonnefoy commented 4 years ago

Hello, and thanks in advance for your help.

I am using CAMERA:

xsgf <- as(fdata, "xcmsSet")
xsaf <- xsAnnotate(xsgf, sample = c(3:5), polarity = "positive")
xsaFf <- groupFWHM(xsaf, sigma = 6, perfwhm = 0.6, intval = "maxo")
xsaFCIf <- findIsotopes(xsaFf, maxcharge = 3, maxiso = 4, ppm = 5, mzabs = 0.015, intval = "maxo", minfrac = 1, isotopeMatrix = NULL, filter = TRUE)
xsFCf <- groupCorr(xsaFCIf, cor_exp_th = 0.90, calcCiS = FALSE, calcCaS = TRUE)
xsaFCIAf <- findAdducts(xsFCf, polarity = "positive")

At the end, features that were in the same pc-group end up in different pc-groups.

Example

Lines 2227 and 2239:
after FWHM, pc-group = 21 for both;
after isotopes, corr and adducts, pc-group = 756 and 757 respectively;
the correlation coefficient calculated by calcCaS(xsaFCIf, corval=0.90, pval=0.05, intval="maxo") is 0.9999.

Could someone explain to me why this happens? What is the role of pval? How can I access it?

Christelle

stanstrup commented 4 years ago

The p-value comes from the correlation computed by Hmisc::rcorr. I am not a statistician, so I won't try to explain it, but the documentation says "[...] P, the asymptotic P-values". I think you can find an explanation here: https://statisticsbyjim.com/regression/interpret-coefficients-p-values-regression/

It is unclear whether your question was about something else...

cbonnefoy commented 4 years ago

Thanks for your comment Jan.

Now I can calculate the p-values with Hmisc::rcorr.

I understand that it tests the significance of the correlation, which depends on the number of pairs used. Because I have only 12 samples, even a correlation of 0.6 passes the test (its p-value is less than 0.05).
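To make the dependence on sample size concrete, here is a minimal base-R sketch. It assumes the usual two-sided t-test for a Pearson correlation with n - 2 degrees of freedom, which should closely match what Hmisc::rcorr reports but is not taken from the CAMERA or Hmisc source. The helper names cor_pvalue and critical_r are made up for this illustration.

```r
# Two-sided p-value for a Pearson correlation r estimated from n pairs,
# using the standard t statistic with n - 2 degrees of freedom
# (assumed here; not copied from CAMERA or Hmisc).
cor_pvalue <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

# Smallest |r| that is significant at level alpha for n pairs.
critical_r <- function(n, alpha = 0.05) {
  t_crit <- qt(1 - alpha / 2, df = n - 2)
  t_crit / sqrt(t_crit^2 + n - 2)
}

p_06  <- cor_pvalue(0.6, 12)  # p-value for r = 0.6 with 12 samples
r_min <- critical_r(12)       # threshold on |r| at alpha = 0.05

print(p_06)
print(r_min)
```

With 12 samples the threshold on |r| comes out at roughly 0.58, so the p-value filter is much less strict than a correlation threshold of 0.90 and is unlikely to be what separates two features correlated at 0.9999.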

What I don't understand is why two variables that are in the same pseudospectrum before groupCorr are split into two different ones afterwards, given that the correlation is > 0.90 and the p-value is far below 0.05.

stanstrup commented 4 years ago

Sounds strange. Are they the only features in the group? You could try graphMethod="lpc" to see whether it is the clustering algorithm that does something strange. I think example data is probably needed to investigate this further.

cbonnefoy commented 4 years ago

Yes, when they split they are often the only feature in the group.

I only tested the graphMethod options on one psg_list. Both result in splitting, even if the splitting is slightly different.

I noticed that the pc-group numbers after splitting are very close: they differ by only one. It seems to me that the features are rejected from the originating group, but I don't know how.

Here is an example for 3 molecules. Have a look at carbamazepine.

Thanks

FWHM_Corr_1_12_Carbamazepine_Sulfamethoxazole_Ketoprofene.xlsx

stanstrup commented 4 years ago

Which intensity values did you use for that analysis? You used maxo for groupFWHM, but the default for groupCorr is into. I am wondering whether some NA values are in play. Did you fill peaks?

cbonnefoy commented 4 years ago

I used maxo for groupCorr too. I filled peaks, but for some features there are still many NA values.

I agree that part of my problem may come from that, but I think there is another cause, because I have examples where no peak is missing.
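A quick way to see how much the remaining NA values could matter is to count, for a pair of features, how many samples have a value for both. The sketch below uses made-up intensities and base R only. Whether CAMERA drops incomplete pairs in the same way is an assumption here, not something confirmed in this thread.

```r
# Hypothetical maxo intensities for two features across 12 samples;
# NA marks samples where no peak was found, even after peak filling.
feat_a <- c(100, 210, 305, NA, 498, 610, NA, 790, 905, NA, 1100, 1195)
feat_b <- c(52, NA, 149, 201, 250, NA, 348, 402, 449, NA, 551, 600)

# Samples where both features have a value.
complete_pairs <- !is.na(feat_a) & !is.na(feat_b)
n_pairs <- sum(complete_pairs)

# Correlation restricted to the complete pairs
# (pairwise deletion is an assumption, not CAMERA's documented behaviour).
r_pairwise <- cor(feat_a, feat_b, use = "pairwise.complete.obs")

# With default settings, any NA makes the whole correlation NA.
r_default <- cor(feat_a, feat_b)

print(n_pairs)
print(r_pairwise)
print(r_default)
```

If only a few complete pairs remain, the p-value is computed from that smaller number of pairs, so a feature with many NA values can behave differently from what the full 12-sample design suggests.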