Hi, is there a way to run geoRge independent of xcmsSet? #9

gmhhope commented 6 years ago

Hi Jordi,

Is there a way to run geoRge independently of an xcmsSet object?

I was trying to use apLCMS and XCMS output to run geoRge, but found that it only accepts an xcmsSet object, which I have no idea how to create without going through all the peak picking, retcor and so on in xcms.

Also, all my samples were run 3 times, so I need to average the replicate runs to summarize them, and I am not aware of a way to do that within an xcmsSet object.

I have been studying your code but found it difficult to work through all the details. Do you think I could modify the script so that it works on just a feature table containing mz (mzmed), rtime (rtmed) and feature intensities? I noticed that your script uses rtime_min and rtime_max, but I don't think they are very important parameters, right? It should be enough to specify a minimum retention time window.

Please let me know if you have any suggestions. I greatly appreciate your help!

Best, Minghao Gong, PhD candidate University of Florida

gmhhope commented 6 years ago

Also, when I went through PuInc(), there were some details I didn't understand, for example:

    fc.test <- sapply(conditions, function(y) {
      apply(D1[ , filtsampsint], 2, function(x) {
        ulm  <- mean(x[intersect(grep(ULtag, classv), grep(y, classv))])
        labm <- mean(x[intersect(grep(Ltag, classv), grep(y, classv))])
        FC  <- labm / ulm
        FC2 <- (-(ulm / labm))
        FC[FC < 1] <- FC2[FC < 1]
        return(FC)
      })
    })
    fc.test <- data.frame(fc.test)
    colnames(fc.test) <- conditions

What does FC2 mean here? What is its purpose? I hope you can help. Thanks!

gmhhope commented 6 years ago

In the basepeak_finder() function, is the signal/noise filtering very important? In the example, the parameter is set to "noise.quant = 0.0", which I guess means no filtering is done. Could you give me any suggestions on this? Besides, I am wondering whether setting the xcms intensity mode to "intb" would do something similar. As I understand it, "intb" gives the intensities corrected for the baseline. Can you also comment on this?

jcapelladesto commented 6 years ago

Hi Minghao,

Sorry for the delay, I was on holidays :) Thank you for your messages. I will respond to each post separately:

geoRge takes advantage of the organized structure provided by XCMS, so as it is now, you cannot run it independently. But, as you propose, it should be "run-able" on data with the same structure.

Please consider that if you do not run a feature detection algorithm as XCMS does with "centWave", you need to align the peaks detected in the different replicates, because geoRge compares them statistically.

As for rtmin and rtmax, they are not essential for the software and could be left out if you adapt the code to your dataset, as they are a product of feature detection.
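As a rough illustration of the kind of plain feature table discussed above, the following sketch averages triplicate injections per sample before any labelling analysis. The column names, the `_inj<number>` suffix and the intensity values are all hypothetical, and this is not part of geoRge; it only shows one way replicate runs could be collapsed outside an xcmsSet object.

```r
# Hypothetical aligned feature table: one row per feature, one column per injection.
# Column names follow an assumed "<sample>_inj<number>" pattern.
features <- data.frame(
  mzmed = c(180.0634, 181.0667),
  rtmed = c(312.4, 312.6),
  UL_ctrl_inj1 = c(1000, 50), UL_ctrl_inj2 = c(1100, 70), UL_ctrl_inj3 = c(900, 60),
  L_ctrl_inj1  = c(400, 600), L_ctrl_inj2  = c(500, 700), L_ctrl_inj3  = c(450, 650)
)

intensity_cols <- setdiff(names(features), c("mzmed", "rtmed"))

# Recover the sample name by dropping the injection suffix.
sample_of <- sub("_inj[0-9]+$", "", intensity_cols)

# Mean intensity per feature across the injections of each sample.
averaged <- sapply(unique(sample_of), function(s) {
  rowMeans(features[, intensity_cols[sample_of == s], drop = FALSE])
})

collapsed <- cbind(features[, c("mzmed", "rtmed")], averaged)
print(collapsed)
```

The features must already be aligned across injections for the row-wise mean to be meaningful.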

Regarding FC2 in PuInc(): this step is a simple value transformation for ratios (fold-changes) lower than 1. Each such ratio is replaced by its negative reciprocal, so that every value keeps a magnitude of at least 1 for the filtering afterwards. As an example: a ratio of 0.5 will be transformed to -2.
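The transformation can be tried on its own with a small sketch. The function name and the input values below are made up for illustration; the logic mirrors the FC/FC2 lines quoted from PuInc().

```r
# Signed fold-change: ratios below 1 are replaced by the negative inverse ratio.
signed_fold_change <- function(labelled_mean, unlabelled_mean) {
  fc <- labelled_mean / unlabelled_mean
  inverse <- -(unlabelled_mean / labelled_mean)
  below_one <- fc < 1
  fc[below_one] <- inverse[below_one]
  fc
}

labelled   <- c(50, 200, 100)   # hypothetical mean intensities, labelled samples
unlabelled <- c(100, 100, 100)  # hypothetical mean intensities, unlabelled samples

signed_fold_change(labelled, unlabelled)
# [1] -2  2  1
```

A ratio of exactly 1 is left unchanged, so the result never falls strictly between -1 and 1.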

Regarding basepeak_finder(): the "noise.quant" parameter does not need to be "optimized". I would recommend leaving it at 0.0, as it is only a way to filter the results of geoRging: if the data is very noisy, there is a high probability that some features match the rules of a "base peak" (see article). It should only be used if you know that your data contains a lot of noisy peaks; otherwise the problem could only be solved by rerunning XCMS with stricter parameters.

I have little experience using "intb" in xcms. I believe the values would change a little, which could probably improve the results for some parts of the dataset but at the same time lose features in other parts of it.
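To make the idea of a quantile-based noise cutoff concrete, here is a generic sketch. It assumes that a parameter like "noise.quant" is a quantile probability applied to intensities; that is an assumption drawn from the parameter name, not a description of what basepeak_finder() does internally. The function name and the intensities are hypothetical.

```r
# Illustration only: a generic quantile-based intensity cutoff.
# This is NOT geoRge's code; how noise.quant is used internally is assumed here.
intensities <- c(120, 300, 5000, 80, 15000, 950)

keep_above_quantile <- function(x, noise_quant) {
  cutoff <- quantile(x, probs = noise_quant, names = FALSE)
  x[x >= cutoff]
}

# With a probability of 0 the cutoff equals the minimum, so nothing is dropped.
kept_all  <- keep_above_quantile(intensities, 0.0)

# With a probability of 0.5 only values at or above the median are kept.
kept_half <- keep_above_quantile(intensities, 0.5)

print(length(kept_all))
print(kept_half)
```

Under that assumption, a value of 0.0 amounts to no filtering, which matches the guess made in the question.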

As a closing remark, I would try to adapt what we did in the paper to the data format you have. I see you are doing a lot of hard work; just make sure that it makes sense, because geoRge uses "features" and not "peaks", so you need to filter and align to prevent redundancy (this is far from easy to do, although there are some papers published on the topic).

I hope that these answers help you.

gmhhope commented 6 years ago

I greatly appreciate your answers. This is very helpful! I will try to run geoRge soon and see whether I need further assistance. BTW, you guys are BIG "apply" fans! I had a very hard time going through all the s/t/lapply calls, but in the end I found it very instructive to write scripts like this rather than with for loops. It makes the code much shorter. Again, thanks for your reply!

gmhhope commented 6 years ago

Hi Jordi, could you also comment on the major algorithmic differences between X13CMS and geoRge? Which improvements, as demonstrated in your geoRge paper, make geoRge stand out? Thanks in advance!

jcapelladesto commented 6 years ago

I recommend reading both papers if you are interested in the details.

In summary, the main difference is that geoRge first compares labelled and unlabelled samples to find "labelled features", whereas X13CMS looks for features that are separated by the mass difference of the isotope at the same RT. We noticed that X13CMS tends to have more false positives than geoRge (which still has some), while geoRge misses some features that X13CMS finds, because of its statistical rules.
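The mass-spacing idea can be pictured with a toy sketch. It assumes 13C labelling (a 13C - 12C mass difference of about 1.003355 Da); the function name, tolerances and feature values are hypothetical, and this is neither the X13CMS nor the geoRge algorithm, only an illustration of "features separated by the isotope mass at the same RT".

```r
# Toy illustration only: pair features whose m/z differ by n x the 13C - 12C
# mass difference and whose retention times agree within a tolerance.
c13_shift <- 1.003355  # Da

find_isotopologue_pairs <- function(mz, rt, max_carbons = 6,
                                    mz_tol = 0.005, rt_tol = 5) {
  pairs <- list()
  for (i in seq_along(mz)) {
    for (j in seq_along(mz)) {
      if (i == j || abs(rt[i] - rt[j]) > rt_tol) next
      delta <- mz[j] - mz[i]
      n <- round(delta / c13_shift)  # candidate number of 13C atoms
      if (n >= 1 && n <= max_carbons && abs(delta - n * c13_shift) <= mz_tol) {
        pairs[[length(pairs) + 1]] <- data.frame(base = i, labelled = j, n_13C = n)
      }
    }
  }
  do.call(rbind, pairs)  # NULL when no pair is found
}

# Hypothetical features: the first two differ by roughly three 13C shifts.
mz <- c(180.0634, 183.0735, 250.1000)
rt <- c(312.4, 312.9, 313.0)

res <- find_isotopologue_pairs(mz, rt)
print(res)
```

A search like this uses only m/z and RT spacing, whereas the approach described for geoRge starts from a statistical comparison of labelled and unlabelled samples.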
