sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

Alignment of large dataset using XCMS #579

Open tiwa1125 opened 2 years ago

tiwa1125 commented 2 years ago

Hi, I spend most of my time working with xcms on very large datasets (mostly on a cluster). xcms works quite well for my smaller datasets of fewer than 1000 samples analyzed over a short period. But when I move to large ones, for example 5000, 10000 or even more samples of UPLC-TOF data acquired over a longer period, many problems appear. The biggest one is retention time shift and alignment. We did a lot of work trying to fix it: I tried both alignment methods available in XCMS and optimized many parameters, but have not obtained really good results, and wrong alignment leads to wrong integration for many of my samples.

Our samples were analyzed over at least five years with exactly the same method, and four internal standards (IS) were added to all of the samples. Besides the IS, I can also find further landscape metabolites that are consistently present in all samples and could be used for RT correction. The RT shift between our samples is quite large, sometimes up to 1 min, so I have to think about other ways to improve the accuracy of the alignment. For example, I am thinking of doing a pre-alignment of each file using the relative-RT-corrected IS or landscape compounds.

So I am writing to ask whether there are other functions or packages that work with XCMS and can do this kind of thing. Or do you have any other suggestions? Sorry for the very long question, and thank you very much in advance.

Best, Tingting

jorainer commented 2 years ago

If you already know the IS or other landmark features, it should be possible to feed them into the xcms peak groups alignment. PeakGroupsParam has a parameter peakGroupsMatrix that allows you to predefine the peak groups. I have to admit that I have never tried that option, though.
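As a rough, untested sketch of the peakGroupsMatrix idea: the matrix holds raw retention times with one row per landmark compound and one column per sample, in the same order as the files. All values and names below (is_rt, IS1 to IS4, sample1 to sample3) are made up for illustration, and smooth = "linear" is only an assumption that a linear fit is safer than loess when there are just a handful of landmarks.

```r
library(xcms)

## Hypothetical raw retention times (in seconds) of four internal standards
## (rows) in three samples (columns). The numbers are invented.
is_rt <- matrix(
    c( 62.1,  65.4,  70.2,
      181.7, 186.0, 192.3,
      305.2, 311.8, 318.5,
      442.9, 450.1, 459.6),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("IS", 1:4), paste0("sample", 1:3)))

## Parameter object that uses the predefined landmarks for the alignment.
pgp <- PeakGroupsParam(smooth = "linear", peakGroupsMatrix = is_rt)

## With an existing result object `xdata` (assumed here, containing three
## files in the same order as the matrix columns) the call would be:
## xdata <- adjustRtime(xdata, param = pgp)
```

In a real data set the matrix would have to be filled with the retention times at which the IS were actually detected in each file.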

Also, it is possible to run several alignments sequentially by calling applyAdjustedRtime in between the adjustRtime calls.
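A minimal sketch of such a two-pass alignment could look like the function below. The name align_twice and all parameter values (bw, span) are placeholders that would need tuning for the actual data; xdata is assumed to be a result object that already contains detected chromatographic peaks.

```r
library(xcms)

## Hypothetical helper: two rounds of peak groups alignment.
align_twice <- function(xdata, sample_groups,
                        bw = c(30, 10), span = c(0.6, 0.3)) {
    for (i in 1:2) {
        ## Correspondence is needed before each peak groups alignment.
        xdata <- groupChromPeaks(
            xdata,
            param = PeakDensityParam(sampleGroups = sample_groups,
                                     bw = bw[i]))
        xdata <- adjustRtime(xdata, param = PeakGroupsParam(span = span[i]))
        ## After the first pass, make the adjusted retention times the new
        ## raw retention times so that the second pass starts from them.
        if (i == 1)
            xdata <- applyAdjustedRtime(xdata)
    }
    xdata
}

## Usage with an existing object (assumed):
## xdata <- align_twice(xdata, sample_groups = rep(1, length(fileNames(xdata))))
```

The first pass uses a wider bandwidth to catch the large shifts, the second a narrower one to refine the result; whether that helps would have to be checked on the data.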

Finally, there is always the possibility to manually change the retention times (e.g. shift them by a constant value):

## The code below adds a value of 4 to the retention times of the 3rd file in the xdata object
idx <- fData(xdata)$fileIdx == 3
fData(xdata)[idx, "retentionTime"] <- fData(xdata)[idx, "retentionTime"] + 4
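If the constant shift is not known beforehand, one option might be to estimate it per file from the internal standards. The sketch below uses base R only: fd is a toy stand-in for fData(xdata), and is_rt holds invented IS retention times, so none of the numbers come from real data.

```r
## Toy stand-in for fData(xdata): one row per spectrum.
fd <- data.frame(fileIdx = rep(1:3, each = 4),
                 retentionTime = rep(c(10, 20, 30, 40), 3))

## Hypothetical observed retention times (seconds) of four internal
## standards, one row per file.
is_rt <- rbind(file1 = c(60, 180, 300, 440),
               file2 = c(64, 184, 304, 444),
               file3 = c(57, 177, 297, 437))

## Offset of each file relative to the first file, taken as the median
## difference across the internal standards.
offset <- unname(apply(sweep(is_rt, 2, is_rt[1, ]), 1, median))

## Subtract each file's offset from the retention times of its spectra.
fd$retentionTime <- fd$retentionTime - offset[fd$fileIdx]
```

A single offset per file only removes a constant shift; drifts that change along the gradient would still need the regular alignment afterwards.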