hhabra / metabCombiner

metabCombiner R Package: Paired Untargeted Metabolomics Feature Matching & Data Concatenation

how to make the example data work #13

Closed tiwa1125 closed 2 years ago

tiwa1125 commented 2 years ago

Hi, I found your package recently and think it looks interesting for our research, but when I tried to run your script I ran into a couple of issues. First, where is the file "path_to_data_file.txt"? Unfortunately, I did not find it even after installing your package. Second, do you have a full script that works end-to-end on your example data? There are many symbols such as "..." in your script that are not recognized by R, so I could not run the code on the example data. I would like to see what the output of your package looks like, whether we can apply it to our data, and how we should prepare our data format.

Thank you very much in advance. Best Tingting

hhabra commented 2 years ago

Hello Tingting, Forgive me if this was unclear, but "file/path/to/dataset1.txt" is a reference to a dataset in your own directory and not a real dataset that exists inside this package. For a full working example, I have datasets stored in the package called "plasma30" and "plasma20." Here's a full workflow with these two datasets that works:

data(plasma30)
data(plasma20)

#wrap each feature table in a metabData object, selecting sample columns by keyword
p30 <- metabData(plasma30, samples = "CHEAR")
p20 <- metabData(plasma20, samples = "Red", rtmax = 17.25)

#group features by m/z, then select anchors and fit the RT mapping model
p.comb <- metabCombiner(xdata = p30, ydata = p20, binGap = 0.0075)
p.comb <- selectAnchors(p.comb, tolmz = 0.003, tolQ = 0.3, windy = 0.02)
p.comb <- fit_gam(p.comb, k = 20, iterFilter = 1)

#score the candidate feature pair alignments
p.comb <- calcScores(p.comb, A = 75, B = 14, C = 0.5)
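
If you want to inspect the results at this stage, here is a minimal sketch using the package accessors (the plot arguments are illustrative):

#extract the combined feature-pair report as a data frame
p.comb.table <- combinedTable(p.comb)

#visualize the fitted RT mapping between the two datasets
plot(p.comb, fit = "gam", main = "p30 vs p20 RT mapping",
     xlab = "p30 RT (min)", ylab = "p20 RT (min)")

#write the current report to a file
write2file(p.comb, file = "combined_table.csv", sep = ",")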

For full outputs (including less likely matches and program labels) use:

p.comb <- labelRows(p.comb, maxRankX = 2, maxRankY = 2, maxRTerr = 0.5,
                    delta = 0.1, resolveConflicts = FALSE, remove = FALSE)

For reduced outputs with automated program matches, use:

p.comb.2 <- reduceTable(p.comb)

Let me know if you have any further questions or if anything doesn't make sense.

Best, Hani

tiwa1125 commented 2 years ago

Dear Hani, Thank you very much for your reply, I will try it out. My project works with retrospective human data acquired over many years, and retention time shift is a huge problem since the samples come from so many different batches. We use xcms to process our data, and our dataset is also quite large, normally including thousands of samples. I would like to know whether the functions in this package can improve peak alignment in our case. It would be very attractive for me if it can deal with our problems.

Best Tingting


hhabra commented 2 years ago

Hi, sorry for my slow reply. XCMS does indeed have many inaccuracies when aligning between LC-MS batches, especially for larger experiments. metabCombiner has functionality for multi-batch data, called batchCombine. It applies a stepwise alignment between consecutive batches, starting from the first batch and proceeding all the way to the final batch. One caveat of this method is that each batch should be represented as a separate pre-processed feature table, rather than pre-processing them all at once. This method is not yet published, but I can still walk you through an example usage.
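
Schematically, it behaves like a left fold over the batches: each new batch is aligned to the running result of the batches before it. The sketch below is an illustration only, not batchCombine's actual internals; alignStep is a hypothetical stand-in for the pairwise selectAnchors / fit_gam / calcScores / reduceTable sequence, and batchlist stands for your list of per-batch inputs:

#sketch of the stepwise idea; alignStep is hypothetical, not a package function
alignStep <- function(combined, nextBatch) {
    #pairwise alignment of nextBatch onto the running combined result:
    #metabCombiner -> selectAnchors -> fit_gam -> calcScores -> reduceTable
    combined   #placeholder return for this sketch
}
combined <- Reduce(alignStep, batchlist[-1], init = batchlist[[1]])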

First, you should create separate folders containing the .mzML or related files for each LC-MS metabolomics batch (batch01, batch02, etc.), put them all in one directory (call it "experiment"), then run XCMS to extract a processed table for each batch. The code below runs XCMS on each batch individually and writes the results to a folder in the chosen directory called "XCMS."

library(xcms)
library(BiocParallel)   #provides SnowParam() for parallel processing
setwd("file/path/to/experiment/directory")

if(!dir.exists("XCMS"))  dir.create("XCMS")

#loop over the batch folders only, skipping the "XCMS" output folder
for(batch in setdiff(list.dirs(full.names = FALSE, recursive = FALSE), "XCMS")){
    xset <- xcmsSet(batch, method = "centWave", peakwidth = c(3, 30),
                    noise = 250, ppm = 20, prefilter = c(3, 1000), snthresh = 10,
                    mzdiff = -0.001, mzCenterFun = "wMean", integrate = 1,
                    fitgauss = TRUE, verbose.columns = FALSE,
                    BPPARAM = SnowParam(15))
    xset2 <- group(xset, method = "density", bw = 15, mzwid = 0.015,
                   minfrac = 0.5, minsamp = 1)
    xset3 <- retcor(xset2, smooth = "loess", family = "symmetric",
                    missing = 5, span = 0.25)
    xset4 <- group(xset3, method = "density", bw = 5, mzwid = 0.015,
                   minfrac = 0.5, minsamp = 1)
    xset5 <- fillPeaks(xset4)
    xtable <- peakTable(xset5)
    xtable$rt <- xtable$rt / 60      #strongly recommend converting rt to minutes

    write.csv(xtable, file = paste("XCMS/", batch, "_xcms.csv", sep = ""), na = "",
              row.names = FALSE)
}

With the results, you can load the output tables into R as a list and then convert them into a list of metabData objects. For the samples argument, I recommend a keyword common to the names of the QC samples, and for the "extra" argument, another keyword common to the normal samples. Note that this assumes samples are named consistently across all of the batches.

#full.names = TRUE keeps the "XCMS/" prefix so read.csv can find the files
batchfiles <- grep("_xcms.csv", list.files("XCMS", full.names = TRUE), value = TRUE)
batchdata <- lapply(batchfiles, read.csv, stringsAsFactors = FALSE)

#"QC_name" and "samples_name" are placeholders for keywords in your column names
batchmetdata <- lapply(batchdata, metabData, samples = "QC_name",
                       extra = "samples_name")

#recommend having a unique identifier for each batch
names(batchmetdata) <- paste("b", seq_along(batchmetdata), sep = "")
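
Before combining, it's worth a quick sanity check that the keyword matching picked up the intended columns; getSamples and getExtra are the metabData accessors for this:

#confirm which sample and extra columns were detected in the first batch
getSamples(batchmetdata[["b1"]])
getExtra(batchmetdata[["b1"]])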

Next, we need the lists of parameters to be applied in each combining operation.

saparam <- selectAnchorsParam(tolmz = 0.003, tolQ = 0.3, tolrtq = 0.2)  #anchor selection
fgparam <- fitgamParam(k = 20, iterFilter = 2)                          #RT model fitting
csparam <- calcScoresParam(A = 70, B = 30, C = 0.8)                     #pair scoring
rdparam <- reduceTableParam(minScore = 0.5, maxRTerr = 0.3)             #table reduction

Finally, we bring it all together with the batchCombine function and obtain the results. Setting union = TRUE gives an approximate union of features across the batches, whereas FALSE keeps only the intersection of features found in all batches.

combinedRes <- batchCombine(batchmetdata, binGap = 0.0075, union = TRUE,
                            anchorParam = saparam, fitParam = fgparam,
                            scoreParam = csparam, reduceParam = rdparam,
                            means = c(TRUE, TRUE, TRUE)) 

#resulting table of this operation
table <- combinedRes$table

#for use in further metabCombiner alignments
object <- combinedRes$object
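
As one example of that last point, the returned object can be fed into a further pairwise alignment against a new dataset (a hypothetical follow-up; newdata is a placeholder metabData object, and the arguments may need adjusting for your version of the package):

#hypothetical follow-up: align the multi-batch result with a new dataset;
#"newdata" is a placeholder metabData object from another experiment
p.next <- metabCombiner(xdata = object, ydata = newdata, binGap = 0.0075)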

I plan to create an in-depth vignette on this in the future; in the meantime, I hope this was helpful.

Best of luck, Hani

hhabra commented 2 years ago

If you find that this method is not satisfactory, some other methods I can point you to include:

batchCorr: https://pubmed.ncbi.nlm.nih.gov/27746707/

apLCMS 2.0: https://www.nature.com/articles/s41598-020-70850-0

There may be others I have not mentioned or tested, but these are the ones I know of that specifically address batch-aware LC-MS metabolomics alignment.

tiwa1125 commented 2 years ago

Dear Hani, Thank you very much, I will give it a try on our data ☺

Best Tingting
