sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis
Other
187 stars 80 forks source link

Perform peak detection with a defined roilist #596

Open gmhhope opened 2 years ago

gmhhope commented 2 years ago

Hi Johannes,

Have been going through several options to perform peak detection with a defined list of ROI. I accidentally used ManualChromPeak and at the end found out that there is no peak detection there but just integration across mz-rtime pair. Well, that is a good lesson though.

Now I come back to peak detection with a defined roilist. And if I understand correctly, I need to provide a roilist just like this (copied from https://support.bioconductor.org/p/123319/ ):

list(c(scmin = 51, scmax = 800, mzmin = 216.1321, mzmax = 216.1361, length = 20, intensity = 1000))
  1. I don't know exactly what is the parameter in the roilist especially length and intensity. Is that required parameters to define the minimum length of scans and minimum intensity to be able to define a peak?
  2. How to easily access the scan in OnDiskMSnExp object? I found the scan information is hidden in the names(rtime(OnDiskMSnExp)). But I would like to know how to easily translate from rtime to scanindex?

Thanks, Minghao Gong

gmhhope commented 2 years ago

Continuing with this thread:

I am trying to perform some tests on applying the function. I hope you can bear with me if I did some stupid mistakes.

read the raw data

raw_data <- readRDS("../../output/targetedMz_FeatExtract/HILICpos_authStd.RDS")

Create a data frame of ROI

#peak detection with ROI
ms1Tbl <- data.frame(mz = c(205.1073,406),
                     rtMin = c(0,300), 
                     rtMax = c(0,300),
                     length = c(30,30),
                     intensity = c(10000,10000)
                     )

convert the ROI dataframe to a roilist

Tbl2roilist <- function(ms1Tbl,xcms_obj, mz_tolerance) {
  roilist = list()
  #` ms1Tbl requires mz, rtMin, rtMax, length and intensity
  scanRt <- rtime(filterFile(raw_data, file =1))
  ms1Tbl$scanMin <- sapply(ms1Tbl$rtMin, function(x)which.min(abs(scanRt-x)))
  ms1Tbl$scanMax <- sapply(ms1Tbl$rtMax, function(x)which.min(abs(scanRt-x)))
  ms1Tbl <- ms1Tbl[,c("mz","scanMin","scanMax","length","intensity")]
  for (i in (1:dim(ms1Tbl)[1])) {
    roilist[[i]] <- c(scmin = ms1Tbl$scanMin[i], 
                      scmax = ms1Tbl$scanMax[i],
                      mzmin = ms1Tbl$mz[i]-0.005,
                      mzmax = ms1Tbl$mz[i]+0.005,
                      length = ms1Tbl$length[i],
                      intensity = ms1Tbl$intensity[i])
  }
  return(roilist)
}

roilist <- Tbl2roilist(ms1Tbl,raw_data)

Perform peak detection

prefilter_sample_num = 1
cwp <- CentWaveParam(peakwidth = c(4, 30), noise = 5000, ppm = 1, roiList = roilist, prefilter = c(prefilter_sample_num, 5000)) # 12 peaks and intensity >= 5000
xdata <- findChromPeaks(raw_data, param = cwp)

It delivers the following error, which doesn't appear if I don't provide roilist parameter:

Detecting chromatographic peaks in 2 regions of interest ...
Error: BiocParallel errors
  14 remote errors, element index: 1, 4, 6, 8, 10, 12, ...
  16 unevaluated and other errors
  first remote error: $ operator is invalid for atomic vectors

Could you explain what's going on? And how to fix this? I also attach my raw_data (OnDiskMSnExp) here. HILICpos_authStd.RDS.zip

Thanks very much!

Best, Minghao Gong

stanstrup commented 2 years ago

As far as I remember digging around the code the length and intensity is not used. I set both to -1 as I believe the code also does somewhere when it is done automatically.

You can get the scanindexes with scanIndex(OnDiskMSnExp).

It is not possible to run your script without the raw data. Your function also refers to raw_data but that is not in the function input.

My function to do the same conversion is here: https://github.com/stanstrup/QC4Metabolomics/blob/880829d7a3fa9ea611c52321e86da92312b30180/MetabolomiQCsR/R/standard_stats.R

Note that it was made for the old style raw object so the scantime needs to be another way.

sneumann commented 2 years ago

It is always good to use/test with one of the files in msdata or mtbls2 packages for a cut&pastable example and to make sure it is not an issue with the data files at hand. Yours, Steffen

gmhhope commented 2 years ago

Thanks Jan for helping out!

As far as I remember digging around the code the length and intensity is not used. I set both to -1 as I believe the code also does somewhere when it is done automatically.

You can get the scanindexes with scanIndex(OnDiskMSnExp).

It is not possible to run your script without the raw data. Your function also refers to raw_data but that is not in the function input.

My function to do the same conversion is here: https://github.com/stanstrup/QC4Metabolomics/blob/880829d7a3fa9ea611c52321e86da92312b30180/MetabolomiQCsR/R/standard_stats.R

Note that it was made for the old style raw object so the scantime needs to be another way.

I will check out the function you attached. My raw_data is attached at the end of the issue (it may be probably too long that it is hard to find that. Sorry about that! I will attach it again to overcome the inconvenience! HILICpos_authStd.RDS.zip

Thanks very much for helping out!

Best, Minghao Gong