adjustRtime() Error - Githubissues

zinuo-H commented 1 year ago

I have successfully processed datasets from various vendors of LC-MS/MS instruments with xcms' R packages. However, this time I failed to process the LC-MS/MS data obtain from sciex X500B. The error occur when processing adjustRtime() function. If I skip this step, everything seems goes well.

I think there may be some problem with my importing mzML files (convert from .wiff format by MSConvert). If I only choose serval of mzML file import (e.g. only import QC files), the project contain adjustRtime() function could run successfully. What's puzzling is that when I add different mzML files, I also get different error reports.

The original code I used:

library(xcms) #version 3.16.1
library(faahKO)
library(IPO)
library(Spectra)
library(xcms.gnps.tools)

library(pander)
library(RColorBrewer)
library(mzR)
library(msdata)
library(xcms)
library(faahKO)
library(pheatmap)
library(magrittr)
library(ChemoSpec)
library(speaq)
library(BiocParallel)

### Data import

order <- c(
  "RJK_MEOH",
  "E_MEOH",
  "QC1",
  "20",
  "23",
  "24",
  "chao",
  "617",
  "lye",
  "QC2"
)

mzfs <- paste0("~/mzML/20230316Pi_wiff/NEG/", 
               order, 
               sep = ".mzML"
)

s_groups <- c(rep("Blank",2),"QC",rep("Blank",3),rep("Sample",3),"QC")

pd <- data.frame(sample_name = sub(basename(mzfs), pattern = ".mzML",
                                   replacement = "", fixed = TRUE),
                 sample_group = s_groups, 
                 stringsAsFactors = FALSE)

raw_data <- readMSData(files = mzfs, pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk")

### Peak picking
paramPeakPick <- CentWaveParam(snthresh = 10, noise = 300, peakwidth = c(4, 50),ppm=10)
xdata <- findChromPeaks(raw_data,param = paramPeakPick)

### Retention time alignment
xdata <- adjustRtime(xdata, param = ObiwarpParam())

### Peak grouping

pdp <- PeakDensityParam(sampleGroups = xdata$sample_group,
                        minFraction = 0.10)
xdata <- groupChromPeaks(xdata, param = pdp) 

### Gap filling
xdata <- fillChromPeaks(xdata, param = ChromPeakAreaParam())

Then I got the first error:

# Linux platfrom using xcms 3.16.1
Sample number 3 used as center sample.
Aligning E_MEOH.mzML against QC1.mzML ... Found gaps in scan time of file E_MEOH.mzML: cut scantime-vector at 17.343 seconds.

Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error: argument of length 0
In addition: Warning message:
stop worker failed:
  wrong args for environment subassignment 
Execution halted

#macos platfrom using xcms 3.20.0
Sample number 5 used as center sample.
Error in reducer$value.cache[[as.character(idx)]] <- values : 
  wrong args for environment subassignment
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
  1 parallel job did not deliver a result

It seems like this error related with BiocParallel packages. So I add register(SerialParam()) command before xdata <- adjustRtime(xdata, param = ObiwarpParam()) then try agian, however this time the error is different from the previous one:

Sample number 3 used as center sample.
Aligning RJK_MEOH.mzML against QC1.mzML ... Found gaps in scan time of file RJK_MEOH.mzML: cut scantime-vector at 25.881 seconds.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/var/spool/slurm/d/job1202980/slurm_script: line 12: 83052 Aborted                 (core dumped) Rscript 20230316_NEG.R

The std::bad_alloc seems error indicates that there was a memory allocation problem while running the R script. However, I am sure that I have allocated enough memory for this project (Memory Efficiency: 0.61% of 2.00 TB ). (This task cannot be run on a PC because it will cause the R session to close directly)

As I mention before, this full task can run successfully if I only import QC file. I tried to run the task after deleting the first two files and got a new error:

# new script

### Load data
order <- c(
  "QC1",
  "20",
  "23",
  "24",
  "chao",
  "617",
  "lye",
  "QC2"
)

mzfs <- paste0("/Users/huangzinuo/Documents/QTexposome/pre_experiment/mzML/20230316Pi_wiff/NEG/", #最后一定要加/ ！！
               order, 
               sep = ".mzML"
)

s_groups <- c("QC",rep("Blank",3),rep("Sample",3),"QC")

pd <- data.frame(sample_name = sub(basename(mzfs), pattern = ".mzML",
                                   replacement = "", fixed = TRUE),
                 sample_group = s_groups, 
                 stringsAsFactors = FALSE)

raw_data <- readMSData(files = mzfs, pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk")

### Peak picking
paramPeakPick <- CentWaveParam(snthresh = 10, noise = 300, peakwidth = c(4, 50),ppm=10)
xdata <- findChromPeaks(raw_data,param = paramPeakPick)

### Retention time alignment
register(SerialParam())
xdata <- adjustRtime(xdata, param = ObiwarpParam())

# error:
Sample number 4 used as center sample.
Aligning QC1.mzML against 24.mzML ... OK
Aligning 20.mzML against 24.mzML ... OK
Aligning 23.mzML against 24.mzML ... OK
Aligning chao.mzML against 24.mzML ... Found gaps in scan time of file chao.mzML: cut scantime-vector at 7.002 seconds.
Error: BiocParallel errors
  1 remote errors, element index: 4
  3 unevaluated and other errors
  first remote error:
Error in start_idx:end_idx: argument of length 0
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

See, some of the files succeeded, the mzML file named 'chao' failed, and a new error appeared!

I don't know what's happening and where the error is coming from, is it a problem with my files? Is it an error when generating the mzML file? The original mzML files is as follows:

Link：https://drive.google.com/drive/folders/1_q6M5aQzxX0HrBnjeBM-tJcZse3vIoUR?usp=share_link

I hope I can get an answer, thanks!

jorainer commented 1 year ago

Thanks for this detailed explanation and description - I try to answer as much as possible, as there are several things here:

it is generally suggested to use register(SerialParam()) to fix errors. With parallel processing the error messages can be (also depending on the operating system) misleading (because the error message might be hidden in between some other error messages coming from the parallel processing). This is why it seems that you get different errors.
the error "Found gaps in scan time" means that for that particular mzML file the difference in retention times between consecutive spectra is larger than expected/tolerated. And the std::bad_alloc message comes directly from the C++ code of obiwarp. So, something in the C++ code of obiwarp fails - and unfortunately that is not something I can fix.
It seems you have blanks and study samples included. Maybe a subset-based alignment might be better in your case? See this section for more information. Generally, if you have QC samples measured at regular intervals in your experiment, this type of alignment might be better suited, as retention time shifts are estimated based on the QC samples (same sample measured repeatedly, so the same peaks/signal can be expected) and the samples measured in between the QC samples are then adjusted based on these.
Finally, maybe try to use the peak groups alignment (PeaksGroupsParam) instead of obiwarp for these specific samples/files? I found that method to be more robust than obiwarp - and configuration is also easier and more intuitive.
Also, what version of xcms did you use for your analysis? Some errors/problems with obiwarp were fixed, somaybe updating the version might also help.

I hope this reply can help you finding a solution for your problem.

zinuo-H commented 1 year ago

Thank you so much for the speedy reply! Your answer help me a lot！

I try to use the peak groups alignment (PeaksGroupsParam) instead of obiwarp for this dataset, then run successfully! But now I have a new question about these two alignment algorithm here: Perhaps you noticed that the files I asked about earlier were collected in negative mode. However, when running these samples' positive mode data, there are some problems as well：The positive mode data were able to run successfully using the Obiwarp alignment algorithm, but ended up with only 65 FEATURES in the output table (This seems very unreasonable! These samples perform very well in the TIC plot and should not have detected only so few peaks). And this time, I try to use PeaksGroupsParam instead of obiwarp for these positive mode data files, the new output table contains 3669 FEATURES. I wonder why just modifying the peak alignment algorithm caused such a big difference in the results? (65 FEATURES to 3669 FEATURES). This confuses me and I don't know how to choose the appropriate peak alignment algorithm in the future (obiwarp seems to be a more commonly used algorithm). I'd like to hear your thoughts and advice on this situation!

Again, thank you and your answer made my day!

other information

xcms version: ‘3.20.0’

The original mzML files link： https://drive.google.com/drive/folders/1_q6M5aQzxX0HrBnjeBM-tJcZse3vIoUR?usp=share_link Updated script:


### Peak picking
paramPeakPick <- CentWaveParam(snthresh = 10, noise = 300, peakwidth = c(4, 50),ppm=10)
xdata <- findChromPeaks(raw_data,param = paramPeakPick)

Retention time alignment

pdp <- PeakDensityParam(sampleGroups = xdata$sample_group, bw = 0.25, minFraction = 0.77, minSamples = 1, maxFeatures = 50)

xdata <- groupChromPeaks(ipo_xdata,param = pdp)

pgp <- PeakGroupsParam( minFraction = 1 ) xdata <- adjustRtime(xdata, param = pgp)

Peak grouping

pdp <- PeakDensityParam(sampleGroups = xdata$sample_group, minFraction = 0.10) xdata <- groupChromPeaks(xdata, param = pdp)

Gap filling

xdata <- fillChromPeaks(xdata, param = ChromPeakAreaParam())

feature table outputting

feature.info <- featureDefinitions(xdata) feature.intensity <- featureValues(xdata,method = "medret",value = "into",intensity = "into", filled = TRUE,missing = NA) feature.table <- merge(feature.info,feature.intensity,by = 0, all = TRUE, Row.names = TRUE) feature.table <- feature.table[, !(colnames(feature.table) %in% c("peakidx"))]

write.table(feature.table, "xcms_all.csv", sep = ",", quote = FALSE, row.names = FALSE)

jorainer commented 1 year ago

I would suggest you to visualize the alignment results of both algorithms using the plotAdjustedRtime function. Could well be that with one algorithm you have stronger adjustments then with the other. Here you need to decide what makes most sense for your experiment and data file: how strong do you think would the retention time shifts be in your data?

The alignment results should be somehow reasonable, otherwise you end up assigning signals from different compounds to the same feature...

jorainer commented 1 year ago

Maybe this tutorial could also give you some hints how you can check results and find settings/parameters.

zinuo-H commented 1 year ago

Thanks for your guidance! I try to use both plotAdjustedRtime and plot(chromatogram(xdata, aggregationFun = "max", include = "none")) functions to visualize the alignment results. Then I got the plot:

The raw base peak chromatogram:
Result of Obiwarp algorithm:
Result of PeakGroups algorithm:

It is obvious that the PeakGroups algorithm gives a more reasonable result! Thank you so much! My problem has been resolved, and your response has been a great help to me!

jorainer commented 1 year ago

Thanks for reporting back and closing the issue!

MuyaoXi9271 commented 1 year ago

Thanks for your guidance! I try to use both plotAdjustedRtime and plot(chromatogram(xdata, aggregationFun = "max", include = "none")) functions to visualize the alignment results. Then I got the plot:

The raw base peak chromatogram:

Result of Obiwarp algorithm:

Result of PeakGroups algorithm:

It is obvious that the PeakGroups algorithm gives a more reasonable result! Thank you so much! My problem has been resolved, and your response has been a great help to me!

Hi

May I ask how to plot the result of "PeakGroups" algorithm? Thanks very much.

Best, Muyao

jorainer commented 1 year ago

The BPC were extracted with the chromatogram(data, aggregationFun = "max", chromPeaks = "none") function (assuming the object with the xcms preprocessing result is called data), the results from the alignment step were plotted using the plotAdjustedRtime(data) function.

sneumann / xcms

adjustRtime() Error #668

other information

Retention time alignment

Peak grouping

Gap filling

feature table outputting