sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

Analyze MRM data #415

Open rromoli opened 5 years ago

rromoli commented 5 years ago

I'm working with MRM data acquired on a Waters instrument and exported to .mzML format. I tried to import the files into xcms with the readMSData() function, with no luck:

fls <- list.files(path = "quattroMicro/", pattern = ".mzML$",
                  full.names = TRUE)

data <- readMSData(fls, mode = "onDisk")
> chr <- chromatogram(data)
Warning message:
In .extractMultipleChromatograms(object, rt = rt, mz = mz, aggregationFun = aggregationFun,  :
  No MS 1 data present.
> chr
Chromatograms with 0 rows and 18 columns
phenoData with 0 variables
featureData with 0 variables
> chr <- chromatogram(data, mz = 823)
Warning message:
In .extractMultipleChromatograms(object, rt = rt, mz = mz, aggregationFun = aggregationFun,  :
  No MS 1 data present.
>

It seems that readMSData() is not able to import MRM data correctly. In the MSnbase manual I read about the readSRMData() function for importing MRM/SRM data. It seems to work and imports my data correctly:

## import data
mrm <- readSRMData(fls)
> mrm
Chromatograms with 8 rows and 18 columns
                  1              2              3              4              5
     <Chromatogram> <Chromatogram> <Chromatogram> <Chromatogram> <Chromatogram>
[1,]    length: 495    length: 495    length: 495    length: 495    length: 495
[2,]    length: 495    length: 495    length: 495    length: 495    length: 495
...            ...            ...            ...            ...            ... 
[7,]    length: 284    length: 284    length: 284    length: 284    length: 284
[8,]    length: 284    length: 284    length: 284    length: 284    length: 284

phenoData with 1 variables
featureData with 10 variables

I would like to process (integrate, align) the data with xcms, but the object returned by readSRMData() does not seem to be compatible with xcms functions:

chrs <- chromatograms(mrm)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘chromatograms’ for signature
  ‘"Chromatograms"’

> class(mrm)
[1] "Chromatograms"
attr(,"package")
[1] "MSnbase"
> class(data)
[1] "OnDiskMSnExp"
attr(,"package")
[1] "MSnbase"

Is there a way to use the output of readSRMData() with xcms?

jorainer commented 5 years ago

You are on the right track @rromoli . The readSRMData actually returns a Chromatograms object (the same type of object you would get by calling chromatogram on an MSnExp/OnDiskMSnExp object containing spectra data).

You should be able to call findChromPeaks directly on your mrm object. This returns an XChromatograms object (defined in xcms) that also contains the identified chromatographic peaks (which you can access with chromPeaks). Note that I've also implemented a groupChromPeaks method for XChromatograms, but no adjustRtime method.

Regarding alignment, I've implemented only an alignment method that allows to align a single Chromatogram object against another one - but nothing yet for Chromatograms (note the s).

rromoli commented 5 years ago

findChromPeaks() works fine with MatchedFilterParam(), but I noticed that the fwhm parameter behaves strangely: I need to divide the value by 10. The value I used is fwhm/10 (0.4); otherwise too much baseline is integrated...

Furthermore I do not understand how to extract data. I mean:

> featureValues(peaks, value = "into")
             1         2        3         4         5         6         7
FT01        NA        NA       NA 271.50840 225.47668 201.51727 862.14525
FT02        NA        NA       NA        NA        NA        NA 215.32279
FT03 140.49033 107.46131 87.04114 143.76471 118.99295 113.02374 186.81566
FT04  53.75822  48.75521 42.22618  58.85781  55.24559  46.26336 109.39015
FT05        NA        NA       NA  30.84873        NA        NA  58.27840
FT06        NA        NA       NA        NA        NA        NA        NA
FT07        NA        NA       NA  46.28275  68.67077  63.43596 143.92853
FT08        NA        NA       NA        NA        NA        NA        NA
FT09        NA        NA       NA        NA        NA        NA        NA
FT10        NA        NA       NA        NA        NA        NA  68.55487

In this way I extract the integrated signals but I have no idea what FTXX stand for.

Using the precursorMz() and productMz() functions I see that my dataset has 8 SRM transitions. Why do the results contain 10 features? I tried featureDefinitions():

> featureDefinitions(peaks)
DataFrame with 10 rows and 15 columns
         mzmed     mzmin     mzmax            rtmed            rtmin
     <numeric> <numeric> <numeric>        <numeric>        <numeric>
FT01        NA        NA        NA 1.64795005321503 1.62013328075409
FT02        NA        NA        NA 1.66193330287933 1.64795005321503
FT03        NA        NA        NA 4.00483322143555 3.76771664619446
FT04        NA        NA        NA 11.7689828872681 11.7496662139893
FT05        NA        NA        NA 9.60551643371582 9.58619976043701
FT06        NA        NA        NA 10.1463832855225 10.1463832855225
FT07        NA        NA        NA 9.60551643371582 9.58619976043701
FT08        NA        NA        NA 10.1463832855225 10.1463832855225
FT09        NA        NA        NA 10.1463832855225 10.1270666122437
FT10        NA        NA        NA 10.1270666122437 10.1270666122437

but the function returns no m/z values.

How can I interpret the results?

jorainer commented 5 years ago

Actually, you're the first user of this functionality! I've never analyzed MRM data (or had any MRM files available for testing). FTXX is just an arbitrary feature identifier. The whole functionality works much as it does for LC-MS data: it first performs chromatographic peak detection separately for each chromatogram (MRM transition) and then uses the chromPeaks matrix to group peaks across samples. I could imagine that you have more features than MRM transitions because more than one peak was identified in some of the chromatograms?
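One way to check this hypothesis is to count how many features were defined per transition, using the "row" column of featureDefinitions (described further down in this thread). A minimal base R sketch, assuming a made-up fdev data frame that stands in for featureDefinitions(peaks); the retention times and row assignments are hypothetical, not taken from the shared data:

```r
## Hypothetical stand-in for featureDefinitions(peaks): only the "row"
## column (index of the transition / chromatogram row) matters here.
fdev <- data.frame(
  rtmed = c(1.65, 1.66, 4.00, 9.61, 10.15, 11.77),
  row   = c(1L, 2L, 3L, 4L, 4L, 5L),
  row.names = sprintf("FT%02d", 1:6)
)

## Number of features defined per transition
features_per_row <- table(fdev$row)
print(features_per_row)

## Transitions with more than one feature, e.g. because interfering
## compounds elute at different retention times in the same transition
multi <- as.integer(names(features_per_row)[features_per_row > 1])
print(multi)
```

With the real object, table(featureDefinitions(peaks)$row) would give the same overview.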

Would it be possible for you to share some files with me so that I could look into what's happening?

rromoli commented 5 years ago

I could imagine that you have more features than MRM because maybe in some of the chromatograms more than one peak was identified?

Yes, it seems that I have two interfering ions...

would it be possible for you to share some files with me so that I could look into what's happening?

Yes, of course. How can we share them? If you give me your email address, I will share them via Google Drive.

jorainer commented 5 years ago

Thanks for the data! To get the information about the transition for the individual features you can do the following (the variable peaks is your Chromatograms object after peak detection and correspondence analysis):

fdev <- featureDefinitions(peaks)
fdev <- fdev[, colnames(fdev) != "peakidx"]
fdev
DataFrame with 10 rows and 14 columns
         mzmed     mzmin     mzmax            rtmed            rtmin
     <numeric> <numeric> <numeric>        <numeric>        <numeric>
FT01        NA        NA        NA 1.64795005321503 1.62013328075409
FT02        NA        NA        NA 1.66193330287933 1.64795005321503
...        ...       ...       ...              ...              ...
FT09        NA        NA        NA 10.1463832855225 10.1270666122437
FT10        NA        NA        NA 10.1270666122437 10.1270666122437
                rtmax    npeaks        P0        P1        P2        P3
            <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
FT01 1.67591667175293        15         0         3         3         3
FT02 1.68990004062653        12         0         0         3         3
...               ...       ...       ...       ...       ...       ...
FT09 10.1463832855225         7         0         0         1         3
FT10 10.1463832855225        12         0         0         3         3
            P4        P5       row
     <numeric> <numeric> <integer>
FT01         3         3         1
FT02         3         3         2
...        ...       ...       ...
FT09         0         3         7
FT10         3         3         8

In the featureDefinitions there is a column "row" that tells you in which of the rows (transitions) the feature was defined. You can add the actual precursor and product m/z with:

## Add the precursorMz and productMz to the annotation.
fdev$precursorMz <- rowMeans(precursorMz(peaks))[fdev$row]
fdev$productMz <- rowMeans(productMz(peaks))[fdev$row]

And to get the feature intensities:

fvals <- featureValues(peaks, value = "into")

Each row in fdev now provides the feature annotations for the corresponding row in fvals.
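A self-contained sketch of the same indexing, using hypothetical stand-ins (fdev, pmz, fmz and fvals with made-up values) instead of the real peaks object. It assumes, as the rowMeans() call above implies, that precursorMz() and productMz() return a matrix with one row per transition. rowMeans() collapses each row to a single m/z, and fdev$row then picks the right transition for each feature:

```r
## Hypothetical stand-ins for precursorMz(peaks) and productMz(peaks):
## one row per transition (3 transitions here), same m/z in every column.
pmz <- matrix(c(300.1, 400.2, 500.3), nrow = 3, ncol = 2)
fmz <- matrix(c(150.0, 200.0, 250.0), nrow = 3, ncol = 2)

## Hypothetical stand-in for featureDefinitions(peaks): transition 2
## yielded two features (e.g. two interfering compounds).
fdev <- data.frame(row = c(1L, 2L, 2L, 3L),
                   row.names = sprintf("FT%02d", 1:4))

## Hypothetical stand-in for featureValues(peaks, value = "into"):
## one row per feature, one column per sample; NA = no peak in that sample.
fvals <- matrix(c(10, NA, 5, 7,
                  12, 8, NA, 6,
                  11, 9, 4, 6), nrow = 4,
                dimnames = list(rownames(fdev), paste0("S", 1:3)))

## Map the transition m/z values onto the features via the "row" column
fdev$precursorMz <- rowMeans(pmz)[fdev$row]
fdev$productMz <- rowMeans(fmz)[fdev$row]

## Feature annotation and per-sample intensities side by side
annotated <- cbind(fdev, fvals)
print(annotated)
```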

Hope it is a little clearer now. Let me know if not.

sneumann commented 5 years ago

Hi @rromoli , if Johannes' suggestion works for you, it would be great if you could turn that into an MRM vignette. For that we'll need representative data (but could also be measurements of QC samples, no science required), and the script plus some explanations with it. Would that make sense ? Yours, Steffen

rromoli commented 5 years ago

Ok @sneumann I will try to write a vignette about the use of xcms with MRM data!

YANGJJ93research commented 3 years ago

Hi @rromoli, have you solved the MRM data import issue by now?

YANGJJ93research commented 3 years ago

Hi @jorainer, are there MRM data processing functions in xcms now?

jorainer commented 3 years ago

There's nothing specifically for MRM data, except that you can read the data as a MChromatograms object and then perform chromatographic peak detection in each chromatogram (using the findChromPeaks function), you can also perform a correspondence analysis (using groupChromPeaks). In addition there is functionality to filter, plot and subset the chromatographic data.

YANGJJ93research commented 3 years ago

@jorainer @rromoli @sneumann Hi, I am trying to export MRM data from a Waters QqQ. However, the mzML file converted by msconvert cannot be read by readSRMData(). Is there any way to work around this problem? Please find my error message below.

Original error was: Error in pwizModule$open(filename): [IO::HandlerBinaryDataArray] Missing binary data type.

Best regards, Junjie

jorainer commented 3 years ago

That is actually a problem with mzR and more recent versions of proteowizard. Maybe try with the suggestions from this issue https://github.com/lgatto/MSnbase/issues/551 . In the longer run we hope to manage updating mzR to include a newer version of proteowizard, but at present the workaround is to skip some data in the msconvert conversion to mzML files.

YANGJJ93research commented 3 years ago

Hi, thanks a lot for your reply. I tried the command line below, but it did not work. Please see the command line and the R output.

D:\proteowizard>msconvert test.RAW --chromatogramFilter "index [2,]"
format: mzML
    m/z: Compression-None, 64-bit
    intensity: Compression-None, 32-bit
    rt: Compression-None, 64-bit
ByteOrder_LittleEndian
 indexed="true"
outputPath: .
extension: .mzML
contactFilename:
runIndexSet:

spectrum list filters:

chromatogram list filters:
  index [2,]

filenames:
  test.raw

processing file: test.raw
calculating source file checksums
writing output file: .\test.mzML

> mrm <- readSRMData(fls2)
Error: Can not open file D:\zhengjie_project\MRM_pipline\test.mzML! Original error was: Error in pwizModule$open(filename): [IO::HandlerBinaryDataArray] Unknown binary data type.
> mrm_cmd <- readMSData(fls2)
Error: Can not open file D:\zhengjie_project\MRM_pipline\test.mzML! Original error was: Error in pwizModule$open(filename): [IO::HandlerBinaryDataArray] Unknown binary data type.


and this is my converted .mzML file: test.zip

Regards, Junjie

jorainer commented 3 years ago

Seems that the converted file only contains a single chromatogram entry - which is the TIC (with the "non-standard data array" in it) - I guess the original file contains more chromatograms?

We'll try to update the mzR package to include the new proteowizard code base - that should solve all problems but I can not guarantee when it will be available.

YANGJJ93research commented 3 years ago

@jorainer Yes. The original .raw file contains several MRM transitions. Please find attached the txt file converted by msconvert. I can find the chromatograms inside the text, so I am wondering if we could find a workaround using this text format.

test.txt

jorainer commented 3 years ago

The developmental mzR version with an updated proteowizard code is available. With this version it should be possible to read the mzML files. It might take some time until this version becomes "stable" because we had to remove the ramp backend and hence mzData support. To install:

BiocManager::install("sneumann/mzR", ref = "feature/updatePwiz_3_0_21263")

YANGJJ93research commented 3 years ago

Noted with thanks!

YANGJJ93research commented 3 years ago

@jorainer Hi, I am also curious about how to align peaks in an MChromatograms object. My MRM data was imported with readSRMData, which returns an MChromatograms object, so I was not able to align my data.

jorainer commented 3 years ago

There is no alignment method like the one we have for XCMSnExp (i.e. spectra data) available for chromatographic data. What is available is the findChromPeaks method to identify chromatographic peaks and the groupChromPeaks method to group chromatographic peaks across samples (have a look at the XChromatograms help for more details: ?XChromatograms).

The only alignment method which is available for MChromatograms is alignRt which allows to align an MChromatograms (i.e. chromatographic data across multiple samples) against a single Chromatogram object. But I'm not sure if that's what you're looking for.

YANGJJ93research commented 3 years ago

Hi Jorainer, I found some issues after peak picking on the chromatogram objects of MRM data read with readSRMData.

  1. After alignment with alignRt, the peaks of y ended up as a copy of the example chromatogram x.
  2. findChromPeaks with MatchedFilterParam was not able to detect peaks correctly on my data and failed to pick up two peaks in one chromatogram.

Please kindly find my example data herein. E4-1.zip

jorainer commented 3 years ago

Could you please add here also the R code you used to perform this analysis. Without that it's impossible to replicate and find out what your problems might be.

YANGJJ93research commented 3 years ago

Hi, thanks a lot for your reply! Please kindly find the attached code herein:

library(xcms)

std <- "E4-1.mzML"
std1 <- readSRMData(std)
chr1 <- std1[1,]
mfp <- MatchedFilterParam(binSize = 0.1, snthresh = 0)
xchr1 <- findChromPeaks(chr1, mfp)

YANGJJ93research commented 3 years ago

For my first question, I found a way to pick up small side peaks by adjusting the fwhm value (in the range 0 to 5). I am still trying to find a good way to solve the second question.

jorainer commented 3 years ago

Since the peaks are quite different (the first one broader, the second quite narrow), I would suggest using centWave instead of matchedFilter:

cwp <- CentWaveParam(peakwidth = c(1, 4))
tmp <- findChromPeaks(chr1, param = cwp)
plot(tmp)

this identifies both peaks:

(plot of tmp showing both identified peaks)

YANGJJ93research commented 3 years ago

Thanks a lot!!

YANGJJ93research commented 3 years ago

Hi @jorainer, regarding my first question about retention time correction: is there any way to get a modified function for retention time correction of MRM data?

jorainer commented 3 years ago

At present we don't have a dedicated function to do a retention time alignment on MRM data (similar to what is available for spectra-based LC-MS data). For chromatograms with a single peak it should in theory also suffice to use a rather large bw parameter in groupChromPeaks with PeakDensityParam which will then also group chromatographic peaks into the same feature even if their retention times are different.
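To illustrate why a larger bw can absorb retention time drift, here is a deliberately simplified base R sketch. It is NOT the kernel density algorithm that PeakDensityParam uses. It only starts a new group whenever two consecutive (sorted) peak apexes are further apart than bw. The function name group_by_rt_gap and the retention times are hypothetical:

```r
## Simplified stand-in for correspondence: a gap rule driven by a
## bandwidth 'bw' given in the same unit as the retention times.
group_by_rt_gap <- function(rt, bw) {
  ord <- order(rt)
  ## TRUE marks the first peak of a new group
  new_group <- c(TRUE, diff(rt[ord]) > bw)
  grp <- integer(length(rt))
  grp[ord] <- cumsum(new_group)
  grp
}

## Hypothetical apex retention times (minutes) of the same compound in
## four samples, drifting between runs
rt <- c(10.05, 10.10, 10.40, 10.45)
group_by_rt_gap(rt, bw = 0.1)  # small bw: the drifted peaks form two groups
group_by_rt_gap(rt, bw = 0.5)  # large bw: all four peaks form one group
```

The flip side is the same in xcms: a bw larger than the spacing between distinct compounds within one transition would merge them too, which is presumably why the suggestion is limited to chromatograms with a single peak.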

We might implement some functionality, but at present we unfortunately don't have the capacity/manpower to do that. What would however help later is to get hands on example MRM data files with peaks that need to be aligned...

YANGJJ93research commented 3 years ago

My chromatograms come with multiple peaks. I wish to make an alignment across samples before I group any peaks and continue with the downstream analysis. Currently, I try to find a workaround for this issue. Thanks for your help too!
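As a possible workaround for chromatograms with several peaks, a global shift per sample could be estimated by cross-correlation against a reference chromatogram before grouping. This is a naive base R sketch, not xcms's alignRt and not an xcms API. It assumes both chromatograms share the same evenly spaced retention time grid and that the drift is a constant offset. The helper best_lag and the simulated data are hypothetical:

```r
## Estimate the shift (in scans) that best superimposes 'qry' onto 'ref'
## by maximizing the unnormalized cross-product over a range of lags.
best_lag <- function(ref, qry, max_lag = 20) {
  lags <- -max_lag:max_lag
  idx <- seq_along(ref)
  score <- vapply(lags, function(k) {
    j <- idx + k
    ok <- j >= 1 & j <= length(qry)
    sum(ref[idx[ok]] * qry[j[ok]])
  }, numeric(1))
  lags[which.max(score)]
}

## Simulated data: two peaks, the query eluting 0.25 min (5 scans) later
rt  <- seq(0, 10, length.out = 201)  # minutes, 0.05 min per scan
ref <- dnorm(rt, 3, 0.1) + 0.5 * dnorm(rt, 6, 0.2)
qry <- dnorm(rt, 3.25, 0.1) + 0.5 * dnorm(rt, 6.25, 0.2)

shift_scans <- best_lag(ref, qry)
rt_shift <- shift_scans * diff(rt[1:2])  # shift in minutes
rt_aligned <- rt - rt_shift              # corrected retention times of the query
```

Applying the same shift to all transitions of a sample and then running groupChromPeaks would be one way to proceed. Non-linear drift would need something more elaborate, e.g. estimating shifts per retention time window.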

breidan commented 1 year ago

Hi all, I want to share my experience with SRM data and xcms: An assay on a QqQ creates SRM data with 30 transitions. Two of them detect two isobaric, closely eluting compounds. The attached ZIP file contains a RDS file of those two transitions as MChromatograms. If I plot this I get: graph.pdf

Then I run

xdata <- findChromPeaks(srm_selected[8,6], param = cwp)

and

chromPeaks(xdata)
            rt     rtmin     rtmax     into      intb       maxo   sn
[1,]  7.302067  6.424583  8.231167  126.534   31.3895   207.1527   32
[2,] 14.941333 13.547683 15.818817 3968.052 3849.5452 14944.2097 5002

shows that the two large peaks have been detected as one wide peak at 14.94 min.

Peak detection with MatchedFilterParam shows the same behaviour. I've experimented but cannot find settings for either method that detect the two peaks separately.

Now if I use do_findPeaks_MSW I get both peaks as individuals:

int <- intensity(srm[2,])
rt <- rtime(srm[2,])
do_findPeaks_MSW(rt, int, snthresh = 1, scales = 1:10)
           mz    mzmin    mzmax rt rtmin rtmax     into     maxo       sn intf     maxf
[1,] 14.27032 14.11547 14.37355 -1    -1    -1 28765.37 12087.74 35.86637   NA 12216.63
[2,] 14.94133 14.78648 15.04457 -1    -1    -1 40612.61 14944.21 49.98661   NA 17026.19

Peak apex and boundaries are well enough defined.

I am now wondering: MSW and centWave both use MassSpecWavelet functionality. Why do they deliver such different results? Using findChromPeaks with centWave would be much more convenient on MChromatograms, but I think I can make do with do_findPeaks_MSW.

Cheers Andreas

srm.zip

breidan commented 1 year ago

Forget what I wrote above. Reading through some other issues I realized that my SRM data is loaded with rtime in minutes. Thus using peakwidth(cwp)<-c(1,10) is way too large. With peakwidth(cwp)<-c(0.017,0.17) I do get individual peak detection.
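The fix amounts to expressing peakwidth in the unit of the retention times. A small sketch of the conversion, assuming the peak widths were first thought of in seconds and that readSRMData() returns minutes (as reported above). The helper name sec_to_min is hypothetical:

```r
## Convert a retention-time-based parameter from seconds to minutes
sec_to_min <- function(x) x / 60

peakwidth_sec <- c(1, 10)                   # intended peak width range in seconds
peakwidth_min <- sec_to_min(peakwidth_sec)  # same range in minutes
round(peakwidth_min, 3)

## With xcms loaded, the converted range would then be set on the param:
## peakwidth(cwp) <- peakwidth_min
```

The result, c(0.017, 0.167), is essentially the c(0.017, 0.17) range used above.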

My bad :-)

breidan commented 1 year ago

Of course, it would help if readSRMData and readMSData scaled the retention time axis identically. Reading the same mzML files, readSRMData returns minutes and readMSData seconds. Who should take care of this: @jorainer or @lgatto?

jorainer commented 1 year ago

That's interesting @breidan, I was not aware that you get different units from readSRMData and readMSData. Would it be possible to provide one example file? In the end this should go to mzR, because we're using mzR (which uses proteowizard) to read mzML files.

breidan commented 1 year ago

@jorainer, attached is a zip of an mzML file from an SRM acquisition on an Agilent QqQ. This is one of the files that were read into the MChromatograms object in the zip file above. The retention time is in seconds with readMSData and in minutes with readSRMData.

Day 1 Cal 5.zip

jorainer commented 1 year ago

Thanks for sharing the file. So, you're right: the time is provided in minutes within the file, and for the retention time of the spectra the mzR/proteowizard C++ code converts any provided time into seconds. For the chromatographic data I could not find an easy way to identify the unit in which the retention time is provided and how it can be automatically converted to seconds (if not already in seconds). Any help on this (in the mzR package) would be highly welcome...
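One possible starting point: mzML annotates each chromatogram's time array with a "time array" cvParam (MS:1000595), whose unitAccession is UO:0000010 for seconds or UO:0000031 for minutes. Below is a minimal base R sketch that reads this unit from the raw XML. It is not part of mzR, and chrom_time_units and the file name example.mzML are hypothetical. It assumes one cvParam per line, as msconvert writes it, and reading a large mzML file with readLines() may be slow:

```r
## Return the unit of every chromatogram time array found in mzML lines
chrom_time_units <- function(lines) {
  hits <- grep('accession="MS:1000595"', lines, value = TRUE, fixed = TRUE)
  unit <- regmatches(hits, regexpr('unitAccession="UO:[0-9]+"', hits))
  unit <- sub('unitAccession="(UO:[0-9]+)"', "\\1", unit)
  ## Map unit ontology accessions to names; unknown accessions give NA
  unname(c("UO:0000010" = "second", "UO:0000031" = "minute")[unit])
}

## Minimal hand-written chromatogram fragment for illustration
mzml <- c(
  '<chromatogram index="1" id="SRM SIC" defaultArrayLength="284">',
  '  <binaryDataArray encodedLength="0">',
  '    <cvParam cvRef="MS" accession="MS:1000595" name="time array" value="" unitCvRef="UO" unitAccession="UO:0000031" unitName="minute"/>',
  '  </binaryDataArray>',
  '</chromatogram>'
)
chrom_time_units(mzml)

## For a real file (hypothetical name):
## chrom_time_units(readLines("example.mzML"))
```

If the unit is "minute", the retention times could then be multiplied by 60 to match the seconds returned for spectra.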