sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis
Other
177 stars 81 forks source link

Filter MsExperiment object by spectraData filterString #723

Closed breidan closed 3 months ago

breidan commented 4 months ago

Hi, I have just begun finding my way through xcms4 and MsExperiment. What I see so far is very nice. In particular the MsBackends open up new possibilities.

I have here a mzML file acquired on an Orbitrap Elite with two scan events per scan, one with SIM m/z 310-630, the other with SIM 620-940, and a compound which will be detected in the first scan event. I read in the file with msexp <- readMsExperiment(spectraFiles = files, source = MsBackendMzR(),sampleData = pd) and then extract a chromatogram with chr_raw<-chromatogram(msexp,mz=mzRange,rt=rtRange). But the chromatogram is empty even though the same injection solution with an acquisition with a single scan event shows a nice peak. I assume the reason is that the selected mass range comes up empty in the second scan event.

Is there a way to subset the MsExperiment object by the filterString from spectraData? Or any other way to address this properly?

I already apologize for my ignorance :-)

Cheers Andreas

jorainer commented 4 months ago

hm, yes, that's a very good point indeed! currently there is no easy way to filter the Spectra within an MsExperiment by one of its spectraVariables.

A workable solution would be to define a function that subsets a Spectra object based on a spectra variable matching a provided value and then pass that function to the filterSpectra function (that was recently added to MsExperiment devel version).

So, basically it would work like this:

#' Define a function to filter a Spectra based on a value of one of it's spectra variables
filterFilterString <- function(x, value) {
    x[x$filterString == value]
}

#' Use filterSpectra to subset the Spectra within an MsExperiment using that function
mse <- filterSpectra(mse, filter = filterFilterString, value = "A")

value = "A" would obviously needed to be adapted to the actual data, i.e. use the actual filter string to which you would like to subset the Spectra.

However, in order to use this functionality, you would need to update some of our packages to the devel version:

BiocManager::install("RforMassSpectrometry/ProtGenerics")
BiocManager::install("RforMassSpectrometry/MsCoreUtils")
BiocManager::install("RforMassSpectrometry/Spectra")
BiocManager::install("RforMassSpectrometry/MsExperiment")
breidan commented 4 months ago

@jorainer thanks for the reply. Just for clarification: the BiocManager::install(" command installs the lasted update for my R version and not the development versions, right? Because executing your list above results in a newer package for ProtGenerics and filterSpectra is now available but without a method for MsExperiment. Trying to install the development versions fails because I am running R4.3.2. And that cannot be changed because of ICT policies.

breidan commented 4 months ago

@jorainer, I did a bit of hacking and came up with this:

setGeneric("filterFilterString", function(object, ...) standardGeneric("filterFilterString"))
setMethod("filterFilterString",
    signature(object = "MsExperiment"),
    function (object,scanFilter=character(), ...) {
      ### get filterString and dataOrigin
      sD<-spectraData(spectra(object),columns=c("filterString","dataOrigin"))
      sD_incl<-grepl(scanFilter,sD[["filterString"]],fixed=TRUE)
      fS<-unique(sD[sD_incl,"filterString"])
      # dO<-unique(sD[sD_incl,"dataOrigin"])

      ls <- length(spectra(object))
      have_links <- length(object@sampleDataLinks[["spectra"]]) > 0
      if (have_links) 
        object@spectra$._SPECTRA_IDX <- seq_len(ls)

      object@spectra <- object@spectra[object@spectra$filterString==fS]
      if (have_links) {
        if (ls != length(spectra(object))){
          object <- xcms:::.update_sample_data_links_spectra(object)
          object<-object[unique(object@sampleDataLinks[["spectra"]][,1])]
        }
        svs <- unique(c(spectraVariables(spectra(object)), "mz", "intensity"))
        object@spectra <- selectSpectraVariables(object@spectra, svs[svs != 
                                                             "._SPECTRA_IDX"])
      }
      object
    }
)

This is based on the .mse_filter_spectra function adapted to my needs. Because not every scan filter appears in every file I also subsetted the MsExperiment object towards the end. Without that I was getting errors all the time.

With this method I do get now chromatograms when the selected mz is within the mass range of the scan event and an Empty MChromatogram if it is not.

Will continue with this until the filterSpectra has made it to the released version.

jorainer commented 4 months ago

The BiocManager::install calls from above, because they contain "RforMassSpectrometry/" will install the versions from github - it's essentially the same than remotes::install_github("RforMassSpectrometry/MsExperiment"). And it's strange that you don't have the filterSpectra,MsExperiment available - I installed MsExperiment that way and I have a method for MsExperiment:

> library(MsExperiment)
> showMethods("filterSpectra")
Function: filterSpectra (package ProtGenerics)
object="MsExperiment", filter="function"

> packageVersion("MsExperiment")
[1] ‘1.5.4’
> 

your solution is OK, but using filterSpectra would be just a bit cleaner.

breidan commented 4 months ago

@jorainer , as I said I apologize for my ignorance :-) I repeated the install and now it worked out and I have the filterSpectra,MsExperiment available now. Thanks.

findChromPeaks-centWave also has problems with the two scan events in the file. I can do peak finding on the filtered object but then I have two objects with chromPeaks wich I somehow have to combine then. Any simple way to combine them again into a XcmsExperiment?

breidan commented 3 months ago

I hate to be a pain, but ... In my work flow I first create a MsExperiment object with

> msexp<-readMsExperiment(spectraFiles=files,source=MsBackendRawFileReader::MsBackendRawFileReader())

to obtain spectraData which are lost when going through mzML with msconvert. But because the MsBackendRawFileReader is very slow with the peaksData I continue with a MsExperiment object created with MsBackendMzR for peak finding etc.

Now, with MsExperiment 1.4.0, MsCoreUtils 1.14.1, ProtGenerics 1.35.2, Spectra 1.12.0, and MsBackendRawFileReader 1.8.1 this works just fine.

But after updating to the github versions MsExperiment 1.5.4, MsCoreUtils 1.15.4, ProtGenerics 1.35.3, and Spectra 1.13.5 (as described above for filterSpectra,MsExperiment) this error is thrown:

> msexp<-readMsExperiment(spectraFiles = files,source=MsBackendRawFileReader::MsBackendRawFileReader())
Error in .local(object, ...) : 'data' has to be a 'DataFrame'

The cause is the call to Spectra in readMsExperiment. Changing the argument source= to backend= leads to an error with BiocParallel:

> msexp<-readMsExperiment(spectraFiles = files,backend=MsBackendRawFileReader::MsBackendRawFileReader())
Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error:
Error in substr(readBin(x, character(), n = 1), 1, 3): invalid multibyte string at '<a1>F'

Any ideas on this? Apparently, there was a change in Specta which is not compatible with MsBackendRawFileReader 1.8.1. And there is not much activity on the github channel of MsBackendRawFileReader.

jorainer commented 3 months ago

I'm actually not sure if this is an error from Spectra or from MsExperiment... let's try to find the bug:

can you please try to create the backend (also, please always load a library instead of calling a function through :: - just to make sure that all methods, generics etc are properly loaded):

library(MsBackendRawFileReader)
be <- backendInitialize(MsBackendRawFileReader(), files)

If that works, next step: create a Spectra:

s <- Spectra(fls, source = MsBackendRawFileReader())

Do you get any error in any of these calls?

breidan commented 3 months ago

@jorainer , running

> library(MsBackendRawFileReader)
Package 'MsBackendRawFileReader' version 1.8.1
> be <- backendInitialize(MsBackendRawFileReader(), files[4])
> s <- Spectra(files[4], source = MsBackendRawFileReader())

with the packages from the current Bioconductor 3.18 executes without error. Switching to the github versions (see above) throws errors:

> be <- backendInitialize(MsBackendRawFileReader(), files[4])
Error in .local(object, ...) : 'data' has to be a 'DataFrame'
> s <- Spectra(files[4], source = MsBackendRawFileReader())
Error in .local(object, ...) : 'data' has to be a 'DataFrame'

What next?

jorainer commented 3 months ago

good question - I have issues run this locally because I struggle getting the rawfilereader running on my system. I would maybe fork the MsBackendRawFileReader, try to see how/if I can fix something and ask you to check if that solves the issue?

jorainer commented 3 months ago

Can you please install the MsBackendRawFileReader from my repo: r remotes::install_github("jorainer/MsBackendRawFileReader", ref = "devel"), restart R and then try again?

jorainer commented 3 months ago

maybe you could also try be <- backendInitialize(MsBackendRawFileReader(), files = files[4]) - or specifcally call the method from the package using be <- MsBackendRawFileReader::backendInitialize(MsBackendRawFileReader(), files[4]) (that would tell us that there is indeed a namespace problem and the generic is not imported properly)

breidan commented 3 months ago

@jorainer, thanks for the support. Will try this on Mon. Cheers Andreas

breidan commented 3 months ago

So I installed MsBackendRawFileReader from your repo and ran again with the github versions MsExperiment 1.5.4, MsCoreUtils 1.15.4, ProtGenerics 1.35.3, and Spectra 1.13.5, and now everything works without a problem. I did notice though that from your repo it's MsBackendRawFileReader 1.5.6 while the current Bioc version is 1.8.1.

Will now try to do the filtering with filterSpectra.

Comming back to chromPeaks: What I do is findChromPeaks-centWave with filtered MsExperiments objects. I then have two of them of which I extract chromPeaks seperately. I rbind the two matrices, rename the rows and feed that back into one of the XcmsExperiment objects. Then I continue with alignment and grouping. Any simpler way?

breidan commented 3 months ago

@jorainer , there is a new issue: with the updated packages I get now this error below (msexp is a MsExperiment object, just saying :-))

> msexp|>filterMz(c(415.2,416.2))|>filterRt(11.34*60+c(-100,100))|>plot(type="XIC")
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'plot': error in evaluating the argument 'object' in selecting a method for function 'filterRt': unable to find an inherited method for function ‘filterMzRange’ for signature ‘"MsExperiment"’

That did not happen with the regular packages.

jorainer commented 3 months ago

For the issue with the chromPeaks - maybe also better to spin that off as a separate issue? I find the separate chrom peak detection and then merging the chrom peaks into the same object again a bit problematic: when you extract then EICs it will then create the chromatogram from all MS1 scans, regardless of the filter string (and in which of the two subsets the chrom peak was detected). - or, if the m/z ranges of the two "filter functions" are not overlapping it might actually also work.

Potential alternatives: assign a different MS levels to spectra of the two filter functions. you could then run (on the single object with all spectra) findChromPeaks separately for the two filter functions specifying their "MS level" with parameter msLevel. Also most other methods/functions (should?) support the msLevel, thus it would be possible to extract the correct EICs, do the feature grouping properly (separately) for them etc.

jorainer commented 3 months ago

For the error that you got: indeed, there is no filterMzRange for MsExperiment. You would need to use the filterSpectra method instead:

msexp |>
filterSpectra(filterMz, c(415.2, 416.2)) |>
filterSpectra(filterRt, c(...)) |>
plot(type = "XIC")