rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/

compareSpectra inconsistency? #252

Closed Adafede closed 2 years ago

Adafede commented 2 years ago

Hi,

I just tried playing around with the different options offered by compareSpectra.

I was surprised that running

path_mgf_1 <- "https://raw.githubusercontent.com/mandelbrot-project/spectral_lib_matcher/main/tests/data/database.mgf"
path_mgf_2 <- "https://raw.githubusercontent.com/mandelbrot-project/spectral_lib_matcher/main/tests/data/database.mgf"

mgf_1 <- MsBackendMgf::readMgf(f = path_mgf_1)
mgf_2 <- MsBackendMgf::readMgf(f = path_mgf_2)

spectra_1 <- Spectra::Spectra(object = mgf_1)
spectra_2 <- Spectra::Spectra(object = mgf_2)
Spectra::compareSpectra(
  x = spectra_1,
  y = spectra_2,
  ppm = 10,
  tolerance = 0.01,
  MAPFUN = Spectra::joinPeaks,
  FUN = MsCoreUtils::gnps
)
         1        2
1 1.000000 0.172032
2 0.172032 1.000000

and

Spectra::compareSpectra(
  x = spectra_1,
  y = spectra_2,
  ppm = 10,
  tolerance = 0.01,
  MAPFUN = Spectra::joinPeaksGnps
)
           1          2
1 1.00000000 0.07384886
2 0.07384886 1.00000000

does not lead to the same results.

Is this expected? :confused:

jorainer commented 2 years ago

hm, indeed interesting. I'll have a look into it

jorainer commented 2 years ago

Actually no, you cannot expect to get the same results. Spectra::compareSpectra is a two-step approach: first, the peaks of the two spectra are matched against each other (using MAPFUN), and then the similarity score is calculated by FUN on these matched peaks.
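To make the two steps concrete, here is a minimal base R sketch. It is not the Spectra implementation: map_peaks, cosine_score and compare_two are hypothetical stand-ins (a simple closest-peak matching with an absolute tolerance only, and a plain cosine score), and the peak matrices a and b are made-up data.

```r
# Hypothetical stand-in for MAPFUN: align the peaks of x and y.
# Unmatched peaks are kept and paired with an NA row (outer-join style).
map_peaks <- function(x, y, tolerance = 0.01) {
  # for each peak in x: index of the closest y peak within tolerance, else NA
  hit <- vapply(x[, "mz"], function(mz) {
    d <- abs(y[, "mz"] - mz)
    if (min(d) <= tolerance) which.min(d) else NA_integer_
  }, integer(1))
  unmatched_y <- setdiff(seq_len(nrow(y)), hit)
  xi <- c(seq_len(nrow(x)), rep(NA_integer_, length(unmatched_y)))
  yi <- c(hit, unmatched_y)
  list(x = x[xi, , drop = FALSE], y = y[yi, , drop = FALSE])
}

# Hypothetical stand-in for FUN: plain cosine on the aligned intensities,
# treating unmatched peaks as intensity 0.
cosine_score <- function(x, y) {
  xi <- x[, "intensity"]
  yi <- y[, "intensity"]
  xi[is.na(xi)] <- 0
  yi[is.na(yi)] <- 0
  sum(xi * yi) / sqrt(sum(xi^2) * sum(yi^2))
}

# The two-step logic: first map, then score the mapped peaks.
compare_two <- function(x, y, MAPFUN, FUN, ...) {
  mapped <- MAPFUN(x, y, ...)
  FUN(mapped$x, mapped$y)
}

# Made-up peak matrices
a <- cbind(mz = c(100, 150, 200), intensity = c(10, 50, 100))
b <- cbind(mz = c(100.005, 175, 200.002), intensity = c(20, 40, 90))

score_ab <- compare_two(a, b, MAPFUN = map_peaks, FUN = cosine_score,
                        tolerance = 0.01)
score_aa <- compare_two(a, a, MAPFUN = map_peaks, FUN = cosine_score,
                        tolerance = 0.01)
```

Swapping either the mapping function or the scoring function changes the result, which is why the two calls in the original post are not comparable.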

Spectra::compareSpectra(
  x = spectra_1,
  y = spectra_2,
  ppm = 10,
  tolerance = 0.01,
  MAPFUN = Spectra::joinPeaks,
  FUN = MsCoreUtils::gnps
)

This uses joinPeaks for the matching and then MsCoreUtils::gnps for the similarity calculation.

Spectra::compareSpectra(
  x = spectra_1,
  y = spectra_2,
  ppm = 10,
  tolerance = 0.01,
  MAPFUN = Spectra::joinPeaksGnps
)

This uses joinPeaksGnps for the matching, but will then use the default ndotproduct for the similarity calculation. Thus, your two examples do quite different things. You are using two different similarity calculation methods, and the peak mapping strategy is also different: joinPeaksGnps considers (and reports) peaks as matching if their m/z difference is smaller than defined by ppm and tolerance (same as joinPeaks), but in addition it also reports peaks as matching if the difference between their m/z values, after subtracting the respective precursor m/z, is smaller than defined by ppm and tolerance. You can thus get additional matching peaks with joinPeaksGnps, and a single peak can also match multiple peaks in the other spectrum. Only the MsCoreUtils::gnps function can handle such multi-mappings correctly.
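A small base R sketch of this mapping rule shows how multi-mappings come about. This is not the joinPeaksGnps code: match_gnps_style is a hypothetical helper, it uses only an absolute tolerance (no ppm), and the m/z values are made up.

```r
# Hypothetical helper: report all peak pairs that match either directly
# or after subtracting the respective precursor m/z from each peak.
match_gnps_style <- function(x_mz, y_mz, x_pmz, y_pmz, tolerance = 0.01) {
  pairs <- expand.grid(x = seq_along(x_mz), y = seq_along(y_mz))
  delta <- x_mz[pairs$x] - y_mz[pairs$y]
  direct <- abs(delta) <= tolerance
  # (x_mz - x_pmz) - (y_mz - y_pmz) == delta - (x_pmz - y_pmz)
  shifted <- abs(delta - (x_pmz - y_pmz)) <= tolerance
  pairs[direct | shifted, ]
}

# Made-up example: the precursor m/z values differ by 10
m <- match_gnps_style(
  x_mz = c(100, 150), y_mz = c(100, 110, 160),
  x_pmz = 200, y_pmz = 210
)
m
```

Here the first peak of x (m/z 100) is paired with two peaks of y: m/z 100 directly and m/z 110 through the precursor shift. A score that assumes a one-to-one alignment would count that peak twice.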

I would suggest using joinPeaksGnps only in combination with the gnps similarity function (and not with e.g. ndotproduct). This is described in the documentation of joinPeaksGnps (see ?joinPeaksGnps), but not in the documentation of compareSpectra. I will update the documentation accordingly.
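For the data from the original post, the suggested combination would look roughly like this. It reuses the file URL and calls from above and assumes that the Spectra, MsBackendMgf and MsCoreUtils packages are installed and that the URL is reachable; the name res is only for illustration.

```r
# Same test file as in the original post (read twice there; once suffices)
path_mgf <- "https://raw.githubusercontent.com/mandelbrot-project/spectral_lib_matcher/main/tests/data/database.mgf"

mgf <- MsBackendMgf::readMgf(f = path_mgf)
spectra_1 <- Spectra::Spectra(object = mgf)
spectra_2 <- Spectra::Spectra(object = mgf)

# GNPS-style peak mapping together with the matching GNPS similarity function
res <- Spectra::compareSpectra(
  x = spectra_1,
  y = spectra_2,
  ppm = 10,
  tolerance = 0.01,
  MAPFUN = Spectra::joinPeaksGnps,
  FUN = MsCoreUtils::gnps
)
res
```

The scores from this call will generally differ from both results in the original post, since neither of those used this pairing.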

Hope this explained it - let me know if not.

Adafede commented 2 years ago

Oooook! Everything clear now, thank you! Yeah... I saw this as a leftover somewhere, good that it gets updated now 😊 I'm using the MetaboAnnotation functions now anyway 😛

jorainer commented 2 years ago

yes - Spectra and MsCoreUtils should provide you with the tools to build your own approaches and methods, while MetaboAnnotation is more for day-to-day use.