jorainer / SpectraTutorials

These workshops and tutorials provide use cases and examples for mass spectrometry data handling and analysis using the Spectra Bioconductor package.
https://jorainer.github.io/SpectraTutorials

precursorMz of hmdb Spectra object are NAs #11

Open gmhhope opened 3 years ago

gmhhope commented 3 years ago

Hi Johannes,

I just realized that precursorMz is another important check when matching spectra between samples and a database: a spectra comparison is only meaningful if the precursor m/z values match as well. I did find precursorMz in the MassBank Spectra object. However, when I try hmdb, the values are missing in your file.

Is there another way to get this information, or would you consider including it in a future version?
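A minimal sketch of such a precursor check in base R, for illustration only. The helper name `precursor_matches` and the default tolerance of 10 ppm are assumptions and not part of the Spectra package; missing values are treated as "no match", which is what would happen for every HMDB spectrum here.

```r
# Hypothetical helper (not part of Spectra): TRUE where the query precursor
# m/z lies within `ppm` parts-per-million of the reference precursor m/z.
precursor_matches <- function(query_mz, target_mz, ppm = 10) {
    # The comparison is vectorized; a missing value on either side gives NA.
    res <- abs(query_mz - target_mz) <= target_mz * ppm / 1e6
    # A precursor m/z that is NA can not be evaluated, so count it as no match.
    res[is.na(res)] <- FALSE
    res
}

# One query precursor against three reference values, the last one missing.
precursor_matches(148.0604, c(148.0605, 148.0704, NA))
```

With all reference values being `NA`, as in the hmdb object, such a filter would reject every candidate, so the precursor m/z has to come from another source before it can be used.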

Thanks, Minghao Gong

hmdb$precursorMz
   [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
  [25] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
  [49] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
  [73] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Does hmdb contain only the ID and no other information (e.g., the compound name)?

BEGIN IONS
TITLE=msLevel 2; retentionTime ; scanNum 
msLevel=2
centroided=TRUE
polarity=1
spectrum_id=447554
compound_id=HMDB0060475
predicted=FALSE
splash=splash10-001i-9300000000-f6470a0cfd89aa7eb825
instrument_type=LC-ESI-ITFT
59.41436 1.839163
74.005951 3.238599
84.042999 4.241357
84.044373 100
84.08078 18.949596
85.084038 7.455676
102.054817 15.184331
120.052422 2.581904
130.049911 12.549987
130.086075 5.222175
131.089203 2.796684
148.042465 5.426773
148.060211 2.279714
189.786194 1.770397
291.560547 1.73985
299.229797 4.360426
299.239014 4.217695
END IONS
gmhhope commented 3 years ago

Are you using https://github.com/EuracBiomedicalResearch/CompoundDb/blob/master/R/spectrum-import-functions.R to parse the HMDB database? I saw that the documentation there mentions that the precursor m/z is NA for the HMDB spectra XML files:

#' @note
#'
#' The HMDB xml files are supposed to be extracted from the downloaded zip file
#' into a folder and should not be renamed. The function identifies xml files
#' containing MS/MS spectra by their file name.
#'
#' The same spectrum ID can be associated with multiple compounds. Thus, the
#' function assignes an arbitrary ID (column `"spectrum_id"`) to values from
#' each file. The original ID of the spectrum in HMDB is provided in column
#' `"original_spectrum_id"`.
#'
#' @param x `character(1)`: with the path to directory containing the xml files.
#'
#' @param collapsed `logical(1)` whether the returned `data.frame` should be
#'     *collapsed* or *expanded*. See description for more details.
#'
#' @return `data.frame` with as many rows as there are peaks and columns:
#'
#' - spectrum_id (`integer`): an arbitrary, unique ID identifying values
#'   from one xml file.
#' - original_spectrum_id (`character`): the HMDB-internal ID of the spectrum.
#' - compound_id (`character`): the HMDB compound ID the spectrum is associated
#'   with.
#' - polarity (`integer`): 0 for negative, 1 for positive, `NA` for not set.
#' - collision_energy (`numeric`): collision energy voltage.
#' - predicted (`logical`): whether the spectrum is predicted or experimentally
#'   verified.
#' - splash (`character`): the SPLASH (SPectraL hASH) key of the spectrum
#'   (Wohlgemuth 2016).
#' - instrument_type (`character`): the type of MS instrument on which the
#'   spectrum was measured.
#' - instrument (`character`): the MS instrument (not available for all spectra
#'   in HMDB).
#' - precursor_mz (`numeric`): not provided by HMDB and thus `NA`.
#' - mz (`numeric` or `list` of `numeric`): m/z values of the spectrum.
#' - intensity (`numeric` or `list` of `numeric`): intensity of the spectrum.

jorainer commented 3 years ago

In that tutorial I was using the MsBackendHmdb backend to import data from HMDB (with this parsing function). In fact I was unable to extract the precursor m/z from the HMDB xml files.

jorainer commented 3 years ago

Maybe I've just overlooked it, but I could not find a field in the HMDB xml files that contains the precursor m/z information. In case you know how to get this information, please let me know.

gmhhope commented 3 years ago

Thanks very much for all the answers!

I will need to come back to each issue again at some point. The answers are indeed very useful!

Regarding the missing precursorMz field, I have checked the HMDB spectra myself and I agree with you that HMDB very likely does not provide the precursor m/z in their XML files. I have sent an email to the HMDB group to ask whether they can provide any information about it.
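Until such information is available, one possible workaround is to derive a theoretical precursor m/z from the compound's monoisotopic mass. The sketch below is an assumption-laden illustration, not something HMDB or CompoundDb provides: it assumes the mass is available from a separate compound annotation, that the ion is a singly charged [M+H]+ (positive) or [M-H]- (negative) adduct, and that polarity uses the encoding quoted above (1 positive, 0 negative). The helper name `theoretical_precursor_mz` is hypothetical.

```r
# Mass of a proton in Da.
PROTON_MASS <- 1.007276

# Hypothetical helper: theoretical m/z of the singly charged [M+H]+ ion for
# polarity 1 and of the [M-H]- ion for polarity 0. Any other or missing
# polarity gives NA because no adduct can be assumed.
theoretical_precursor_mz <- function(monoisotopic_mass, polarity) {
    shift <- ifelse(polarity == 1L, PROTON_MASS,
                    ifelse(polarity == 0L, -PROTON_MASS, NA_real_))
    monoisotopic_mass + shift
}

# Glucose (monoisotopic mass 180.06339 Da) in positive and negative mode.
theoretical_precursor_mz(180.06339, c(1L, 0L))
```

The real adduct of an experimental spectrum may differ (e.g. sodium adducts or water losses), so values derived this way are only candidates for the true precursor m/z.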

I will let you know as well if I get feedback!

Thanks, Minghao Gong

jorainer commented 3 years ago

Excellent! Thanks!