Open gmhhope opened 3 years ago
Are you using https://github.com/EuracBiomedicalResearch/CompoundDb/blob/master/R/spectrum-import-functions.R
to parse the HMDB database? I saw that you mention there that precursorMz is NA for the HMDB spectra XML:
#' @note
#'
#' The HMDB xml files are supposed to be extracted from the downloaded zip file
#' into a folder and should not be renamed. The function identifies xml files
#' containing MS/MS spectra by their file name.
#'
#' The same spectrum ID can be associated with multiple compounds. Thus, the
#' function assignes an arbitrary ID (column `"spectrum_id"`) to values from
#' each file. The original ID of the spectrum in HMDB is provided in column
#' `"original_spectrum_id"`.
#'
#' @param x `character(1)`: with the path to directory containing the xml files.
#'
#' @param collapsed `logical(1)` whether the returned `data.frame` should be
#' *collapsed* or *expanded*. See description for more details.
#'
#' @return `data.frame` with as many rows as there are peaks and columns:
#'
#' - spectrum_id (`integer`): an arbitrary, unique ID identifying values
#'   from one xml file.
#' - original_spectrum_id (`character`): the HMDB-internal ID of the spectrum.
#' - compound_id (`character`): the HMDB compound ID the spectrum is associated
#'   with.
#' - polarity (`integer`): 0 for negative, 1 for positive, `NA` for not set.
#' - collision_energy (`numeric`): collision energy voltage.
#' - predicted (`logical`): whether the spectrum is predicted or experimentally
#'   verified.
#' - splash (`character`): the SPLASH (SPectraL hASH) key of the spectrum
#'   (Wohlgemuth 2016).
#' - instrument_type (`character`): the type of MS instrument on which the
#'   spectrum was measured.
#' - instrument (`character`): the MS instrument (not available for all spectra
#'   in HMDB).
#' - precursor_mz (`numeric`): not provided by HMDB and thus `NA`.
#' - mz (`numeric` or `list` of `numeric`): m/z values of the spectrum.
#' - intensity (`numeric` or `list` of `numeric`): intensity of the spectrum.
In that tutorial I was using the MsBackendHmdb backend to import data from HMDB (with this parsing function), and I was indeed unable to extract the precursor m/z from the HMDB xml files.
Maybe I've just overlooked it, but I could not find a field in the HMDB xml files that contains the precursor m/z. If you know how to get this information, please let me know.
Thanks very much for all the answers!
I will need to come back to each issue again at some point. The answers are indeed very useful!
Regarding the missing precursorMz field: I have checked the HMDB spectra myself and I agree with you that HMDB very likely doesn't provide precursorMz in their XML files. I have sent an email to the HMDB group to ask whether they can provide any information about it.
I will let you know as well if I get feedback!
Thanks, Minghao Gong
Excellent! Thanks!
Hi Johannes,
I just realized that precursorMz is another important check when matching spectra between samples and a database: the spectra comparison is only meaningful if the precursorMz values match as well. I did find precursorMz in the MassBank spectra object. However, when I try HMDB, it is not available in your file.
Is there another way to get this information, or would you consider including it in a future version?
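To make the check concrete, here is a minimal base R sketch of a precursor m/z pre-filter. The helper `precursor_matches`, the 10 ppm tolerance, and all m/z values are assumptions made up for illustration; they are not part of Spectra or CompoundDb and not taken from HMDB or MassBank.

```r
# Hypothetical helper (not part of Spectra or CompoundDb): TRUE when two
# precursor m/z values agree within a relative tolerance given in ppm.
# A missing value on either side yields NA, because no decision is possible.
precursor_matches <- function(query_mz, target_mz, ppm = 10) {
    tolerance <- target_mz * ppm / 1e6
    abs(query_mz - target_mz) <= tolerance
}

# Toy values for illustration only.
query_mz <- 181.0707
target_mz <- c(181.0712, 181.1500, NA)

keep <- precursor_matches(query_mz, target_mz, ppm = 10)

# Indices of reference spectra that pass the precursor check. Entries that
# are NA (no precursor m/z available) are dropped here.
candidates <- which(keep %in% TRUE)
candidates
```

With reference spectra that have no precursor m/z, every entry of `keep` would be `NA`, so a filter like this cannot be applied to them and the comparison would have to rely on the fragment peaks alone.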
Thanks, Minghao Gong
hmdb$precursorMz
Does hmdb only contain the id and no other information (e.g., the name)?
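A quick way to see which fields are empty is to test each column for being entirely `NA`. The sketch below uses a toy `data.frame` with a few of the column names from the `@return` description quoted earlier in this thread; the object name `spectra_df` and all values (including the placeholder compound IDs) are made up for illustration.

```r
# Toy stand-in for an imported spectra table; values are made up.
spectra_df <- data.frame(
    spectrum_id = 1:3,
    compound_id = c("ID_A", "ID_B", "ID_C"),  # placeholder IDs, not real HMDB IDs
    polarity = c(1L, 0L, NA),
    precursor_mz = NA_real_,                  # recycled to all rows
    stringsAsFactors = FALSE
)

# For every column, check whether all of its values are missing.
all_missing <- vapply(spectra_df, function(col) all(is.na(col)), logical(1))

# Names of the columns that carry no information at all.
empty_columns <- names(all_missing)[all_missing]
empty_columns
```

On this toy table only `precursor_mz` is reported; the same check on a real import would show at a glance which fields the source did not fill.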