rformassspectrometry / MetaboAnnotation

High level functionality to support and simplify metabolomics data annotation.
https://rformassspectrometry.github.io/MetaboAnnotation/
12 stars 9 forks

Error for matchSpectra: "Not implemented for MsBackendMassbankSql." #95

Closed: YANGJJ93MS closed this issue 1 year ago

YANGJJ93MS commented 1 year ago

Hi, I was using this matching function but got the following error:

demo1_match = matchSpectra(demo1_ms2, mbank, param = prm)
BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in .local(object, ...): Not implemented for MsBackendMassbankSql.

I used the exact code from the tutorials. However, the matching function only worked once; the error kept popping up in all of my subsequent trials. Does anyone know a solution to this?

Thanks! Scott

YANGJJ93MS commented 1 year ago

It seems that this error occurs whenever I run this command for normalization: mbank <- addProcessing(mbank, norm_int). Without the normalization, the match function works well.

jorainer commented 1 year ago

Can you please provide the output of your sessionInfo()? I will try to reproduce the error locally and check what might have caused it.

YANGJJ93MS commented 1 year ago

Hi Jorainer,

Thank you for your reply after the weekend! This is my sessionInfo():

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_Singapore.936 LC_CTYPE=English_Singapore.936 LC_MONETARY=English_Singapore.936 LC_NUMERIC=C
[5] LC_TIME=English_Singapore.936

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] MsBackendMgf_1.6.0 pander_0.6.5 xcms_3.20.0 MSnbase_2.24.2 Biobase_2.58.0 BiocManager_1.30.20
[7] mzR_2.32.0 Rcpp_1.0.10 Retip_0.5.4 ggplot2_3.4.1 keras_2.11.1 lightgbm_3.3.5
[13] R6_2.5.1 MetaboAnnotation_1.2.0 RSQLite_2.3.0 MsBackendMassbank_1.6.1 Spectra_1.8.3 ProtGenerics_1.30.0
[19] BiocParallel_1.32.6 S4Vectors_0.36.2 BiocGenerics_0.44.0

loaded via a namespace (and not attached):
[1] utf8_1.2.3 reticulate_1.28 tidyselect_1.2.0 htmlwidgets_1.6.2 grid_4.2.2
[6] pROC_1.18.0 devtools_2.4.5 munsell_0.5.0 codetools_0.2-19 preprocessCore_1.60.2
[11] DT_0.27 future_1.32.0 miniUI_0.1.1.1 withr_2.5.0 colorspace_2.1-0
[16] rstudioapi_0.14 tensorflow_2.11.0 robustbase_0.95-0 rJava_1.0-6 mzID_1.36.0
[21] listenv_0.9.0 MatrixGenerics_1.10.0 GenomeInfoDbData_1.2.9 bit64_4.0.5 rprojroot_2.0.3
[26] parallelly_1.35.0 vctrs_0.6.0 generics_0.1.3 MetaboCoreUtils_1.6.0 ipred_0.9-14
[31] itertools_0.1-3 timechange_0.2.0 doParallel_1.0.17 GenomeInfoDb_1.34.9 clue_0.3-64
[36] rsvg_2.4.0 MsCoreUtils_1.10.0 AnnotationFilter_1.22.0 bitops_1.0-7 cachem_1.0.7
[41] DelayedArray_0.23.2 promises_1.2.0.1 scales_1.2.1 fingerprint_3.5.7 nnet_7.3-18
[46] gtable_0.3.3 affy_1.76.0 globals_0.16.2 processx_3.8.0 timeDate_4022.108
[51] rlang_1.1.0 zeallot_0.1.0 splines_4.2.2 lazyeval_0.2.2 ModelMetrics_1.2.2.2
[56] impute_1.72.3 reshape2_1.4.4 httpuv_1.6.9 MassSpecWavelet_1.64.1 caret_6.0-94
[61] tools_4.2.2 lava_1.7.2.1 usethis_2.1.6 CompoundDb_1.2.1 rcdklibs_2.8
[66] affyio_1.68.0 ellipsis_0.3.2 RColorBrewer_1.1-3 sessioninfo_1.2.2 MultiAssayExperiment_1.24.0
[71] plyr_1.8.8 base64enc_0.1-3 zlibbioc_1.44.0 purrr_1.0.1 RCurl_1.98-1.10
[76] ps_1.7.3 prettyunits_1.1.1 rpart_4.1.19 urlchecker_1.0.1 SummarizedExperiment_1.28.0
[81] cluster_2.1.4 fs_1.6.1 magrittr_2.0.3 data.table_1.14.8 RANN_2.6.1
[86] pcaMethods_1.90.0 whisker_0.4.1 matrixStats_0.63.0 pkgload_1.3.2 mime_0.12
[91] xtable_1.8-4 XML_3.99-0.14 readxl_1.4.2 IRanges_2.32.0 gridExtra_2.3
[96] tfruns_1.5.1 compiler_4.2.2 ChemmineR_3.50.0 tibble_3.2.1 ncdf4_1.21
[101] crayon_1.5.2 htmltools_0.5.4 later_1.3.0 snow_0.4-4 lubridate_1.9.2
[106] DBI_1.1.3 dbplyr_2.3.2 MASS_7.3-58.3 MsFeatures_1.6.0 Matrix_1.5-3
[111] cli_3.6.0 vsn_3.66.0 parallel_4.2.2 gower_1.0.1 igraph_1.4.1
[116] GenomicRanges_1.50.2 pkgconfig_2.0.3 recipes_1.0.5 MALDIquant_1.22.1 xml2_1.3.3
[121] foreach_1.5.2 hardhat_1.2.0 XVector_0.38.0 prodlim_2019.11.13 stringr_1.5.0
[126] callr_3.7.3 digest_0.6.31 cellranger_1.1.0 curl_5.0.0 shiny_1.7.4
[131] rcdk_3.7.0 rjson_0.2.21 lifecycle_1.0.3 nlme_3.1-162 jsonlite_1.8.4
[136] QFeatures_1.8.0 desc_1.4.2 limma_3.54.2 fansi_1.0.4 pillar_1.9.0
[141] lattice_0.20-45 fastmap_1.1.1 DEoptimR_1.0-11 pkgbuild_1.4.0 survival_3.5-5
[146] glue_1.6.2 remotes_2.4.2 png_0.1-8 iterators_1.0.14 bit_4.0.5
[151] class_7.3-21 stringi_1.7.12 profvis_0.3.7 blob_1.2.4 memoise_2.0.1
[156] dplyr_1.1.1 future.apply_1.10.0

YANGJJ93MS commented 1 year ago

This is my demo data in mzML format: envmix-dda.zip

Could you please take a look at the code where this issue occurred:

# keep peaks with an intensity above 5% of the spectrum's maximum
low_int <- function(x, ...) {
    x > max(x, na.rm = TRUE) * 0.05
}

fs <- list.files("D:/MRMdatabase/DDAdata/DDA_peak_annoation/raw_data_1/",
                 pattern = "\\.mzML$", full.names = TRUE)
demo1 = readMSData(fs[1], mode = "onDisk")
cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10, peakwidth = c(3, 30))
demo1 <- findChromPeaks(demo1, param = cwp)
demo1_ms2 <- chromPeakSpectra(demo1, return.type = "Spectra")
demo1_ms2 = filterIntensity(demo1_ms2, intensity = low_int)

# keep only spectra with more than one peak
demo1_ms2 = demo1_ms2[lengths(demo1_ms2) > 1]

# scale intensities to a percentage of the spectrum's maximum intensity
norm_int <- function(x, ...) {
    maxint <- max(x[, "intensity"], na.rm = TRUE)
    x[, "intensity"] <- 100 * x[, "intensity"] / maxint
    x
}
demo1_ms2 = addProcessing(demo1_ms2, norm_int)
prm <- CompareSpectraParam(ppm = 10, requirePrecursor = TRUE,
                           THRESHFUN = function(x) which(x >= 0.7))
mbankcon <- dbConnect(SQLite(), "MassbankSql-2021-03.db")
mbank <- Spectra(mbankcon, source = MsBackendMassbankSql())
# the error appears once this processing step is added to the SQL-backed Spectra
mbank <- addProcessing(mbank, norm_int)
demo1_match3 = MetaboAnnotation::matchSpectra(demo1_ms2, mbank, param = prm)
demo1_match3 = demo1_match3[whichQuery(demo1_match3)]
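To illustrate what the two helper functions in the code above do, here is a minimal base R sketch that applies them by hand to a small peak matrix. The peak values are made up for illustration. In Spectra, `filterIntensity()` passes the intensity values of each spectrum to a function such as `low_int`, while `addProcessing()` passes each spectrum's peak matrix (columns `"mz"` and `"intensity"`) to a function such as `norm_int`; the sketch mimics those two calls on a single hypothetical spectrum.

```r
# Hypothetical peak matrix of a single spectrum (values made up)
pks <- cbind(mz = c(100.1, 150.2, 200.3, 250.4),
             intensity = c(20, 400, 1000, 30))

low_int <- function(x, ...) {
    x > max(x, na.rm = TRUE) * 0.05
}

norm_int <- function(x, ...) {
    maxint <- max(x[, "intensity"], na.rm = TRUE)
    x[, "intensity"] <- 100 * x[, "intensity"] / maxint
    x
}

# What filterIntensity() does with low_int: the threshold is 5% of 1000 = 50,
# so the peaks with intensity 20 and 30 are dropped
keep <- low_int(pks[, "intensity"])
filtered <- pks[keep, , drop = FALSE]

# What addProcessing() does with norm_int: intensities become a percentage
# of the spectrum's maximum intensity
normed <- norm_int(filtered)
print(normed)
```

Note that `norm_int` only rescales intensities within each spectrum; the m/z values and the number of peaks are left unchanged.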

YANGJJ93MS commented 1 year ago

I also tried the demo data from this tutorial (https://jorainer.github.io/SpectraTutorials/articles/Spectra-matching-with-MetaboAnnotation.html#references).

It did not work on my PC either:

fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML", package = "msdata")
demo1 <- readMSData(fl, mode = "onDisk")

jorainer commented 1 year ago

The code (on the example data you provided) runs without problems on my local system (using the same versions of the Spectra, MetaboAnnotation and MsBackendMassbank packages).

Could you please try the following: first, load only the required packages (ideally in this order). You have quite a lot of packages loaded, and some of them might mask a function from the packages mentioned above.

library(xcms)
library(Spectra)
library(MetaboAnnotation)
library(MsBackendMassbank)
library(RSQLite)

Then, please disable parallel processing (immediately after loading the packages) using

register(SerialParam())
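As a minimal sketch, assuming the BiocParallel package is installed, this is how one can register serial processing and then confirm that it is the default back-end returned by `bpparam()`:

```r
library(BiocParallel)

# Make SerialParam the registered default; functions that use bpparam()
# as their default BPPARAM will then run serially
register(SerialParam())

# bpparam() returns the currently registered default back-end
current <- bpparam()
print(class(current))
```

If `class(current)` reports `SerialParam`, calls that rely on the registered default no longer dispatch work to parallel worker processes.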

Do you then still get the error?

YANGJJ93MS commented 1 year ago

Hi Rainer,

It works well now. So was the issue caused by the parallel processing? Here is my sessionInfo for your information:

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_Singapore.936 LC_CTYPE=English_Singapore.936 LC_MONETARY=English_Singapore.936 LC_NUMERIC=C
[5] LC_TIME=English_Singapore.936

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] RSQLite_2.3.0 MsBackendMassbank_1.6.1 Spectra_1.8.3 MetaboAnnotation_1.2.0 xcms_3.20.0 MSnbase_2.24.2
[7] ProtGenerics_1.30.0 S4Vectors_0.36.2 mzR_2.32.0 Rcpp_1.0.10 Biobase_2.58.0 BiocGenerics_0.44.0
[13] BiocParallel_1.32.6

loaded via a namespace (and not attached):
[1] bitops_1.0-7 matrixStats_0.63.0 fs_1.6.1 bit64_4.0.5 doParallel_1.0.17
[6] RColorBrewer_1.1-3 GenomeInfoDb_1.34.9 tools_4.2.2 utf8_1.2.3 R6_2.5.1
[11] DT_0.27 affyio_1.68.0 DBI_1.1.3 lazyeval_0.2.2 colorspace_2.1-0
[16] tidyselect_1.2.0 gridExtra_2.3 bit_4.0.5 compiler_4.2.2 MassSpecWavelet_1.64.1
[21] preprocessCore_1.60.2 cli_3.6.0 xml2_1.3.3 DelayedArray_0.23.2 scales_1.2.1
[26] DEoptimR_1.0-11 robustbase_0.95-0 affy_1.76.0 digest_0.6.31 XVector_0.38.0
[31] base64enc_0.1-3 pkgconfig_2.0.3 htmltools_0.5.4 MetaboCoreUtils_1.6.0 MatrixGenerics_1.10.0
[36] dbplyr_2.3.2 fastmap_1.1.1 limma_3.54.2 htmlwidgets_1.6.2 rlang_1.1.0
[41] rstudioapi_0.14 impute_1.72.3 generics_0.1.3 jsonlite_1.8.4 mzID_1.36.0
[46] dplyr_1.1.1 RCurl_1.98-1.10 magrittr_2.0.3 GenomeInfoDbData_1.2.9 MALDIquant_1.22.1
[51] Matrix_1.5-3 munsell_0.5.0 fansi_1.0.4 MsCoreUtils_1.10.0 lifecycle_1.0.3
[56] vsn_3.66.0 MASS_7.3-58.3 SummarizedExperiment_1.28.0 zlibbioc_1.44.0 plyr_1.8.8
[61] blob_1.2.4 grid_4.2.2 parallel_4.2.2 lattice_0.20-45 MsFeatures_1.6.0
[66] pillar_1.9.0 igraph_1.4.1 GenomicRanges_1.50.2 QFeatures_1.8.0 rjson_0.2.21
[71] codetools_0.2-19 ChemmineR_3.50.0 XML_3.99-0.14 glue_1.6.2 pcaMethods_1.90.0
[76] BiocManager_1.30.20 MultiAssayExperiment_1.24.0 vctrs_0.6.0 png_0.1-8 foreach_1.5.2
[81] gtable_0.3.3 CompoundDb_1.2.1 RANN_2.6.1 clue_0.3-64 cachem_1.0.7
[86] ggplot2_3.4.1 AnnotationFilter_1.22.0 rsvg_2.4.0 ncdf4_1.21 tibble_3.2.1
[91] iterators_1.0.14 memoise_2.0.1 IRanges_2.32.0 cluster_2.1.4

YANGJJ93MS commented 1 year ago

Thank you for your solution!!

jorainer commented 1 year ago

Yes, parallel processing does not work with SQL-based backends, because the connection to the database cannot be shared across the different parallel processes. In the package versions of the upcoming Bioconductor release 3.17, parallel processing will be disabled by default for backends that do not support it.

Note that there is also an easier (and more reproducible) way to access MassBank reference databases, through AnnotationHub:

library(AnnotationHub)
ah <- AnnotationHub()
## query what MassBank versions are available:
query(ah, "MassBank")
AnnotationHub with 3 records
# snapshotDate(): 2022-10-31
# $dataprovider: MassBank
# $species: NA
# $rdataclass: CompDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH107048"]]' 

             title                                
  AH107048 | MassBank CompDb for release 2021.03  
  AH107049 | MassBank CompDb for release 2022.06  
  AH111334 | MassBank CompDb for release 2022.12.1
## Load one specific version
cdb <- ah[["AH111334"]]
mbank <- Spectra(cdb)

You can then use this mbank variable just like the one you used before. Note also that the ID of a data resource does not change, so once you have identified the version you are interested in, you can load the data directly without the query call. In addition, once the data has been downloaded, it is subsequently loaded from the local cache instead of being downloaded again.

I will also update the tutorials, but haven't found the time yet.

YANGJJ93MS commented 1 year ago

Thanks Rainer!! This is awesome for applications! Is this easy access also available for other mass spectral databases, such as HMDB, MoNA, METLIN, and GNPS?

jorainer commented 1 year ago

Unfortunately not (yet). We could start working on HMDB and MoNA (and eventually GNPS), but METLIN will not be possible because it restricts usage and does not allow redistribution.

jorainer commented 1 year ago

I'm closing the issue now - feel free to reopen if needed.

YANGJJ93MS commented 1 year ago

Thanks a lot! Looking forward to the new database distribution!