Tutorial sections and SQLite database for MONA Spectral Database (.msp)

Tony-II commented 3 years ago

Hello jorainer,

First of all many thanks for this utterly great and comprehensible tutorial and of course all the wonderful work hidden in the Spectra package.

I would like to ask some questions about it:

1.) Could the function compareSpectra() always return a matrix (array)? This would make it easier to determine best_match when length(mbank_sub) == 1 or length(sps_sub) == 1. In particular, I would suggest introducing a drop = FALSE argument in the return statement of the function, so that the correct row and column indices of the respective matching partners (indices into db_sub and sps_sub) are always available ...

2.) As there is also an SQL version of MONA, equivalent to the one for MassBank, at https://github.com/computational-metabolomics/msp2db/releases/tag/v0.0.14-mona-23042021, would it be possible to have that included in the tutorial? Alternatively, a tutorial section on how to compare against .msp databases (MsBackendMsp) would be great.

3.) Any hint on parallelisation would be helpful, especially regarding parallel database connections when several hundred features have to be compared.

Many thanks and kind regards,
Tony

jorainer commented 3 years ago

Thanks for the feedback!

2) Sure, that's a very good suggestion. I have never used the MsBackendMsp backend myself, but I will look into it. Regarding the database, this is a non-official database dump, right? In what format does MoNa provide the data in general? When I last checked, it was their own YAML format... not ideal...

3) For parallel (and maybe more convenient) matching, have a look also at the new tutorial I added: https://jorainer.github.io/SpectraTutorials/articles/Spectra-matching-with-MetaboAnnotation.html. Pre-filtering by matching precursor m/z significantly increases the performance. Alternatively, you can use BPPARAM = MulticoreParam(4) or equivalent to run the comparisons in parallel. However, since there are no parallel connections to the database, parallel processing is limited to the data after it has been retrieved from the database.
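
As a rough sketch of the parallel option: `matchSpectra()` from MetaboAnnotation accepts a `BPPARAM` argument, so comparisons on spectra that are already in memory can be spread across workers. The two small `Spectra` objects, all m/z, intensity and precursor values, and the worker count are invented for illustration; in practice the target would be the reference spectra after retrieval from the database.

```r
library(Spectra)
library(MetaboAnnotation)
library(BiocParallel)

## Hypothetical query spectra (all values invented for illustration).
qd <- DataFrame(msLevel = c(2L, 2L), precursorMz = c(180.06, 255.23))
qd$mz <- list(c(60.1, 90.2, 120.3), c(100.1, 150.2, 200.3))
qd$intensity <- list(c(10, 50, 100), c(20, 80, 100))
query <- Spectra(qd)

## Hypothetical reference spectra, already held in memory.
td <- DataFrame(msLevel = c(2L, 2L, 2L),
                precursorMz = c(180.06, 255.23, 300.10))
td$mz <- list(c(60.1, 90.2, 120.3), c(100.1, 150.2, 210.4),
              c(70.1, 140.2, 280.3))
td$intensity <- list(c(12, 45, 100), c(25, 70, 100), c(5, 30, 100))
target <- Spectra(td)

## Only compare spectra with matching precursor m/z (within 10 ppm);
## this is the pre-filtering step that reduces the number of comparisons.
prm <- CompareSpectraParam(ppm = 10, requirePrecursor = TRUE)

## Two forked workers; MulticoreParam relies on forking, so on Windows
## SnowParam(2) would be the closer equivalent.
bp <- MulticoreParam(2)

mtch <- matchSpectra(query, target, param = prm, BPPARAM = bp)
mtch
```

For a handful of spectra the overhead of starting workers will outweigh any gain; parallel processing should only pay off with several hundred query spectra or more.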

jorainer commented 3 years ago

Ah, and regarding 1): I've opened an issue in Spectra.

jorainer commented 3 years ago

Again, regarding 1): compareSpectra has a parameter SIMPLIFY. If you set that to SIMPLIFY = FALSE it will always return a matrix. We'll look into changing the default SIMPLIFY = TRUE into SIMPLIFY = FALSE, but for now it should work if you do that manually.
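
A minimal sketch of the difference, using three made-up spectra (the m/z and intensity values are invented, and `ref` and `query` are illustrative names rather than objects from the tutorial):

```r
library(Spectra)

## Three hypothetical reference spectra (values invented for illustration).
spd <- DataFrame(msLevel = c(2L, 2L, 2L))
spd$mz <- list(c(100.1, 150.2, 200.3),
               c(100.1, 150.2, 210.4),
               c(70.1, 140.2, 280.3))
spd$intensity <- list(c(10, 50, 100), c(12, 48, 90), c(5, 30, 100))
ref <- Spectra(spd)

## A single query spectrum, i.e. the length == 1 case from question 1.
query <- ref[1]

## Default (SIMPLIFY = TRUE): the result is simplified to a numeric vector.
res_default <- compareSpectra(query, ref)

## SIMPLIFY = FALSE: the result stays a matrix (rows: query, columns: ref).
res_matrix <- compareSpectra(query, ref, SIMPLIFY = FALSE)
dim(res_matrix)

## Row and column index of the best match can then always be looked up
## the same way, regardless of how many spectra are compared.
best_match <- which(res_matrix == max(res_matrix), arr.ind = TRUE)
best_match
```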

Tony-II commented 3 years ago

Many thanks so far! Regarding:

1.) This works really well ...

2.) If the format under https://github.com/computational-metabolomics/msp2db/releases/tag/v0.0.14-mona-23042021 is not ideal, other formats are available at:

https://mona.fiehnlab.ucdavis.edu/downloads

The available formats offered there are: .json, .msp (NIST compatible), .sdf (NIST compatible)
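
For the .msp route, a hedged sketch of what the import could look like with the MsBackendMsp package. The file name is a placeholder for whichever export is downloaded, and the sketch assumes the installed MsBackendMsp version provides a `"mona"` variable mapping through `spectraVariableMapping()`; it has not been checked against a current MoNA export.

```r
library(Spectra)
library(MsBackendMsp)

## Hypothetical path to an .msp export downloaded from the MoNA page.
msp_file <- "MoNA-export.msp"

if (file.exists(msp_file)) {
    ## Assumption: the "mona" mapping translates MoNA's MSP field names
    ## to spectra variables (e.g. precursor m/z).
    mona <- Spectra(
        msp_file,
        source = MsBackendMsp(),
        mapping = spectraVariableMapping(MsBackendMsp(), "mona")
    )
    mona
}
```

The resulting `Spectra` object could then be used as the reference in compareSpectra() in the same way as the MassBank data.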

jorainer / SpectraTutorials

Tutorial sections and SQLite database for MONA Spectral Database (.msp) #27