ipb-halle / MetFragRelaunched

Relaunch of the initial MetFrag project.
http://ipb-halle.github.io/MetFrag/

Update embedded spectral library #31

Open sneumann opened 5 years ago

sneumann commented 5 years ago

The class de.ipbhalle.metfraglib.scoreinitialisation.OfflineMetFusionSpectralSimilarityScoreInitialiser at https://github.com/ipb-halle/MetFragRelaunched/blob/c57f9d2b406350b2357ce9f7ce42a286cefcca13/MetFragLib/src/main/java/de/ipbhalle/metfraglib/scoreinitialisation/OfflineMetFusionSpectralSimilarityScoreInitialiser.java#L26 is used to initialize the parameters for the MetFusion-like score, which includes reading the spectral file MoNA-export-LC-MS.mb.

If nothing else is given in the settings with OfflineSpectralDatabaseFile = ..., this class uses the file located at https://github.com/ipb-halle/MetFragRelaunched/blob/master/MetFragLib/src/main/resources/MoNA-export-LC-MS.mb

As already declared, this file is in a non-standard format that includes only the little information needed by the score. The class de.ipbhalle.metfraglib.peaklistreader.MultipleTandemMassPeakListReader at https://github.com/ipb-halle/MetFragRelaunched/blob/c57f9d2b406350b2357ce9f7ce42a286cefcca13/MetFragLib/src/main/java/de/ipbhalle/metfraglib/peaklistreader/MultipleTandemMassPeakListReader.java is used to read this file. It creates a de.ipbhalle.metfraglib.collection.SpectralPeakListCollection, which is stored in the global MetFrag settings object. The score class de.ipbhalle.metfraglib.score.OfflineMetFusionSpectralSimilarityScore later uses this data to calculate the MetFusion-like score for each candidate.

There are two possibilities now. First, you can simply create a new spectral file in the format I used. It's quite simple, as it only needs these parameters:

SampleName,InChI,InChIKey,IsPositiveIonMode,PrecursorIonMode,MassError,MSLevel,IonizedPrecursorMass,NumPeaks,MolecularFingerPrint

followed by the spectral data. You can easily figure out the layout by looking at the default file. The new file can then be used by defining its path with OfflineSpectralDatabaseFile = ...
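A converter targeting this format could start from something like the sketch below. The ten field names are taken from the list above; the "Name: value" line layout, the "mz intensity" peak rows, the formatRecord helper and the placeholder values are all assumptions made for illustration, so compare against the default MoNA-export-LC-MS.mb before relying on any of it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpectralRecordSketch {

    // Field names exactly as listed in this issue, in the same order.
    static final List<String> FIELDS = List.of(
            "SampleName", "InChI", "InChIKey", "IsPositiveIonMode", "PrecursorIonMode",
            "MassError", "MSLevel", "IonizedPrecursorMass", "NumPeaks", "MolecularFingerPrint");

    // Hypothetical helper. The "Name: value" lines and "mz intensity" peak rows
    // are an ASSUMED layout, not the confirmed MetFrag format.
    static String formatRecord(Map<String, String> values, List<double[]> peaks) {
        StringBuilder sb = new StringBuilder();
        for (String field : FIELDS) {
            // NumPeaks is derived from the peak list so the two cannot disagree.
            String value = field.equals("NumPeaks")
                    ? String.valueOf(peaks.size())
                    : values.get(field);
            if (value == null) {
                throw new IllegalArgumentException("Missing field: " + field);
            }
            sb.append(field).append(": ").append(value).append('\n');
        }
        for (double[] peak : peaks) {
            sb.append(peak[0]).append(' ').append(peak[1]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        for (String field : FIELDS) {
            values.put(field, "<" + field + ">"); // placeholders, not real data
        }
        List<double[]> peaks = new ArrayList<>();
        peaks.add(new double[] {100.5, 999.0}); // made-up m/z and intensity
        System.out.print(formatRecord(values, peaks));
    }
}
```

Failing fast on a missing field is a deliberate choice here: a library entry without, say, its fingerprint would otherwise only surface as a problem at scoring time.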

The fingerprint function used is the MACCSFingerprint included in the CDK implementation.

The second possibility is to define your own spectral file reader to replace the reader de.ipbhalle.metfraglib.peaklistreader.MultipleTandemMassPeakListReader currently used. Here, you could implement a NIST or a MassBank file reader, which also needs to create a de.ipbhalle.metfraglib.collection.SpectralPeakListCollection object. But you need to include the fingerprint of the underlying molecule for each spectrum.
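To give a feel for the parsing half of such a reader, here is a minimal sketch that pulls the peak rows out of a MassBank-style record (the PK$PEAK: block, terminated by //). LibraryEntry and parsePeaks are hypothetical names, not MetFrag classes; how the result would be wrapped into a SpectralPeakListCollection is not shown, and the fingerprint is assumed to be computed elsewhere (for example with CDK) and passed in.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class MassBankPeakSketch {

    // Hypothetical stand-in for one library entry; NOT a MetFrag class.
    static final class LibraryEntry {
        final List<double[]> peaks; // each element is {m/z, intensity}
        final BitSet fingerprint;   // assumed to be computed separately

        LibraryEntry(List<double[]> peaks, BitSet fingerprint) {
            this.peaks = peaks;
            this.fingerprint = fingerprint;
        }
    }

    // Collects the indented rows that follow the "PK$PEAK:" header line.
    // Only the first two columns (m/z and intensity) are kept.
    static List<double[]> parsePeaks(List<String> lines) {
        List<double[]> peaks = new ArrayList<>();
        boolean inPeakBlock = false;
        for (String line : lines) {
            if (line.startsWith("PK$PEAK:")) {
                inPeakBlock = true;
                continue;
            }
            if (!inPeakBlock) {
                continue;
            }
            // The block ends at "//" or at the first non-indented line.
            if (line.startsWith("//") || !line.startsWith(" ")) {
                break;
            }
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2) {
                continue;
            }
            peaks.add(new double[] {Double.parseDouble(cols[0]), Double.parseDouble(cols[1])});
        }
        return peaks;
    }

    public static void main(String[] args) {
        // Made-up record fragment, for illustration only.
        List<String> record = List.of(
                "PK$NUM_PEAK: 2",
                "PK$PEAK: m/z int. rel.int.",
                "  100.0 50 500",
                "  200.5 100 999",
                "//");
        LibraryEntry entry = new LibraryEntry(parsePeaks(record), new BitSet());
        System.out.println(entry.peaks.size() + " peaks parsed");
    }
}
```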

Thanks @c-ruttkies for the information! Yours, Steffen

schymane commented 5 years ago

Current thoughts after talking with @adelenelai: option (1) is likely the easiest. I can count at least 6 (and probably more) formats we'll need to work with, so a new reader for each (option 2) seems impractical, and ideally we'd like to merge where possible and not have 1000s of libraries. This way we could write small converters from every format to the MetFrag mb format and keep the efficient internal format that MetFrag needs, then offer users the opportunity to download the resulting library files, which they can then specify using OfflineSpectralDatabaseFile=... @MaliRemorker @he-ob @rickhelmus
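One way the "small converter per format, one merged library" idea could be organised is sketched below. Everything here is hypothetical: FormatConverter, the registry class, and the format names are illustration only, and the stub converters in main stand in for real parsing code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConverterRegistrySketch {

    // Hypothetical contract: turn one source library into MetFrag-format records.
    interface FormatConverter {
        List<String> convert(String sourceText);
    }

    private final Map<String, FormatConverter> converters = new LinkedHashMap<>();

    void register(String formatName, FormatConverter converter) {
        converters.put(formatName, converter);
    }

    // Runs each input through the converter registered for its format and
    // concatenates the results into a single merged library.
    List<String> convertAndMerge(Map<String, String> sourceTextByFormat) {
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, String> input : sourceTextByFormat.entrySet()) {
            FormatConverter converter = converters.get(input.getKey());
            if (converter == null) {
                throw new IllegalArgumentException("No converter for format: " + input.getKey());
            }
            merged.addAll(converter.convert(input.getValue()));
        }
        return merged;
    }

    public static void main(String[] args) {
        ConverterRegistrySketch registry = new ConverterRegistrySketch();
        // Stub converters; real ones would parse the source format.
        registry.register("formatA", text -> List.of("record from A: " + text));
        registry.register("formatB", text -> List.of("record from B: " + text));

        Map<String, String> inputs = new LinkedHashMap<>();
        inputs.put("formatA", "a-input");
        inputs.put("formatB", "b-input");
        registry.convertAndMerge(inputs).forEach(System.out::println);
    }
}
```

Keeping the converters behind one small interface means MetFrag itself only ever sees the single internal format, which matches the motivation given above.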