systemsomicslab / MsdialWorkbench

Universal workbench incorporating msdial, msfinder, and mrmprobs
https://systemsomicslab.github.io/compms/msdial/main.html

Workflow MSDIAL to SIRIUS #112

Open GerdBalcke opened 1 year ago

GerdBalcke commented 1 year ago

I am trying to combine MS-DIAL preprocessing and SIRIUS/CANOPUS annotation in one workflow.

For this, I import .mat spectra exported from MS-DIAL version 5.1.230517 into SIRIUS and want to run batch processing.

Problem 1) This fails because SIRIUS does not expect “null” values in the fields for which no library annotation was made.

I sent this to Markus, who wanted to change something on the SIRIUS side and asked me to delete the “null” entries as a workaround. This works, but it should be fixed either in MS-DIAL or in SIRIUS.
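A minimal sketch of that workaround, assuming “delete the null entries” means dropping each whole `KEY: null` metadata line (if SIRIUS instead needs the key kept with an empty value, the skipped line could be written back as `key + ":"`). The function name `strip_null_fields` is hypothetical.

```python
def strip_null_fields(mat_text: str) -> str:
    """Drop 'KEY: null' metadata lines from an MS-DIAL .mat export."""
    kept = []
    for line in mat_text.splitlines():
        key, sep, value = line.partition(":")
        # Only "KEY: null" metadata lines are removed; peak lines
        # ("m/z intensity") contain no colon and pass through unchanged.
        if sep and value.strip() == "null":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

For a file on disk, `Path("export.mat").write_text(strip_null_fields(Path("export.mat").read_text()))` (with `from pathlib import Path`) would apply it in place; the file name is a placeholder.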

Problem 2) Below is one entry of the MS-DIAL .mat output. ID=613 is the MS-DIAL alignment ID, but in the NAME field it is not separated from the m/z value (640.3454), and the latter is not labeled as the precursor m/z.

NAME: Unknown|ID=613640.3454|RT=13.113
PRECURSORMZ: 640.34544
PRECURSORTYPE: [M-H]-
RETENTIONTIME: 13.1132666666667
FORMULA: null
ONTOLOGY: null
INCHIKEY: null
SMILES: null
COMMENT: |PEAKID=613|ISOTOPE=M+0
IONMODE: Negative
MSTYPE: MS1
Num Peaks: 11
640.34544 2605
640.40443 20
641.06957 21
641.34909 893
641.39291 18
641.4231 30
641.69005 17
642.26588 24
642.29664 19
642.34746 184
642.38204 29
MSTYPE: MS2
Num Peaks: 22
122.00124 6
152.99574 17
196.0379 16
214.04885 5
238.04797 4
268.0583 15
281.2483 144
321.15853 2
340.33212 3
413.77283 2
415.28565 3
416.46199 2
421.19309 2
427.44831 3
446.4576 2
455.34643 2
478.29346 56
497.25931 3
520.3027 8
550.31365 4
622.32911 3
640.34624 208
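Assuming the usual .mat layout of one field per line, with whitespace-separated “m/z intensity” peak lines after each MSTYPE and Num Peaks header, a minimal parser that separates the metadata from the MS1 and MS2 peak lists could look like this. It is a hypothetical sketch, not the SIRIUS or MS-DIAL implementation, and it assumes metadata values contain no further structure that needs splitting on the first colon.

```python
from __future__ import annotations


def parse_mat_record(text: str) -> dict:
    """Parse one .mat record into metadata and per-MSTYPE peak lists."""
    meta: dict[str, str] = {}
    spectra: dict[str, list[tuple[float, float]]] = {}
    current = None  # MSTYPE block (e.g. "MS1", "MS2") currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep:
            key, value = key.strip().upper(), value.strip()
            if key == "MSTYPE":
                current = value
                spectra[current] = []
            elif key != "NUM PEAKS":  # peak count is implied by the list
                meta[key] = value
        elif current is not None:
            # Peak lines are "m/z intensity" pairs separated by whitespace.
            mz, intensity = line.split()[:2]
            spectra[current].append((float(mz), float(intensity)))
    return {"meta": meta, "spectra": spectra}
```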

In the corresponding .tsv output file from SIRIUS after SIRIUS/CANOPUS processing, this results in the following feature ID: 614_Mat_2023_06_05_13_47_21_AlignmentResult_2023_06_UnknownID613662.3301RT12.297

Note that 614 is the SIRIUS ID and ID613 is the MS-DIAL alignment ID.
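To illustrate why the missing separator matters: the fused token 613640.3454 from the NAME field can be read as several different (alignment ID, m/z) pairs. The hypothetical helper below enumerates every split into an integer ID and a plausible m/z (one that does not start with a zero or a bare decimal point).

```python
from __future__ import annotations


def candidate_splits(fused: str) -> list[tuple[int, float]]:
    """List every (alignment ID, m/z) pair whose concatenation gives `fused`."""
    splits = []
    for i in range(1, len(fused)):
        id_part, mz_part = fused[:i], fused[i:]
        # The ID must be all digits; the m/z must not start with "0" or ".".
        if not id_part.isdigit() or mz_part[0] in "0.":
            continue
        try:
            splits.append((int(id_part), float(mz_part)))
        except ValueError:
            continue  # tail is not a valid number
    return splits
```

For "613640.3454" this yields (6, 13640.3454), (61, 3640.3454), (613, 640.3454) and (6136, 40.3454). Only a cross-check against PRECURSORMZ (640.34544, within 0.01) singles out (613, 640.3454); a consumer that sees only the feature ID string has no such cross-check.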

If MS-DIAL has annotated the feature, the alignment ID is replaced by the metabolite name. In that case the MS-DIAL alignment ID no longer appears in the NAME field, so this information would be lost, because it is later taken from the NAME field. (The PEAKID appears only in the COMMENT field, in all library entries of the .mat export.)

In workflows from MZmine to SIRIUS, the final feature ID in SIRIUS is a concatenation of the SIRIUS ID, the file name, and the peak ID of the data source.
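For comparison, a feature ID that carries the MS-DIAL PEAKID explicitly, in the spirit of the MZmine scheme described above, might be built and parsed as follows. The layout `<SIRIUS index>_<file stem>_<PEAKID>` is an assumption for illustration, not the format SIRIUS actually writes, and both function names are hypothetical.

```python
from pathlib import PurePath


def build_feature_id(sirius_index: int, source_file: str, peak_id: int) -> str:
    """Hypothetical layout: <SIRIUS index>_<source file stem>_<MS-DIAL PEAKID>."""
    stem = PurePath(source_file).stem  # drop directory and extension
    return f"{sirius_index}_{stem}_{peak_id}"


def peak_id_from_feature_id(feature_id: str) -> int:
    """Recover the PEAKID, which is always the last underscore-separated token."""
    return int(feature_id.rsplit("_", 1)[1])
```

Putting the numeric PEAKID last lets it be split off with a single `rsplit` even when the file name contains underscores, and because it does not depend on the NAME field, it survives when MS-DIAL replaces the name with a metabolite annotation.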

I think this should be better aligned across the individual pipelines, starting with the .mat output format of MS-DIAL.

mfleisch commented 1 year ago

Hey @GerdBalcke, Hey @kozo2, the latest release of SIRIUS (https://github.com/boecker-lab/sirius/releases/tag/v5.7.3) now handles the null values correctly.

mfleisch commented 1 year ago

I will extend our .mat parser so that it reads the MS-Dial PEAKID from the COMMENT field. This should then produce the output @GerdBalcke needs.

@kozo2 : Is there a reason why the PEAKID is part of a comment and not its own field? Do you think it might be possible to add a dedicated PEAKID field to the .mat output?
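A reader along the lines suggested here could prefer a dedicated PEAKID field and fall back to the pipe-delimited COMMENT value. This is a sketch assuming the metadata has already been parsed into a dictionary with uppercase keys; the PEAKID key is hypothetical until it exists in the MS-DIAL export, and `read_peak_id` is not part of the SIRIUS parser.

```python
from __future__ import annotations

import re

# Matches "PEAKID=<digits>" as one pipe-delimited token of COMMENT,
# e.g. in "|PEAKID=613|ISOTOPE=M+0".
_PEAKID_IN_COMMENT = re.compile(r"(?:^|\|)PEAKID=(\d+)(?:\||$)")


def read_peak_id(meta: dict[str, str]) -> int | None:
    """Return the MS-DIAL peak ID from a parsed .mat record's metadata."""
    # Prefer a dedicated "PEAKID:" field if the export provides one.
    if meta.get("PEAKID", "").strip().isdigit():
        return int(meta["PEAKID"])
    # Otherwise fall back to the PEAKID token inside COMMENT.
    match = _PEAKID_IN_COMMENT.search(meta.get("COMMENT", ""))
    return int(match.group(1)) if match else None
```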

Regarding the NAME field: We do not expect any specific format of the name value. So changing it should not break anything on our end.

kozo2 commented 1 year ago

@mfleisch Thanks for your help 🙏

> Is there a reason why the PEAKID is part of a comment and not its own field?

Sorry, I do not know the reason for that. I will ask @htsugawa, who designed the .mat format.

> Do you think it might be possible to add a dedicated PEAKID field to the .mat output?

I don't think that would be hard, but I would also like to hear from @htsugawa, who decides on the design.

> Regarding the NAME field: We do not expect any specific format of the name value. So changing it should not break anything on our end.

Thanks for the information 🙏

GerdBalcke commented 1 year ago

Dear Kozo,

Is there already a decision on the issue below?

Best + Thank you,

Gerd

Dr. rer. nat. habil. Gerd Balcke
Head of LC-MS Metabolomics
Tel. +49 345 5582 1510 | ORCID: https://orcid.org/0000-0002-0475-0672
Leibniz-Institut für Pflanzenbiochemie, Weinberg 3, 06120 Halle, Germany | Tel. +49 345 5582 0 | www.ipb-halle.de

From: Kozo Nishida | Sent: Thursday, 15 June 2023 02:53 | To: systemsomicslab/MsdialWorkbench | Cc: Balcke, Gerd; Mention | Subject: Re: [systemsomicslab/MsdialWorkbench] Workflow MSDIAL to SIRIUS (Issue #112)


kozo2 commented 1 year ago

@GerdBalcke @mfleisch Sorry for the late reply.

> Is there already a decision on the issue below?

We decided to add a dedicated PEAKID field to the .mat output, but I am not likely to get around to it until at least the 20th of this month.

If you are in a hurry, I would like to delegate this task to @YukiMatsuzawa. (In that case, please let me know.)
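With a dedicated field, the header of an exported record might look like the output of the sketch below. Everything in it is hypothetical: the PEAKID line follows the decision above, and the MZ= token in NAME is only an illustrative way of separating the alignment ID from the m/z, not an agreed MS-DIAL format.

```python
def format_mat_header(peak_id: int, precursor_mz: float, rt_min: float,
                      precursor_type: str, name: str = "Unknown") -> str:
    """Hypothetical .mat header layout with a dedicated PEAKID field."""
    lines = [
        # Delimit every token in NAME so the ID and m/z can no longer fuse.
        f"NAME: {name}|ID={peak_id}|MZ={precursor_mz:.4f}|RT={rt_min:.3f}",
        f"PEAKID: {peak_id}",
        f"PRECURSORMZ: {precursor_mz:.5f}",
        f"PRECURSORTYPE: {precursor_type}",
        f"RETENTIONTIME: {rt_min}",
    ]
    return "\n".join(lines) + "\n"
```

Because PEAKID has its own line, it stays available even when an annotation replaces "Unknown" in NAME with a metabolite name.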

GerdBalcke commented 1 year ago

Dear Kozo, no hurry, take your time.

Best, Gerd

From: Kozo Nishida | Sent: Thursday, June 29, 2023 5:48 AM | To: systemsomicslab/MsdialWorkbench | Cc: Balcke, Gerd; Mention | Subject: Re: [systemsomicslab/MsdialWorkbench] Workflow MSDIAL to SIRIUS (Issue #112)
