wfondrie / depthcharge

A deep learning toolkit for mass spectrometry
https://wfondrie.github.io/depthcharge/
Apache License 2.0

Example code to read .mgf file #44

GuptaVishu2002 closed 4 months ago

GuptaVishu2002 commented 8 months ago

Hi, I would like to know how to read and preprocess a .mgf file using the package. Could you please provide example code for that, so the result can then be passed to other parts of the package such as the Encoder and Transformer? Thank you.

bittremieux commented 8 months ago

You can read and parse spectra from MGF files using the Dataset functionality. It's as straightforward as this:

from depthcharge.data import SpectrumDataset

dataset = SpectrumDataset("my_file.mgf", "my_file.lance")

This can then be used as any general PyTorch dataset to provide to your model for training, validation, or testing. How you do this specifically depends on how you use PyTorch, Lightning, etc.

Note that the API is currently in heavy development, so there are some breaking changes between various DepthCharge versions. The Lance integration is included in the development version if you install from GitHub, but not in the latest release on PyPI yet.
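For readers who want to see what an MGF file contains before handing it to SpectrumDataset, here is a minimal sketch of a reader written with the Python standard library only. It is not depthcharge's parser: the function name `read_mgf` and the dictionary layout are assumptions made for illustration, and comment handling is simplified to lines starting with `#`.

```python
import io
from typing import Iterator, TextIO


def read_mgf(handle: TextIO) -> Iterator[dict]:
    """Yield one dict per spectrum from an open MGF text stream.

    Hypothetical helper for illustration; not part of depthcharge.
    """
    spectrum = None
    for raw_line in handle:
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (simplified) comments
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "mz": [], "intensity": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is None:
            continue  # ignore file-level headers outside a spectrum block
        elif "=" in line and not line[0].isdigit():
            # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
            key, value = line.split("=", 1)
            spectrum["params"][key.strip().upper()] = value.strip()
        else:
            # Peak line: m/z followed by an optional intensity
            fields = line.split()
            spectrum["mz"].append(float(fields[0]))
            spectrum["intensity"].append(
                float(fields[1]) if len(fields) > 1 else 0.0
            )


# A tiny in-memory MGF file so the sketch runs without any data on disk.
example = io.StringIO(
    "BEGIN IONS\n"
    "TITLE=demo spectrum\n"
    "PEPMASS=445.12 1000.0\n"
    "CHARGE=2+\n"
    "100.5 10.0\n"
    "200.25 20.0\n"
    "END IONS\n"
)
spectra = list(read_mgf(example))
```

For a real file, pass an open file handle instead of the `io.StringIO` object. Header values are kept as raw strings here so that any later conversion stays explicit.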

wfondrie commented 8 months ago

Hi @GuptaVishu2002 - I'm planning the next release for after #43 is reviewed and merged and I'm working on documentation this week. Stay tuned!

GuptaVishu2002 commented 8 months ago

Hi @bittremieux , @wfondrie - thank you very much for the reply. Looking forward to the updates.

GuptaVishu2002 commented 7 months ago

Hi @bittremieux @wfondrie, I hope you are doing well. Would it be possible for you to share sample code showing the recommended way to incorporate arbitrary information (such as precursor_mz, precursor_charge) into the spectrum representation for the transformer (by subclassing the SpectrumTransformerEncoder class and overriding the precursor_hook() method)? Thank you.

wfondrie commented 7 months ago

Hi @GuptaVishu2002 - sorry for the delay! We're still trying to merge a major PR, then I'll get cracking on refreshed and more detailed documentation. Thanks.

For now, the best place to learn how to use the precursor is to look at the unit tests: https://github.com/wfondrie/depthcharge/blob/bd2861ffe61092f3d30d96d01d2ee53309812c0a/tests/unit_tests/test_transformers/test_spectrum_transformers.py#L46-L68
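Whatever form the hook takes, precursor_mz and precursor_charge need to be numeric before a model can use them. The following is a rough sketch, independent of depthcharge, of turning MGF-style header strings into numbers. The function name `parse_precursor`, the input dictionary, and the choice to treat a missing CHARGE as 0 are all assumptions for illustration; how the values are actually passed to precursor_hook() is shown in the unit tests linked above.

```python
import re
from typing import Tuple


def parse_precursor(params: dict) -> Tuple[float, int]:
    """Convert MGF-style header strings into (precursor_mz, precursor_charge).

    Hypothetical helper for illustration; not part of depthcharge.
    """
    # PEPMASS holds the precursor m/z, optionally followed by an intensity.
    precursor_mz = float(params["PEPMASS"].split()[0])

    # CHARGE is written like "2+" or "1-"; a missing value is treated as 0
    # here, which is an assumption of this sketch.
    charge_text = params.get("CHARGE", "0").strip()
    match = re.fullmatch(r"(\d+)([+-]?)", charge_text)
    if match is None:
        # Multi-charge entries such as "2+ and 3+" are not handled.
        raise ValueError(f"Unsupported CHARGE value: {charge_text!r}")
    precursor_charge = int(match.group(1))
    if match.group(2) == "-":
        precursor_charge = -precursor_charge
    return precursor_mz, precursor_charge


mz, charge = parse_precursor({"PEPMASS": "445.12 1000.0", "CHARGE": "2+"})
```

The resulting numbers can then be converted to tensors and supplied alongside each spectrum in whatever way the installed depthcharge version expects.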

wfondrie commented 4 months ago

I haven't specifically added how to read an MGF file, but I just added documentation in #47 about working with mass spec data in general. Have a look and let me know if you have other questions!