wilhelm-lab / oktoberfest

Rescoring and spectral library generation pipeline for proteomics.
MIT License

Prosit Website Typo #157

Open CCranney opened 9 months ago

CCranney commented 9 months ago

Describe the bug

This isn't a bug in the code; I just thought I'd point out that there's a typo on the Prosit website. When identifying a file from which test spectra are generated, it says that FASTA format is "comming soon." "Coming" has only one 'm.' The previous Prosit repository is read-only, and it looks like the Prosit developers have moved here instead, so I'm posting the issue here. Let me know if I should go somewhere else.

I am also curious when that FASTA compatibility will be available, as it's been "coming soon" for at least half a year. If you have any information on that, I would greatly appreciate it.

Thank you for your projects!

picciama commented 9 months ago

@CCranney thanks for reaching out. The web interface does not offer all functionality because of technical limitations with file uploads, but the standalone version of oktoberfest already supports the FASTA digest.

@MatthewThe can you take care of the website? Also, @CCranney has a point about the FASTA file. It is supported in the oktoberfest standalone version, but maybe we want to offer it through the web interface for people who do not wish to install oktoberfest? If so, how do we handle the file upload? Is this supported in the backend already, i.e. do we just need to merge the current oktoberfest package?

MatthewThe commented 9 months ago

@WassimG is taking care of the Prosit website if I remember correctly.

Uploading FASTA files is supported by the backend, as is setting the digestion parameters. However, since the config.json file was updated in oktoberfest, we first need to fix some compatibility issues. I think the plan was to tackle that in the hackathon in January.

CCranney commented 9 months ago

Hi @picciama and @MatthewThe,

Thank you both for your comments. I'm in the process of running a spectral library generation job through oktoberfest on my M1 Mac. I used a FASTA file I had on hand (human proteins, ~13.6 MB) and a config file copied from the oktoberfest documentation. It has been running for ~11 hours now, and I assume it will be finished by the time I wake up tomorrow, but I was wondering whether it is normal for a spectral library to take this long to generate. The myPrositLib.csv file is currently ~105 GB. Others in my lab recommended that I change missedCleavages from 2 to 1 and maxLength from 60 to 30, and I will run that after this run for comparison. However, the size of the file and the length of the runtime do give me pause, so I thought I should ask whether that is normal.
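To illustrate why missedCleavages and maxLength have such a large effect on library size, here is a minimal sketch of an in-silico tryptic digest. It is not oktoberfest's implementation: the function name, the `min_length` parameter, the simplified cleavage rule (cut after K or R unless followed by P), and the toy sequence are all assumptions made for illustration.

```python
import re


def tryptic_digest(protein, missed_cleavages=2, min_length=7, max_length=60):
    """Return the set of peptides from a simplified in-silico tryptic digest.

    Hypothetical helper for illustration only, not part of oktoberfest.
    """
    # Simplified trypsin rule: cleave after K or R, but not before P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = set()
    for start in range(len(fragments)):
        # Join 1 .. missed_cleavages + 1 consecutive fragments.
        last = min(start + missed_cleavages + 1, len(fragments))
        for end in range(start + 1, last + 1):
            peptide = "".join(fragments[start:end])
            if min_length <= len(peptide) <= max_length:
                peptides.add(peptide)
    return peptides


# Made-up toy sequence, not a real protein.
toy = "MKAAAAAAAKGGGGGGGRLLLLLLLKPEEEEEEEKTTTTTTTTTTTTTTTTTTTTTTTTR"
for mc, max_len in [(2, 60), (1, 30)]:
    count = len(tryptic_digest(toy, missed_cleavages=mc, max_length=max_len))
    print(f"missed_cleavages={mc}, max_length={max_len}: {count} peptides")
```

Each additional allowed missed cleavage adds up to one more peptide per cleavage site, and a larger maximum length lets more of those longer peptides through the filter. Every retained peptide is then predicted at each requested charge state and collision energy, so the number of library entries grows quickly.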

picciama commented 9 months ago

I have experienced similar things, yes. Depending on the file format used, it can be a lot to write. It is not yet optimal, and there are a few open issues around adding additional config options (https://github.com/wilhelm-lab/oktoberfest/issues/126) as well as output formats that are smaller and faster to write. However, bear in mind that spectral libraries can be quite huge depending on the settings and the format used. We plan to tackle these issues in January.

tobiasko commented 9 months ago

Hi @CCranney,

not sure what the specLib should be used for, but I would try limiting the prediction space to:

On the human reference proteome (~80'000 entries) this will already take quite a while to compute (~1.7 million predictions) and consume around 5 GB of disk space in .msp format.

Best, Tobi

tobiasko commented 9 months ago

For me that job already takes 1.5 h of wall clock time to finish:

2023-12-21 14:33:56,149 - INFO - oktoberfest.runner::run_job Oktoberfest version 0.5.2
Copyright 2023, Wilhelmlab at Technical University of Munich
2023-12-21 14:33:56,149 - INFO - oktoberfest.runner::run_job Job executed with the following config:
2023-12-21 14:33:56,149 - INFO - oktoberfest.runner::run_job {
    "type": "SpectralLibraryGeneration",
    "tag": "",
    "allFeatures": false,
    "inputs": {
        "library_input": "/scratch/cpanse/PXD028735/fasta/uniprotkb_proteome_UP000005640_2023_07_04.fasta",
        "library_input_type": "fasta",
...
2023-12-21 16:01:24,810 - INFO - oktoberfest.runner::generate_spectral_lib Indices 1659000, 1666000
2023-12-21 16:01:44,942 - INFO - oktoberfest.runner::generate_spectral_lib Indices 1666000, 1673000
2023-12-21 16:02:05,263 - INFO - oktoberfest.runner::generate_spectral_lib Indices 1673000, 1680000
2023-12-21 16:02:25,933 - INFO - oktoberfest.runner::generate_spectral_lib Indices 1680000, 1687000
2023-12-21 16:02:52,729 - INFO - oktoberfest.runner::generate_spectral_lib Indices 1687000, 1694000
2023-12-21 16:03:24,192 - INFO - oktoberfest.runner::generate_spectral_lib Indices 1694000, 1701000
2023-12-21 16:03:52,146 - INFO - oktoberfest.runner::generate_spectral_lib Indices 1701000, 1708000
2023-12-21 16:04:23,338 - INFO - oktoberfest.runner::generate_spectral_lib Indices 1708000, 1715000
2023-12-21 16:05:01,653 - INFO - oktoberfest.runner::generate_spectral_lib Last Batch from index 1715000
2023-12-21 16:05:01,654 - INFO - oktoberfest.runner::generate_spectral_lib Batch of size 1711
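
The batch lines in a log like the one above can be turned into a rough throughput estimate. The following is a minimal sketch, assuming each "Indices a, b" line is written when that batch starts, so the time until the next such line approximates the time spent on the batch. The regular expression is written against the log lines shown here and the function name is hypothetical.

```python
import re
from datetime import datetime

# Matches the batch lines shown in the log above; other lines are ignored.
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+) - INFO - "
    r"oktoberfest\.runner::generate_spectral_lib Indices (\d+), (\d+)"
)


def batch_rates(lines):
    """Yield (batch_start_index, predictions_per_second) for consecutive batches.

    Hypothetical helper: assumes a batch line is logged when the batch begins.
    """
    previous = None
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        start, stop = int(match.group(2)), int(match.group(3))
        if previous is not None:
            prev_time, prev_start, prev_stop = previous
            elapsed = (timestamp - prev_time).total_seconds()
            if elapsed > 0:
                yield prev_start, (prev_stop - prev_start) / elapsed
        previous = (timestamp, start, stop)


sample = [
    "2023-12-21 16:01:24,810 - INFO - oktoberfest.runner::generate_spectral_lib Indices 1659000, 1666000",
    "2023-12-21 16:01:44,942 - INFO - oktoberfest.runner::generate_spectral_lib Indices 1666000, 1673000",
    "2023-12-21 16:02:05,263 - INFO - oktoberfest.runner::generate_spectral_lib Indices 1673000, 1680000",
]
for index, rate in batch_rates(sample):
    print(f"batch starting at {index}: {rate:.0f} predictions/s")
```

In the excerpt above, each batch covers 7,000 indices, and the gap between consecutive lines grows from about 20 s to about 38 s towards the end of the run.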

picciama commented 7 months ago

The typo will be fixed in a whole new UI that will replace the website and provide a browser UI as part of the package itself; for progress, see the attached milestone. The issues with spectral library generation are resolved on the development branch. This is not yet released, as it is part of a bigger release for the upcoming version 0.6.0, but you can already try it out using the development branch. It is now much faster and more stable, and I have added more flexibility through enhanced options in the configuration file. I will close this issue as soon as the release is ready, hopefully together with the UI.

picciama commented 7 months ago

@CCranney The new oktoberfest version 0.6.0 is published now. I will leave this issue open until the new UI is ready, though.