sirius-ms / sirius

SIRIUS is a software for discovering a landscape of de-novo identification of metabolites using tandem mass spectrometry. This repository contains the code of the SIRIUS Software (GUI and CLI)
GNU Affero General Public License v3.0

Job stuck Sirius 6.05 #200

Open Do-rossi opened 2 weeks ago

Do-rossi commented 2 weeks ago

Hi,

I have an issue using SIRIUS 6.0.5 on macOS with negative-mode MS/MS data. The main job stops at around 70% when I run my data with SIRIUS, CANOPUS and database search, and some of the individual jobs are stuck at 70% or 20%. I tried several times to cancel everything and then compute using only SIRIUS, but it doesn't get any better. I tried including and excluding high-mass compounds, and running with and without ZODIAC, but the result is the same. My log also says “Invalidate existing Results and Recompute” and then starts a computation that never ends.

Screenshot 2024-09-11 at 16 25 41

"sept. 09, 2024 10:02:07 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <740>[BackgroundRunJob-740] Invalidate existing Results and Recompute!
sept. 09, 2024 10:02:07 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <740>[BackgroundRunJob-740] Start computation..."

I also noticed that even when I select adducts like [M+HCOOH−H]−, only [M−H]− is found in my data. (I know that [M+HCOOH−H]− adducts are present.)

Maybe I'm doing something wrong. If you have any advice, let me know.

MartinHoffmannJena commented 2 weeks ago

Hi,

Are you able to share the data that produces this state (together with the set of parameters used to start the computation)?

Do-rossi commented 1 week ago

Hi,

This is the MGF file obtained from MZmine: Sirius.mgf.zip. I used the GUI and didn't change the default parameters, except that I also chose HCOOH adducts in the fallback adducts and selected a custom database for the database search.

config spectra-search --FormulaSearchSettings.applyFormulaConstraintsToBottomUp=false --IsotopeSettings.filter=true --UseHeuristic.useOnlyHeuristicAboveMz=650 --FormulaSearchDB=, --Timeout.secondsPerTree=0 --FormulaSettings.enforced=H,C,N,O,P --Timeout.secondsPerInstance=0 --AlgorithmProfile=qtof --SpectralMatchingMassDeviation.allowedPeakDeviation=10.0ppm --AdductSettings.enforced=, --AdductSettings.prioritizeInputFileAdducts=true --UseHeuristic.useHeuristicAboveMz=300 --IsotopeMs2Settings=IGNORE --MS2MassDeviation.allowedMassDeviation=10.0ppm --SpectralMatchingMassDeviation.allowedPrecursorDeviation=10.0ppm --FormulaSearchSettings.performDeNovoBelowMz=400.0 --FormulaSearchSettings.applyFormulaConstraintsToDatabaseCandidates=false --EnforceElGordoFormula=true --NumberOfCandidatesPerIonization=1 --FormulaSettings.detectable=B,S,Cl,Se,Br --NumberOfCandidates=10 --AdductSettings.fallback=[[M-H]-,[M+CH2O2-H]-] --FormulaSearchSettings.performBottomUpAboveMz=0 --FormulaResultThreshold=true --ExpansiveSearchConfidenceMode.confidenceScoreSimilarityMode=APPROXIMATE --StructureSearchDB=ester_lib,METACYC,BloodExposome,CHEBI,COCONUT,FooDB,GNPS,HMDB,HSDB,KEGG,KNAPSACK,LOTUS,LIPIDMAPS,MACONDA,MESH,MiMeDB,NORMAN,PLANTCYC,PUBCHEMANNOTATIONBIO,PUBCHEMANNOTATIONDRUG,PUBCHEMANNOTATIONFOOD,PUBCHEMANNOTATIONSAFETYANDTOXIC,SUPERNATURAL,TeroMol,YMDB --RecomputeResults=false formulas fingerprints classes structures
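When two people run the same file with what they believe are the same parameters, it can help to compare the parameter strings mechanically. The following is a minimal sketch, not part of SIRIUS: it assumes the pasted text consists of whitespace-separated `--Key=Value` options plus bare sub-tool names, as in the command above. The helper names and the second parameter string (`other`) are hypothetical, and `reported` is an abbreviated copy of the command above.

```python
import shlex


def parse_sirius_options(command):
    """Split a pasted parameter string into --Key=Value options and bare tokens."""
    options = {}
    bare_tokens = []
    for token in shlex.split(command):
        if token.startswith("--"):
            # Everything before the first "=" is the key, the rest is the value.
            key, _, value = token[2:].partition("=")
            options[key] = value
        else:
            bare_tokens.append(token)
    return options, bare_tokens


def diff_options(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Abbreviated copy of the parameters reported in this thread.
reported = (
    "config spectra-search --AlgorithmProfile=qtof --NumberOfCandidates=10 "
    "--AdductSettings.fallback=[[M-H]-,[M+CH2O2-H]-] --RecomputeResults=false "
    "formulas fingerprints classes structures"
)
# Hypothetical second run, used only to illustrate the comparison.
other = (
    "config spectra-search --AlgorithmProfile=qtof --NumberOfCandidates=10 "
    "--AdductSettings.fallback=[[M-H]-] --RecomputeResults=false "
    "formulas fingerprints classes structures"
)

reported_options, reported_tools = parse_sirius_options(reported)
other_options, _ = parse_sirius_options(other)
print(diff_options(reported_options, other_options))
```

Only keys whose values differ are reported, so an empty result suggests the two runs used identical options.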

Thank you for your help

MartinHoffmannJena commented 1 week ago

Hi,

I'm not able to reproduce this, I don't get any stuck jobs and I also do see [M+HCOOH−H] results:

image

Can you run it again and check the CPU/Memory load on your computer while you run it?

Do-rossi commented 1 week ago

Hi,

I recomputed all tasks and now it's stuck at 73%. My CPU and memory usage looked like this when I started the computation: Screenshot 2024-09-18 at 09 43 24, Screenshot 2024-09-18 at 09 43 51

And now that it's stuck, it looks like this: Screenshot 2024-09-18 at 12 29 21, Screenshot 2024-09-18 at 12 29 37

Here is also part of my log:

INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] DONE!
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <381946>[CanopusSubToolJob-381946 | 3964 (618855364793789458)] DONE!
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <381947>[FingerblastSubToolJob-381947 | 3964 (618855364793789458)] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] Start computation...
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <381947>[FingerblastSubToolJob-381947 | 3964 (618855364793789458)] Start computation...
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] DONE!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <381970>[CanopusSubToolJob-381970 | 3954 (618855364516965359)] DONE!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] DONE!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <382190>[CanopusSubToolJob-382190 | 3929 (618855364076563369)] DONE!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <381971>[FingerblastSubToolJob-381971 | 3954 (618855364516965359)] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] Start computation...
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <381971>[FingerblastSubToolJob-381971 | 3954 (618855364516965359)] Start computation...
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <382191>[FingerblastSubToolJob-382191 | 3929 (618855364076563369)] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] Start computation...
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <382191>[FingerblastSubToolJob-382191 | 3929 (618855364076563369)] Start computation...
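A log like this is hard to read by eye, so one option is to list the jobs that logged "Start computation..." without a matching "DONE!". The sketch below is a rough diagnostic aid rather than an official SIRIUS tool. It assumes every entry has the shape `<id>[JobName-id ...] message`, as in the excerpt above, and the function name is hypothetical. On a partial excerpt, a job whose "DONE!" simply falls outside the pasted range will also be listed.

```python
import re
from collections import Counter

# Matches "<381947>[FingerblastSubToolJob-381947 | ...] Start computation..."
# and the corresponding "DONE!" entries; \1 requires the id to appear twice.
ENTRY = re.compile(
    r"<(\d+)>\[([A-Za-z]+)-\1(?: [^\]]*)?\]\s+(Start computation\.\.\.|DONE!)"
)


def unfinished_jobs(log_text):
    """Return sorted (job_name, job_id) pairs that started more often than they finished."""
    started = Counter()
    finished = Counter()
    for job_id, job_name, message in ENTRY.findall(log_text):
        key = (job_name, job_id)
        if message == "DONE!":
            finished[key] += 1
        else:
            started[key] += 1
    return sorted(key for key in started if started[key] > finished[key])


# A few entries copied from the excerpt above (timestamps omitted).
sample_log = (
    "INFOS: <381946>[CanopusSubToolJob-381946 | 3964 (618855364793789458)] DONE! "
    "INFOS: <381947>[FingerblastSubToolJob-381947 | 3964 (618855364793789458)] "
    "Invalidate existing Results and Recompute! "
    "INFOS: <97>[BackgroundRunJob-97] Start computation... "
    "INFOS: <381947>[FingerblastSubToolJob-381947 | 3964 (618855364793789458)] "
    "Start computation... "
    "INFOS: <97>[BackgroundRunJob-97] DONE! "
)

print(unfinished_jobs(sample_log))
```

Because the pattern is searched across the whole text, it works whether the log has one entry per line or is pasted as a single run of text.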

Sorry for bothering you...

Have a great day!

zglong1 commented 1 week ago

I'm running into the same situation: the job seems to stop running at ~70% or ~85%, even after repeatedly cancelling it once progress stops and restarting it. My log looks the same. I've attached the MGF file I'm using, exported from MZmine: SIRIUS_MGF.zip

MartinHoffmannJena commented 6 days ago

Could you please double-click on a running "stuck" structure job and copy the log content here? It should look like this:

image

zglong1 commented 6 days ago

Aye, I will. It's currently running from start to finish, so if and when it stalls, I'll post it. The whole job is only at 11% right now, though, and I'm running it locally rather than on our computing cluster, so it may take a while. (The cluster had updates a few days ago, and SIRIUS now seems to stall indefinitely there once the GUI splash window opens. We'll see whether it's a SIRIUS issue or a cluster issue; I've already reached out to the cluster tech people about it.)

zglong1 commented 6 days ago

Stuck at 63% this time, on the structure job. This is the same MGF file as the one I linked earlier. Screenshot: Stuck

It's in the picture, but here's the content of the topmost feature's log; it looks the same for all the other features (based on a random sampling).

Here's the full log: Full log.txt

zglong1 commented 6 days ago

For reference, I'm sitting at 5-10% CPU usage, 53% memory usage, and 3-6% GPU usage while it's stalled. This is on a Windows 10 PC with a 4-core Intel Core i7 at 4 GHz, 64 GB of 3.2 GHz RAM, and more than 100 GB of free space on my SSD.

MartinHoffmannJena commented 6 days ago

Thank you. This is tricky, since I cannot reproduce it at all (I ran the same file multiple times with different settings and it never stalls). From your screenshot, one of the stalling features seems to have the ID "772". Could you please reload the data and compute only that feature?

It'd be interesting to see if it stalls on that feature specifically or only if multiple features are computed at the same time.

Additionally, do you have that issue with every dataset, or just specific ones?
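One way to test a single feature in isolation outside the GUI is to cut it out of the MGF file into its own small file. The sketch below assumes the export wraps each spectrum in `BEGIN IONS` / `END IONS` and labels it with a `FEATURE_ID=` line; that field name is an assumption about the MZmine export and may differ in your file, so it is a parameter. The function name and the sample spectra are hypothetical.

```python
def extract_feature(mgf_text, feature_id, id_field="FEATURE_ID"):
    """Return the text of all BEGIN IONS/END IONS blocks that carry the given feature ID."""
    wanted = f"{id_field}={feature_id}"
    kept = []
    block = []
    inside = False
    for line in mgf_text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            inside = True
            block = [line]
        elif inside:
            block.append(line)
            if stripped == "END IONS":
                inside = False
                # Compare whole lines so that ID 77 does not match ID 772.
                if any(entry.strip() == wanted for entry in block):
                    kept.append("\n".join(block))
    return "\n\n".join(kept) + ("\n" if kept else "")


# Made-up spectra, only to show the shape of the input.
sample_mgf = """BEGIN IONS
FEATURE_ID=771
PEPMASS=301.1
MSLEVEL=2
150.0 10
END IONS

BEGIN IONS
FEATURE_ID=772
PEPMASS=455.2
MSLEVEL=1
455.2 100
END IONS

BEGIN IONS
FEATURE_ID=772
PEPMASS=455.2
MSLEVEL=2
200.1 50
END IONS
"""

single_feature = extract_feature(sample_mgf, 772)
print(single_feature)
```

All blocks sharing the ID are kept, so an MS1 and an MS2 spectrum of the same feature stay together in the output.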

Do-rossi commented 6 days ago

Hi,

For me it's with every dataset. As you suggested, I tried to compute only one feature that had been stuck at around 20% (3793), and it worked well.

Screenshot 2024-09-19 at 10 06 23

zglong1 commented 6 days ago

Ugh, meanwhile mine does get stuck if I do the individual feature, but I know this isn't always the case, because this has happened to me across multiple datasets (well, different MGF files from similar-ish datasets).

To be clear: when I cancel the stalled big job and try to continue with just feature 772, I process only 772 and select only the compound class and structure search modules for it (no MSNovelist, at least for this test). Doing so again leads it to stall at 20% on the structure portion, 87% overall. I did not have recompute on.

Turning on recompute when doing the single job doesn't help; it stalls at 20% on the fingerprint step.

Recalculating the entire feature from scratch, starting at formulas, also doesn't work. It stalls at 2% on the spectra-search step with the attached settings.

Settings

MartinHoffmannJena commented 6 days ago

What happens if you input the mgf into a new SIRIUS project and then

a) only compute feature 772 with default parameters (formula, predict, structure)?
b) only compute feature 772 with the parameters you showed above (formula, predict, structure)?

(Please create a new project for each of a and b.)

zglong1 commented 6 days ago

I restarted SIRIUS before doing A and B, just to keep things consistent.

A) With completely default settings, it finished in about 10 seconds.

B) With the settings I posted originally, it also finished; took a bit longer, maybe 15 seconds?

But, in both cases, it worked fine. So strange.

MartinHoffmannJena commented 6 days ago

Okay, so it is probably only happening under load. Now I'd like to understand whether this is a GUI issue or a workflow issue. Could you run the .mgf in its entirety again, but this time use the CLI with:

a) default GUI parameters
b) your parameters

(Restart SIRIUS and create a new project in between.)

You can use the "show command" button to get a CLI command that corresponds to your GUI parameters

zglong1 commented 6 days ago

Scary. I have a deep-seated fear of the command line, but I guess now's the time to get over it and learn how to use it. It should also make using SIRIUS on our computing cluster more productive, since I won't have to run it through the GUI.

I'll try to get to doing this sometime today and get back to you!

MartinHoffmannJena commented 6 days ago

EDIT: Please hold off on doing this until 6.0.6 is released (today or tomorrow)

Let me know if you need help. What you need to do is the following:

  1. Open the SIRIUS GUI, load in your .mgf and set your parameters like you usually would

  2. Instead of clicking "compute", click "show command", copy the command and paste it into a text editor

  3. Open a command prompt, then type:

sirius --input C:\Users\Username\Documents\mgfName.mgf --project C:\Users\Username\Documents\siriusProjectName.sirius

and then paste the command after that. The whole thing should look like this (a bit different if you chose different parameters, obviously):

image
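For anyone who prefers to script step 3, the following sketch assembles the same command line as an argument list. It only uses the `--input` and `--project` options shown above. The function name is hypothetical, the parameter string is an abbreviated copy of the one posted earlier in this thread, and the sketch assumes the pasted parameters contain no quoted Windows paths (which `shlex.split` would alter).

```python
import shlex


def build_sirius_command(mgf_path, project_path, pasted_parameters, executable="sirius"):
    """Return the argument list: executable, input and project paths, then the pasted parameters."""
    args = [executable, "--input", str(mgf_path), "--project", str(project_path)]
    # The text copied from "show command" is appended unchanged, token by token.
    args += shlex.split(pasted_parameters)
    return args


command = build_sirius_command(
    r"C:\Users\Username\Documents\mgfName.mgf",
    r"C:\Users\Username\Documents\siriusProjectName.sirius",
    "config spectra-search --AlgorithmProfile=qtof --NumberOfCandidates=10 "
    "formulas fingerprints classes structures",
)
print(" ".join(command))

# To actually launch the run (requires SIRIUS on the PATH), one could use:
#   import subprocess
#   subprocess.run(command, check=True)
```

Building a list rather than one long string avoids quoting problems when a path contains spaces.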

zglong1 commented 5 days ago

Ah, saw your edit. I'll try again once 6.0.6 is released, but right now I'm on my laptop and was curious whether it would work on this machine compared with my home PC. This one is running Windows 11 with an 8-core Intel i9-11900H at 2.5 GHz and 32 GB of RAM.

I did a fresh install of SIRIUS 6.0.5 and used the same MGF file and my normal settings (as above, with a 60-second timeout per compound). It finished completely in 3 hours 17 minutes and didn't get stuck at any step.