Job stuck Sirius 6.05 - Githubissues

Do-rossi commented 2 months ago

Hi,

I have an issue using Sirius 6.0.5 on macOS with negative-mode MS/MS data. The main job stops at around 70% when I run my data using Sirius, Canopus and database search, and some of the individual jobs are stuck at 70% or 20%. I tried several times to cancel everything and then compute using only Sirius, but it does not get any better. I tried including and excluding high-mass compounds, and running with and without Zodiac, but the result is the same. My log also says “Invalidate existing Results and Recompute” and then starts a computation that never ends.

"sept. 09, 2024 10:02:07 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <740>[BackgroundRunJob-740] Invalidate existing Results and Recompute!
sept. 09, 2024 10:02:07 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <740>[BackgroundRunJob-740] Start computation..."

I also noticed that even if I select adducts like [M+HCOOH−H]−, only [M−H]− is found in my data. (I know that [M+HCOOH−H]− adducts are present.)

Maybe I'm doing something wrong. If you have any advice, let me know.

MartinHoffmannJena commented 2 months ago

Hi,

Are you able to share the data that produces this state (together with the set of parameters used to start the computation)?

Do-rossi commented 2 months ago

Hi,

This is the MGF file obtained from MZmine: Sirius.mgf.zip. I used the GUI and didn't change the default parameters, except that I also chose HCOOH adducts in the fallback adducts and added a custom database to the search.

config spectra-search --FormulaSearchSettings.applyFormulaConstraintsToBottomUp=false --IsotopeSettings.filter=true --UseHeuristic.useOnlyHeuristicAboveMz=650 --FormulaSearchDB=, --Timeout.secondsPerTree=0 --FormulaSettings.enforced=H,C,N,O,P --Timeout.secondsPerInstance=0 --AlgorithmProfile=qtof --SpectralMatchingMassDeviation.allowedPeakDeviation=10.0ppm --AdductSettings.enforced=, --AdductSettings.prioritizeInputFileAdducts=true --UseHeuristic.useHeuristicAboveMz=300 --IsotopeMs2Settings=IGNORE --MS2MassDeviation.allowedMassDeviation=10.0ppm --SpectralMatchingMassDeviation.allowedPrecursorDeviation=10.0ppm --FormulaSearchSettings.performDeNovoBelowMz=400.0 --FormulaSearchSettings.applyFormulaConstraintsToDatabaseCandidates=false --EnforceElGordoFormula=true --NumberOfCandidatesPerIonization=1 --FormulaSettings.detectable=B,S,Cl,Se,Br --NumberOfCandidates=10 --AdductSettings.fallback=[[M-H]-,[M+CH2O2-H]-] --FormulaSearchSettings.performBottomUpAboveMz=0 --FormulaResultThreshold=true --ExpansiveSearchConfidenceMode.confidenceScoreSimilarityMode=APPROXIMATE --StructureSearchDB=ester_lib,METACYC,BloodExposome,CHEBI,COCONUT,FooDB,GNPS,HMDB,HSDB,KEGG,KNAPSACK,LOTUS,LIPIDMAPS,MACONDA,MESH,MiMeDB,NORMAN,PLANTCYC,PUBCHEMANNOTATIONBIO,PUBCHEMANNOTATIONDRUG,PUBCHEMANNOTATIONFOOD,PUBCHEMANNOTATIONSAFETYANDTOXIC,SUPERNATURAL,TeroMol,YMDB --RecomputeResults=false formulas fingerprints classes structures
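For comparing parameter sets, the one-line command above is easier to read with one option per line. A minimal sketch, assuming a POSIX shell; the function name `list_options` is hypothetical, and the snippet only reformats text (it does not run SIRIUS):

```shell
#!/bin/sh
# Print the --Key=Value options of a SIRIUS command line, one per line, sorted.
# Assumption: no option value contains a space (true for the command quoted above).
list_options() {
    tr ' ' '\n' | grep '^--' | sort
}

# Demo with a shortened copy of the command from this thread.
printf '%s\n' 'config --Timeout.secondsPerTree=0 --AlgorithmProfile=qtof --RecomputeResults=false formulas' |
    list_options
```

Saving the sorted output of two commands to files and running `diff` on them would show exactly which parameters differ between two runs.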

Thank you for your help

MartinHoffmannJena commented 2 months ago

Hi,

I'm not able to reproduce this. I don't get any stuck jobs, and I do see [M+HCOOH−H] results:

Can you run it again and check the CPU/Memory load on your computer while you run it?

Do-rossi commented 2 months ago

Hi,

I recomputed all tasks and now it's stuck at 73%. My CPU and memory usage looked like this when I started the computation: Capture d’écran 2024-09-18 à 09 43 24

And now that it's stuck, it looks like this: Capture d’écran 2024-09-18 à 12 29 21

Here is also part of my log:

INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <1995>[WebJobWatcherJJob-1995] No prediction jobs finished. Waiting before retry 1.0s
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] DONE!
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <381946>[CanopusSubToolJob-381946 | 3964 (618855364793789458)] DONE!
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <381947>[FingerblastSubToolJob-381947 | 3964 (618855364793789458)] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] Start computation...
sept. 18, 2024 11:14:10 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <381947>[FingerblastSubToolJob-381947 | 3964 (618855364793789458)] Start computation...
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] DONE!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <381970>[CanopusSubToolJob-381970 | 3954 (618855364516965359)] DONE!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] DONE!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <382190>[CanopusSubToolJob-382190 | 3929 (618855364076563369)] DONE!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <381971>[FingerblastSubToolJob-381971 | 3954 (618855364516965359)] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] Start computation...
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <381971>[FingerblastSubToolJob-381971 | 3954 (618855364516965359)] Start computation...
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <382191>[FingerblastSubToolJob-382191 | 3929 (618855364076563369)] Invalidate existing Results and Recompute!
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <97>[BackgroundRunJob-97] Start computation...
sept. 18, 2024 11:14:11 AM de.unijena.bioinf.jjobs.JJob lambda$logInfo$9 INFOS: <382191>[FingerblastSubToolJob-382191 | 3929 (618855364076563369)] Start computation...
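In a log like the one above, the structure jobs that stall can be narrowed down by listing every FingerblastSubToolJob that logged "Start computation..." but never "DONE!". A minimal sketch, assuming a POSIX shell with a `grep` that supports `-o` (GNU and BSD/macOS both do); the function name `unfinished_jobs` is hypothetical, and the assumption that finished Fingerblast jobs log "DONE!" in the same way the Canopus jobs do is mine, not confirmed in this thread:

```shell
#!/bin/sh
# List FingerblastSubToolJob labels that started but have no matching "DONE!" entry.
# Works whether the log has one entry per line or several entries run together.
unfinished_jobs() {
    log=$1
    # Labels of jobs that finished (assumed to log "DONE!" like the Canopus jobs).
    finished=$(grep -o 'FingerblastSubToolJob-[0-9]*[^]]*] DONE!' "$log" |
        sed 's/] DONE!$//')
    # Labels of jobs that started; print the ones missing from $finished.
    grep -o 'FingerblastSubToolJob-[0-9]*[^]]*] Start computation' "$log" |
        sed 's/] Start computation$//' | sort -u |
        while IFS= read -r job; do
            printf '%s\n' "$finished" | grep -qxF -e "$job" || printf '%s\n' "$job"
        done
}

# Demo on four entries copied from the excerpt above.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
INFOS: <381947>[FingerblastSubToolJob-381947 | 3964 (618855364793789458)] Start computation...
INFOS: <381970>[CanopusSubToolJob-381970 | 3954 (618855364516965359)] DONE!
INFOS: <381971>[FingerblastSubToolJob-381971 | 3954 (618855364516965359)] Start computation...
INFOS: <382191>[FingerblastSubToolJob-382191 | 3929 (618855364076563369)] Start computation...
EOF
unfinished_jobs "$demo_log"
rm -f "$demo_log"
```

For the demo entries this lists all three Fingerblast jobs, since none of them logs "DONE!" in the portion shown. On a full log, a list that stops shrinking while the progress bar stays put would point to the jobs worth inspecting.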

Sorry for bothering you...

Have a great day!

zglong1 commented 2 months ago

I'm also getting this error/situation, where the job seems to stop running at ~70 or ~85%, even when trying repeatedly to end it after no progress and restarting it. My log looks the same, but I've attached the mgf file I'm using, exported from MZMine. SIRIUS_MGF.zip

MartinHoffmannJena commented 2 months ago

> I'm also getting this error/situation, where the job seems to stop running at ~70 or ~85%, even when trying repeatedly to end it after no progress and restarting it. My log looks the same, but I've attached the mgf file I'm using, exported from MZMine. SIRIUS_MGF.zip

Could you please double click on a running "stuck" structure job and copy the log content here? It should look like this

zglong1 commented 2 months ago

Aye, I will; it's currently running from start to finish, so if/when it stalls, I'll post it. The whole job's only at 11% right now though, and I'm running it locally rather than on our computing cluster, so it may be a bit. (The cluster had updates a few days ago, and SIRIUS now seems to stall indefinitely once the GUI splash window opens. I've already reached out to the cluster tech people about it, so we'll see whether it's a SIRIUS issue or a cluster issue.)

zglong1 commented 2 months ago

Stuck at 63% this time, on the structure job. Same mgf file being processed as the one I linked earlier. (Screenshot: Stuck)

It's in the picture, but here's the content of the topmost feature's log; it looks the same for all the other features (based on a random sampling).

Here's the full log: Full log.txt

zglong1 commented 2 months ago

For reference, I'm sitting at 5-10% CPU usage, 53% memory usage, and 3-6% GPU usage while it's stuck. This is on a Windows 10 PC with an Intel Core i7 4 GHz 4-core processor, 64 GB of 3.2 GHz RAM, and >100 GB of space left on my SSD.

MartinHoffmannJena commented 2 months ago

> Stuck at 63% this time, on the structure job. Same mgf file being processed as the one I linked earlier.
>
> It's in the picture, but here's the content of the topmost feature's log; it looks the same for all the other features (based on a random sampling).
>
> Here's the full log: Full log.txt

Thank you. This is kind of tricky, since I cannot reproduce it at all (I ran the same file multiple times with different settings and it never stalls). From your screenshot, it seems that one of the stalling features has the ID "772". Could you please reload the data and compute only that feature?

It'd be interesting to see if it stalls on that feature specifically or only if multiple features are computed at the same time.

Additionally, do you have that issue with every dataset, or just specific ones?

Do-rossi commented 2 months ago

Hi,

For me it's with every dataset. As you suggested, I tried computing only one feature that had been stuck at around 20% (3793), and it worked well.

zglong1 commented 2 months ago

Ugh, meanwhile mine does get stuck if I do the individual feature, but I know this isn't always the case, because this has happened to me across multiple datasets (well, different MGF files from similar-ish datasets).

To be clear, when I cancel the stalled big job, and try to continue with just feature 772, I'm processing only 772, and selecting only compound class and structure search modules for it (no Novelist, at least for this test), and doing so again leads it to stall at 20% on the structure portion, 87% overall. I did not have recompute on.

Turning on recompute when doing the single job doesn't help; it stalls at 20% on the fingerprint step.

Recalculating the entire feature from scratch, starting at formulas, also doesn't work. It stalls at 2% on the spectra-search step with the attached settings.

Settings

MartinHoffmannJena commented 2 months ago

Hi,

> For me it's with every dataset. As you said I tried to compute only one feature that was stuck around 20% (3793) and it worked well.

> Ugh, meanwhile mine does get stuck if I do the individual feature, but I know this isn't always the case, because this has happened to me across multiple datasets (well, different MGF files from similar-ish datasets).
>
> To be clear, when I cancel the stalled big job, and try to continue with just feature 772, I'm processing only 772, and selecting only compound class and structure search modules for it (no Novelist, at least for this test), and doing so again leads it to stall at 20% on the structure portion, 87% overall. I did not have recompute on.
>
> Turning on recompute when doing the single job doesn't help, it stalls at 20% on the fingerprint step.
>
> Recalculating the entire feature from scratch, starting at formulas, also doesn't work. It stalls at 2% on the spectra-search step with the attached settings.

What happens if you input the mgf into a new SIRIUS project and then

a) only compute feature 772 with default parameters (formula, predict, structure)?
b) only compute feature 772 with the parameters you showed above (formula, predict, structure)?

(Please create a new project for a and b respectively)

zglong1 commented 2 months ago

I restarted SIRIUS before doing A and B, just to keep things consistent.

A) With completely default settings, it finished in about 10 seconds.

B) With the settings I posted originally, it also finished; took a bit longer, maybe 15 seconds?

But, in both cases, it worked fine. So strange.

MartinHoffmannJena commented 2 months ago

Okay, so it is probably only happening under load. Now I'd like to understand whether this is a GUI issue or a workflow issue. Could you run the .mgf in its entirety again, but this time use the CLI with:

a) default GUI parameters
b) your parameters

(restart SIRIUS and create a new project in between)

You can use the "show command" button to get a CLI command that corresponds to your GUI parameters.

zglong1 commented 2 months ago

Scary, I have a deep-seated fear of the command line, but I guess now's the time to get over it and learn how to use it. It should make my use of SIRIUS on our computing cluster more productive, since I won't have to run it with the GUI.

I'll try to get to doing this sometime today and get back to you!

MartinHoffmannJena commented 2 months ago

EDIT: Please hold off on doing this until 6.0.6 is released (today or tomorrow)

Let me know if you need help. What you need to do is the following:

1. Open the SIRIUS GUI, load in your .mgf and set your parameters like you usually would.
2. Instead of clicking "compute", click "show command", then copy the contents of the clipboard and paste it into some text editor.
3. Open a command prompt, then type:

sirius --input C:\Users\Username\Documents\mgfName.mgf --project C:\Users\Username\Documents\siriusProjectName.sirius

and then paste the command after that. The whole thing should look like this (a bit different if you choose different parameters obviously):

zglong1 commented 2 months ago

Ah, saw your edit. I'll go again once 6.0.6 is released, but right now I'm on my laptop and was curious whether it would work on this machine vs my home PC. This one's running Windows 11 with an Intel i9 11900H processor at 2.5 GHz (8 cores) and 32 GB of RAM.

I did a fresh install of SIRIUS 6.0.5 and used the same MGF file and my normal settings (as above, with a 60 second timeout per compound). It finished completely in 3hr 17 minutes. It didn't get stuck at any step.

zglong1 commented 1 month ago

So it's been a bit, and I got a new computer, but here's an update using 6.0.6:

With my old computer, despite my initial success, I ended up running into the same stalling issues when using the GUI.

With the new computer (Windows 11 Pro, AMD Ryzen 9 9950X 16-core 4.3 GHz processor, 192 GB DDR5 RAM), running in the GUI, I still run into stalling issues.

HOWEVER, with an n of 1, using the command line version, the job completed without any issues. I'm currently running a second job to check consistency.

Do-rossi commented 1 month ago

> EDIT: Please hold off on doing this until 6.0.6 is released (today or tomorrow)
>
> Let me know if you need help, what you need to do is the following:
>
> 1. Open the SIRIUS GUI, load in your .mgf and set your parameters like you usually would
> 2. Instead of clicking "compute", click "show command" instead, copy the contents of the clipboard and paste it into some text editor
> 3. Open a command prompt then type:
>
> sirius --input C:\Users\Username\Documents\mgfName.mgf --project C:\Users\Username\Documents\siriusProjectName.sirius
>
> and then paste the command after that. The whole thing should look like this (a bit different if you choose different parameters obviously):

Hi, I tried to run SIRIUS from the command line, but I was not able to get any results. It says "zsh: no matches found: --AdductSettings.fallback=[[M-H]-,[M+CHO2+H]-,[M+CH2O2-H]-]", even though I added sirius to the PATH and sirius --help works correctly.

When I try to paste the command like you said, it doesn't work: Capture d’écran 2024-10-21 à 13 04 25

Maybe I'm doing something wrong... Thank you for your help!
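The "zsh: no matches found" message is what zsh prints when an unquoted argument containing `[` and `]` is treated as a filename pattern and no file matches it, so the command is aborted before `sirius` is ever started. A likely fix, offered as a sketch rather than a confirmed answer from the maintainers, is to wrap the bracketed option in single quotes. The snippet below uses a hypothetical stand-in function, `show_args`, in place of `sirius` so that it can be tried safely; the adduct list is copied from the error message:

```shell
#!/bin/sh
# Stand-in for `sirius` (hypothetical helper): prints each argument it
# receives on its own line, so the effect of quoting is visible.
show_args() {
    printf '%s\n' "$@"
}

# Single quotes stop the shell from interpreting [ ... ] as a glob pattern;
# the option reaches the program exactly as typed.
show_args config '--AdductSettings.fallback=[[M-H]-,[M+CHO2+H]-,[M+CH2O2-H]-]' formulas
```

With the real tool, the same single quotes would go around that option in the pasted part of the command line. In zsh, prefixing the whole command with `noglob` is another way to switch off pattern expansion for that one command.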

sirius-ms / sirius

Job stuck Sirius 6.05 #200