sirius-ms / sirius

SIRIUS is a software for discovering a landscape of de-novo identification of metabolites using tandem mass spectrometry. This repository contains the code of the SIRIUS Software (GUI and CLI)
GNU Affero General Public License v3.0

(v4.9.8 and v4.9.12) - "Error when try to connect to Server" and "Time out" errors #56

Closed Tomnl closed 2 years ago

Tomnl commented 2 years ago

Hi SIRIUS team,

I sent an email about this to the sirius at uni-jena.de address but realised the issue might be better reported here.

Do you have any usage limits in place on the web API? I have been running into issues with the SIRIUS CLI: runs either take a very long time or end in server timeouts, with both version 4.9.8 and version 4.9.12.

See below for an example command:

sirius --cores 1 --no-citations --ms2 test_spectra.txt --adduct '[M+H]+' --precursor 579.273473765502 -o output_for_email formula -c 5 --ppm-max 10 --profile orbitrap structure --database BIO Canopus

I have tested on macOS v12.1 (my local computer) and Linux (CentOS v8, running on our University infrastructure) and get the same timeout messages on both.

test_spectra.txt

52.6695442199707 1078.53381347656
54.6258087158203 1203.5361328125
54.716609954834 1108.81457519531
55.0658187866211 1009.50543212891
60.0449600219727 11070.189453125
61.550910949707 1411.07067871094
64.1043319702148 1254.43505859375
65.4877395629883 1168.84936523438
66.881950378418 1241.5302734375
72.0811996459961 8653.619140625
77.9728622436523 1266.32629394531
99.1635437011719 1364.78369140625
108.299331665039 1283.79638671875
117.066223144531 2974.04760742188
127.050765991211 1757.62414550781
142.841293334961 1367.48571777344
145.061096191406 12366.595703125
153.411758422852 1464.02978515625
157.061492919922 1764.42504882812
174.295623779297 1549.10290527344
184.07243347168 1581.74401855469
186.088012695312 1955.97741699219
196.072158813477 3160.29174804688
199.108627319336 1752.86364746094
214.082778930664 10127.05078125
278.813903808594 1422.2705078125
350.426147460938 1574.53576660156
369.708038330078 1583.81530761719
427.158233642578 1949.4599609375
525.436279296875 1544.3994140625

SIRIUS v4.9.8

sirius --cores 1 --no-citations --ms2 test_spectra.txt --adduct '[M+H]+' --precursor 579.273473765502 -o output_for_email formula -c 5 --ppm-max 10 --profile orbitrap structure --database BIO canopus
INFO    11:55:32 - Sirius Workspace Successfull initialized at: /Users/tomnl/.sirius-4.9
INFO    11:55:32 - You run SIRIUS 4.9.8
INFO    11:55:32 - Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3
INFO    11:55:32 - Treebuilder priorities loaded from 'sirius.properties' are: [GUROBI, CPLEX, CLP, GLPK]
INFO    11:55:32 - CPU check done. 2 cores that handle 4 threads were found.
INFO    11:55:32 - Bug reporter initialized.
INFO    11:55:32 - Web API initialized.
INFO    11:55:32 - Running with following arguments: [--cores, 1, --no-citations, --ms2, test_for_email.txt, --adduct, [M+H]+, --precursor, 579.273473765502, -o, output_for_email, formula, -c, 5, --ppm-max, 10, --profile, orbitrap, structure, --database, BIO, canopus]
WARNING 11:55:36 - Could not load GrbSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: gurobi/GRBException
WARNING 11:55:36 - Could not load CPLEXSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: ilog/concert/IloNumVar
WARNING 12:55:59 - Fingerblast Job 'FingerblastJJob' skipped because of result was null.
WARNING 12:55:59 - <6>[FingeridSubToolJob | 0unknown@579m/z] ToolChain Job canceled due to: de.unijena.bioinf.jjobs.exceptions.TimeoutException: Prediction canceled by client timeout. A timout of "3600000ms" was reached.
WARNING 12:55:59 - <7>[CanopusSubToolJob | ] ToolChain Job canceled due to: java.lang.InterruptedException: Job canceled before submission to executor, but Interruption does not happen for some reason!?
INFO    12:55:59 - Workflow has been finished!
INFO    12:55:59 - Executing Postprocessing...
INFO    12:55:59 - Writing summary files...
INFO    12:55:59 - Project-Space summaries successfully written!
INFO    12:55:59 - CLI shut down hook: SIRIUS is cleaning up threads and shuts down...

SIRIUS v4.9.12

sirius --cores 1 --no-citations --ms2 test_for_email.txt --adduct '[M+H]+' --precursor 579.273473765502 -o output_for_email2 formula -c 5 --ppm-max 10 --profile orbitrap structure --database BIO canopus
Feb 11, 2022 1:08:27 PM org.apache.commons.beanutils.FluentPropertyBeanIntrospector introspect
INFO: Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
INFO    13:08:27 - Sirius Workspace Successfull initialized at: /Users/tomnl/.sirius-4.9
INFO    13:08:27 - You run SIRIUS 4.9.12
INFO    13:08:28 - Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3
INFO    13:08:28 - Treebuilder priorities loaded from 'sirius.properties' are: [GUROBI, CPLEX, CLP, GLPK]
INFO    13:08:28 - CPU check done. 2 cores that handle 4 threads were found.
INFO    13:08:28 - Bug reporter initialized.
INFO    13:08:28 - Web API initialized.
INFO    13:08:28 - Running with following arguments: [--cores, 1, --no-citations, --ms2, test_for_email.txt, --adduct, [M+H]+, --precursor, 579.273473765502, -o, output_for_email2, formula, -c, 5, --ppm-max, 10, --profile, orbitrap, structure, --database, BIO, canopus]
WARNING 13:08:33 - Could not load GrbSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: gurobi/GRBException
WARNING 13:08:33 - Could not load CPLEXSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: ilog/concert/IloNumVar
WARNING 13:15:58 - Error when try to connect to Server. Try again in 4.0s Cause: Connect to www.csi-fingerid.uni-jena.de:443 [www.csi-fingerid.uni-jena.de/141.35.34.12] failed: Connect timed out
WARNING 13:16:17 - Error when try to connect to Server. Try again in 8.0s Cause: Connect to www.csi-fingerid.uni-jena.de:443 [www.csi-fingerid.uni-jena.de/141.35.34.12] failed: Connect timed out
WARNING 13:16:40 - Error when try to connect to Server. Try again in 16.0s Cause: Connect to www.csi-fingerid.uni-jena.de:443 [www.csi-fingerid.uni-jena.de/141.35.34.12] failed: Connect timed out
WARNING 13:17:06 - Error when try to connect to Server. Try again in 32.0s Cause: Connect to www.csi-fingerid.uni-jena.de:443 [www.csi-fingerid.uni-jena.de/141.35.34.12] failed: Operation timed out
WARNING 13:17:53 - Error when try to connect to Server. Try again in 64.0s Cause: Connect to www.csi-fingerid.uni-jena.de:443 [www.csi-fingerid.uni-jena.de/141.35.34.12] failed: Read timed out
WARNING 13:19:43 - Error when try to connect to Server. Try again in 4.0s Cause: Connect to www.csi-fingerid.uni-jena.de:443 [www.csi-fingerid.uni-jena.de/141.35.34.12] failed: Read timed out
WARNING 13:20:02 - Error when try to connect to Server. Try again in 8.0s Cause: Connect to www.csi-fingerid.uni-jena.de:443 [www.csi-fingerid.uni-jena.de/141.35.34.12] failed: Read timed out
WARNING 13:20:25 - Error when try to connect to Server. Try again in 16.0s Cause: Connect to www.csi-fingerid.uni-jena.de:443 [www.csi-fingerid.uni-jena.de/141.35.34.12] failed: Read timed out
WARNING 13:20:56 - Error when try to connect to Server. Try again in 32.0s Cause: Connect to www.csi-fingerid.uni-jena.de:443 [www.csi-fingerid.uni-jena.de/141.35.34.12] failed: Read timed out
WARNING 13:21:43 - Error when try to connect to Server. Try again in 64.0s Cause: Connect to www.csi-fingerid.uni-jena.de:443 [www.csi-fingerid.uni-jena.de/141.35.34.12] failed: Read timed out
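
The retry delays in the v4.9.12 log double from 4.0s up to 64.0s and then start again at 4.0s. The sketch below reproduces that schedule purely from what the log shows; it is not taken from the SIRIUS source, and the function name `backoff_delays` and its parameters are assumptions made for illustration.

```python
from itertools import islice


def backoff_delays(base=4.0, factor=2.0, cap=64.0):
    """Yield retry delays that grow by `factor` up to `cap`, then restart at `base`.

    Hypothetical helper: it only mirrors the pattern visible in the log.
    """
    delay = base
    while True:
        yield delay
        # Once the cap has been used, the next cycle starts over at the base delay.
        delay = base if delay >= cap else delay * factor


# First seven delays of the schedule.
print(list(islice(backoff_delays(), 7)))
# → [4.0, 8.0, 16.0, 32.0, 64.0, 4.0, 8.0]
```

Read this way, the repeated 4.0s to 64.0s cycles in the log suggest the client keeps retrying while the server stays unreachable, not that each warning is a separate failure mode.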

Tomnl commented 2 years ago

After reading some comments on other issues:

Could this be caused by using the same workspace (e.g. /Users/tomnl/.sirius-4.9) for multiple CLI calls?

(And if that is the case, is it possible to define different workspaces?)

Tomnl commented 2 years ago

OK, I think I have a way around the shared-workspace issue: run the following before starting SIRIUS, so that the JVM uses a different home directory: export _JAVA_OPTIONS=-Duser.home=/new/path/
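
When SIRIUS is launched from a script, the same workaround can be applied per call by building a separate environment for each run. This is a minimal sketch, assuming the `_JAVA_OPTIONS` trick above is what relocates the workspace; the helper name `isolated_sirius_env` is hypothetical, and any `_JAVA_OPTIONS` value already set is overwritten.

```python
import os
import tempfile


def isolated_sirius_env(home_dir, base_env=None):
    """Return a copy of the environment with the JVM's user.home redirected.

    Hypothetical helper; it replaces any _JAVA_OPTIONS value already present.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["_JAVA_OPTIONS"] = f"-Duser.home={home_dir}"
    return env


# One throwaway home directory per CLI call.
home = tempfile.mkdtemp(prefix="sirius_home_")
env = isolated_sirius_env(home)
print(env["_JAVA_OPTIONS"])
# The env can then be handed to the launcher, for example:
# subprocess.run(["sirius", ...], env=env)
```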

However, I am still getting TimeoutExceptions. Perhaps the web-service is just very busy at the moment?

I notice in the GUI that the number of pending jobs on the server has been >1000 for a while today.

I will try again in the next few days.

eeko-kon commented 2 years ago

I've been running SIRIUS and CSI for quite a few days (1200 jobs, 10 days). I have removed CSI for now because it became extremely slow. Does it work for you today?

Tomnl commented 2 years ago

I tried yesterday but still had timeout issues.

I did manage to get a successful job running with a lower precursor mass though. So perhaps that provides some clue as to what is going on...

I will re-run today and update here.

eeko-kon commented 2 years ago

Thanks a lot. For the past few days, in my case, CSI has processed 1 file per 12 hours, which is not ideal. Let me know how it goes today. Perhaps @kaibioinfo or @mfleisch can let us know if there is an issue with their servers?

Tomnl commented 2 years ago

Still the same errors at the moment, e.g.:

(for lower mass precursor)

INFO    12:27:43 - Sirius Workspace Successfull initialized at: /Users/tomnl/.sirius-4.9
INFO    12:27:43 - You run SIRIUS 4.9.8
INFO    12:27:43 - Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3
INFO    12:27:43 - Treebuilder priorities loaded from 'sirius.properties' are: [GUROBI, CPLEX, CLP, GLPK]
INFO    12:27:43 - CPU check done. 2 cores that handle 4 threads were found.
INFO    12:27:44 - Bug reporter initialized.
INFO    12:27:44 - Web API initialized.
INFO    12:27:44 - Running with following arguments: [--cores, 1, --no-citations, --ms2, /Users/tomnl/check/check/temp_f08ebd13-b8d6-46cc-8ae3-bfa61b4da864/1_tmpspec.txt, --adduct, [M+H]+, --precursor, 115.07535132383688, -o, output_15126, formula, -c, 5, --ppm-max, 5, --profile, orbitrap, structure, --database, BIO, canopus]
WARNING 12:27:46 - Could not load GrbSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: gurobi/GRBException
WARNING 12:27:46 - Could not load CPLEXSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: ilog/concert/IloNumVar
WARNING 13:28:03 - Fingerblast Job 'FingerblastJJob' skipped because of result was null.
WARNING 13:28:03 - <7>[CanopusSubToolJob | <Awaiting Instance>] ToolChain Job canceled due to: java.lang.InterruptedException: Job canceled before submission to executor, but Interruption does not happen for some reason!?
WARNING 13:28:03 - <6>[FingeridSubToolJob | 0_unknown_@115m/z] ToolChain Job canceled due to: de.unijena.bioinf.jjobs.exceptions.TimeoutException: Prediction canceled by client timeout. A timout of "3600000ms" was reached.
INFO    13:28:03 - Workflow has been finished!
INFO    13:28:03 - Executing Postprocessing...
INFO    13:28:03 - Writing summary files...
INFO    13:28:03 - Project-Space summaries successfully written!
INFO    13:28:03 - CLI shut down hook: SIRIUS is cleaning up threads and shuts down...
INFO    13:28:03 - Try to delete leftover jobs on web server...

(for higher mass precursor)

INFO    11:23:34 - Sirius Workspace Successfull initialized at: /Users/tomnl/.sirius-4.9
INFO    11:23:34 - You run SIRIUS 4.9.8
INFO    11:23:34 - Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3
INFO    11:23:34 - Treebuilder priorities loaded from 'sirius.properties' are: [GUROBI, CPLEX, CLP, GLPK]
INFO    11:23:35 - CPU check done. 2 cores that handle 4 threads were found.
INFO    11:23:35 - Bug reporter initialized.
INFO    11:23:35 - Web API initialized.
INFO    11:23:35 - Running with following arguments: [--cores, 1, --no-citations, --ms2, spectra_for_test.txt, --adduct, [M+H]+, --precursor, 579.273473765502, -o, output_151123, formula, -c, 5, --ppm-max, 10, --profile, orbitrap, structure, --database, BIO, canopus]
WARNING 11:23:43 - Could not load GrbSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: gurobi/GRBException
WARNING 11:23:43 - Could not load CPLEXSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: ilog/concert/IloNumVar
WARNING 12:24:02 - Fingerblast Job 'FingerblastJJob' skipped because of result was null.
WARNING 12:24:02 - <7>[CanopusSubToolJob | <Awaiting Instance>] ToolChain Job canceled due to: java.lang.InterruptedException: Job canceled before submission to executor, but Interruption does not happen for some reason!?
WARNING 12:24:02 - <6>[FingeridSubToolJob | 0_unknown_@579m/z] ToolChain Job canceled due to: de.unijena.bioinf.jjobs.exceptions.TimeoutException: Prediction canceled by client timeout. A timout of "3600000ms" was reached.
INFO    12:24:02 - Workflow has been finished!
INFO    12:24:02 - Executing Postprocessing...
INFO    12:24:03 - Writing summary files...
INFO    12:24:03 - Project-Space summaries successfully written!
INFO    12:24:03 - CLI shut down hook: SIRIUS is cleaning up threads and shuts down...
INFO    12:24:03 - Try to delete leftover jobs on web server...

eeko-kon commented 2 years ago

OK, so essentially you get exactly the same warnings for both very small and (slightly) larger features.

The solver warnings are not a problem. The issue is that there appears to be a per-compound timeout of 60 minutes, and CSI is reaching that limit even with very small compounds (neither m/z 115 nor m/z 579 should take more than 10 min to compute), so the job gets canceled. If you open the SIRIUS GUI you actually get the following warning:

Screenshot 2022-02-15 at 15 59 58

So let's try in a few days.
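
The 60-minute figure can be checked against the logs posted above: the TimeoutException quotes "3600000ms", and the gap between the solver warnings and the cancellation is just over an hour in each run. A small sketch of that arithmetic follows; the helper name `elapsed_seconds` is made up here, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime

TIMEOUT_MS = 3_600_000  # value quoted in the TimeoutException message


def elapsed_seconds(start, end, fmt="%H:%M:%S"):
    """Seconds between two same-day log timestamps such as '11:23:43'."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


print(TIMEOUT_MS / 60_000)                      # timeout in minutes → 60.0
print(elapsed_seconds("11:23:43", "12:24:02"))  # m/z 579 run → 3619.0
print(elapsed_seconds("12:27:46", "13:28:03"))  # m/z 115 run → 3617.0
```

Both runs are canceled roughly 3600 seconds after the last warning before the wait, which is consistent with the client-side timeout firing and not with the computation itself failing.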

Tomnl commented 2 years ago

OK, thanks for sharing that. The GUI warning is pretty informative.

Yes, let's try over the next few days and see if the worker instances become available.

Tomnl commented 2 years ago

This seems fixed to me now.

How about you @eeko-kon ?

eeko-kon commented 2 years ago

Yes, it does! Thanks a lot for letting me know.

mfleisch commented 2 years ago

Hey, we had a faulty disk in one node of our cluster. Unfortunately, it did not cause the service to crash but instead prevented jobs from being successfully delivered to the worker nodes. So it took us a while to notice, since everything looked fine from a logging and monitoring perspective. The service should be healthy again.

Thanks a lot for sharing all the details!