sirius-ms / sirius

SIRIUS is software for discovering a landscape of de novo identification of metabolites using tandem mass spectrometry. This repository contains the code of the SIRIUS software (GUI and CLI).
GNU Affero General Public License v3.0

Sirius issues #43

Closed: eeko-kon closed this issue 3 years ago

eeko-kon commented 3 years ago

Hi! I sent an email about this (sorry for the spam), but I was just wondering whether you have any updates on this issue:

I am using pyOpenMS on macOS (10.15.7) to call the SIRIUS CLI executable (4.6.2, from OpenMS THIRDPARTY) [1], and I am getting the following error:

SEVERE 10:20:40 - Error when loading confidence SVMs or the bayesian network. Confidence SCore will not be available!

Attachment: siriusCLI.zip

After that, SIRIUS gets stuck and never terminates.

This error occurs when calling the SIRIUS executable via the OpenMS python bindings, as well as when we use the SIRIUS CLI directly:

/Users/eeko/Desktop/software/THIRDPARTY/MacOS/64bit/Sirius/sirius --input /Users/eeko/Desktop/siriusCLI/sirius.ms --project /Users/eeko/Desktop/siriusCLI/sirius_out sirius
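If the immediate goal is just to keep a pipeline from hanging while the root cause is investigated, the CLI call above can be wrapped in a wall-clock timeout from Python. This is a minimal sketch, not part of pyOpenMS or SIRIUS: `build_sirius_command` and `run_with_timeout` are hypothetical helpers, only the `--input`/`--project`/`sirius` arguments come from the command above, and the paths and timeout values are placeholders. It also assumes that killing the `sirius` launcher process stops the computation; if the launcher starts the JVM as a separate child process, that process may need to be cleaned up separately.

```python
import subprocess
import sys
from typing import List, Optional


def build_sirius_command(sirius_bin: str, input_ms: str, project_dir: str) -> List[str]:
    # Same arguments as the CLI call above: --input, --project, then the `sirius` subtool.
    return [sirius_bin, "--input", input_ms, "--project", project_dir, "sirius"]


def run_with_timeout(cmd: List[str], timeout_s: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if it exceeded timeout_s."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and waited for the child at this point.
        return None


if __name__ == "__main__":
    # Hypothetical paths; replace with your own installation and data.
    cmd = build_sirius_command("/path/to/sirius", "sirius.ms", "sirius_out")
    print(cmd)
    # A sleeping Python child stands in for a SIRIUS run that never finishes.
    stuck = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_timeout(stuck, timeout_s=1.0))  # None: the child was killed
```

A `None` result tells the calling workflow to log the run as timed out and move on instead of blocking indefinitely.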

Strangely, the error does not appear when using the SIRIUS GUI (4.7.0) with the sirius.ms file. The run also completes without problems when using the CLI with the SiriusAdapter from the OpenMS 2.6.0 nightly (2021-04-04), taking 20.67 s. This is very strange, since the adapter calls the SIRIUS executable from the THIRDPARTY repository.

I have attached an example below.

It would be great if you could take a look!

Notes: 1) I am matching all parameters between pyOpenMS and the CLI. 2) I tested different files to rule out a problem with a particular mzML file.

[1] https://github.com/OpenMS/THIRDPARTY/tree/master/MacOS/64bit/Sirius

eeko-kon commented 3 years ago

I also downloaded and ran SIRIUS 4.8.2 (instead of the previously used 4.6.2). With pyOpenMS, the new executable seems to work only with very small files (raw data of two standards) and again gets stuck with any normal-sized file (30 MB).

I have also tested the CLI with larger files using the same arguments as previously reported (see file sirius_test.sh). Besides the fact that it takes forever to finish, I am noticing the following messages for the first time:

Could not load GrbSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: gurobi/GRBException
WARNING 16:26:46 - Could not load CPLEXSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: ilog/concert/IloNumVar
WARNING 16:26:48 - <1050>[FingerIDJJob | _13060777216285853167-5573--5572--UNKNOWN@144.983060615625m/z] Ignore fragmentation tree for C3H6Cl2O2 because it contains less than 3 vertices.
WARNING 16:26:48 - <1050>[FingerIDJJob | _13060777216285853167-5573--5572--UNKNOWN@144.983060615625m/z] No suitable fragmentation tree left.
Objective and/or iteration limits were reached
WARNING 16:32:05 - <69454>[SiriusSubToolJob | 9_AGNESsiriusHRMPC__13979845427214614932-152--151--UNKNOWN@619m/z] ToolChain Job canceled due to: de.unijena.bioinf.jjobs.exceptions.TimeoutException: Timeout reached!
WARNING 16:32:05 - <69455>[FingeridSubToolJob | <Awaiting Instance>] ToolChain Job canceled due to: java.lang.InterruptedException: Job canceled before submission to executor, but Interruption does not happen for some reason!?
WARNING 16:32:06 - <72206>[FasterTreeComputationInstance] ExactJob: Score of C11H8N8O2 differs significantly from recalculated score: 46.650356741912844 vs 37.94947004695127 with tree size is 2.5 and root score is 0.003965981983680589 and 2.5 sort key is score 46.650356741912844 and filename is /Users/eeko/Desktop/siriusCLI/AGNESsiriusHRMPC.ms

...and it keeps producing these warnings indefinitely. I suspect this is why it appears "stuck" when called through pyOpenMS.

P.S. In the file sirius_test.sh you will notice that I am using several parameters (the same ones I use in the pyOpenMS workflow). I have tested the CLI without those parameters, and it still generates the above errors.

Weirdly (again), when I use SiriusAdapter from the OpenMS 2.6.0 pre-release nightly (2021-04-04), it runs absolutely fine.

Let me know if you have some time to look into it or any tips 😊. I would appreciate it a lot!

f-kretschmer commented 3 years ago

Hi, I tested the input files with the 4.8.2 CLI Mac version. On my machine it does take a long time, but it finishes after about one hour. Apparently three compounds take very long and thus prevent the workflow from finishing:

Could you test whether SIRIUS still gets stuck when you remove or cancel these compounds? When I cancel the first of them on my machine, the workflow finishes in a couple of minutes. If you can reproduce this, perhaps we can narrow down what's going wrong :)

eeko-kon commented 3 years ago

Hi Fleming,

The issue is that I tested those files (this and more) a month or so ago and it worked fine. I can definitely try, but there are other files that have the same issue. Give me a few days and I will get back to you. :)

BW Efi.

eeko-kon commented 3 years ago

Dear f,

I had a very busy week and just sat down to check what is going on. How can I remove those compounds and test SIRIUS again?

I am trying with other files and I have the same issue though.

Thanks a lot,
Efi

f-kretschmer commented 3 years ago

Hi Efi,

If you're using the GUI, after importing your files you can select everything except these three compounds (Ctrl+A selects all; Ctrl+click deselects) and start the computation with right-click -> compute. Alternatively, use "compute all" and cancel the corresponding jobs.

I am not sure whether I understand the issue correctly: does sirius running through pyOpenMS really get stuck, i.e. never finish at all? Or does it finish after a long time (1-2 hours)? If it is the latter, it might be because of the three compounds listed above; computing all compounds except these should only take a few minutes (perhaps up to half an hour, depending on your machine).

The warning messages in the output you posted should not be part of the problem: they only report that a task was cancelled because it reached the timeout you set with --compound-timeout.
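To illustrate why such timeout warnings are expected rather than a sign of failure, here is a toy model of a per-compound time budget: every compound gets its own deadline, a compound that exceeds it is cancelled, and the remaining compounds still run. In the log above, the FingerID job for the timed-out compound is cancelled as well. This is a hedged sketch of the general idea only, not SIRIUS's implementation; `CompoundTimeout`, `run_compounds`, and `fake_tree_search` are hypothetical names, and the per-compound work is simulated with short sleeps.

```python
import time
from typing import Callable, Dict, List, Tuple


class CompoundTimeout(Exception):
    """Raised when a compound exceeds its time budget (hypothetical)."""


def check_deadline(deadline: float) -> None:
    # Cooperative cancellation: long-running work polls this between steps.
    if time.monotonic() > deadline:
        raise CompoundTimeout("Timeout reached!")


def fake_tree_search(steps: int, deadline: float) -> str:
    # Simulated per-compound work; `steps` stands in for problem size.
    for _ in range(steps):
        check_deadline(deadline)
        time.sleep(0.001)
    return f"done after {steps} steps"


def run_compounds(
    compounds: Dict[str, int],
    work: Callable[[int, float], str],
    timeout_s: float,
) -> Tuple[Dict[str, str], List[str]]:
    """Run each compound with a fresh deadline; collect results and cancellations."""
    results: Dict[str, str] = {}
    cancelled: List[str] = []
    for name, size in compounds.items():
        deadline = time.monotonic() + timeout_s
        try:
            results[name] = work(size, deadline)
        except CompoundTimeout:
            # Only this compound is dropped; the loop continues with the next one.
            cancelled.append(name)
    return results, cancelled


if __name__ == "__main__":
    done, skipped = run_compounds({"small": 5, "huge": 10**6}, fake_tree_search, timeout_s=0.5)
    print(done)     # {'small': 'done after 5 steps'}
    print(skipped)  # ['huge']
```

With a budget like this, a handful of very expensive compounds cap the total runtime instead of holding up the whole workflow, at the cost of producing no result for those compounds.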

It is also possible that when you ran SIRIUS 4.7.0 and everything finished in 20 s, SIRIUS just reused the results from a previous successful run (you can set the "recompute already computed" checkbox in the "compute all" dialog).

I hope we can find and fix the problem.

Fleming

eeko-kon commented 3 years ago

Dear Fleming,

I am only trying to use Sirius from the pyOpenMS wrapper or the CLI (the same thing basically, calling sirius executable version 4.8.2).

The GUI actually works fine. Is it a different version?

Efi.

eeko-kon commented 3 years ago

Quoting Fleming: "does sirius running through pyOpenMS really get stuck, i.e. not finish at all? Or does it finish after a long time (1-2 hours)?"

To be honest, I haven't left it running that long. It normally finishes in a few seconds, so when it is "stuck" running the executable for over 30 min, I stop the run.

eeko-kon commented 3 years ago

To add to that, the people at GNPS are also experiencing issues with the workflows that include SIRIUS :/ They just haven't found the time to contact you. I processed 80 files with MZmine and then ran them through the SIRIUS workflow offered in GNPS; after almost 6 hours, the run still hasn't finished. I think the same thing happens with Qemistree. Let me know if you have time to discuss this more thoroughly over Zoom. Thanks a lot :)

eeko-kon commented 3 years ago

(however 80 files might be a lot :) )

eeko-kon commented 3 years ago

Follow-up: in GNPS, the job finished after a little more than 15 h.

mfleisch commented 3 years ago

We had two issues here:

1. How to deal with high-mass compounds in high-throughput analysis: I wrote a few lines on that in our documentation.

2. Much slower computation when using the CLI compared to the GUI: with version 4.6 we introduced the pre-computation of probabilistic Tanimoto similarities in the CLI to allow faster rendering of such pre-computed datasets in the GUI. However, for large analyses with large candidate lists, e.g. when searching with --db=ALL, this dramatically increased the running time. When using the BIO db in the GUI, which contains far fewer candidate structures, the Tanimoto computation does not make a large difference.

@eeko-kon In the OpenMS SiriusAdapter, --db=ALL is the default; that is why the GUI (--db=BIO) was much faster.

Fix: We changed the Tanimoto algorithm to a binary version by rounding the probabilistic fingerprint. This is much faster and no longer noticeably affects the running time. This change will be included in the upcoming release.
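The difference between comparing probabilistic fingerprints directly and comparing rounded ones can be sketched as follows. This is a hedged illustration, not SIRIUS code: the rounding threshold of 0.5 is an assumption, and `soft_tanimoto` is a simple continuous generalization used only as a stand-in, because the exact probabilistic Tanimoto SIRIUS computes is not described in this thread. Rounded fingerprints are packed into Python integers, so the binary Tanimoto reduces to a bitwise AND, a bitwise OR, and two popcounts.

```python
from typing import Sequence


def soft_tanimoto(p: Sequence[float], q: Sequence[float]) -> float:
    """Continuous Tanimoto on per-bit probabilities (illustrative stand-in)."""
    dot = sum(x * y for x, y in zip(p, q))
    denom = sum(p) + sum(q) - dot
    return dot / denom if denom else 0.0


def round_fingerprint(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Round each probability to 0/1 and pack the bits into an int (bit i = position i)."""
    bits = 0
    for i, p in enumerate(probs):
        if p >= threshold:
            bits |= 1 << i
    return bits


def binary_tanimoto(a: int, b: int) -> float:
    """Tanimoto |A & B| / |A | B| on bit-packed fingerprints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


if __name__ == "__main__":
    query = [0.9, 0.2, 0.7, 0.6]
    candidate = [0.8, 0.1, 0.4, 0.9]
    print(soft_tanimoto(query, candidate))
    a, b = round_fingerprint(query), round_fingerprint(candidate)
    print(a, b, binary_tanimoto(a, b))  # 13 9 0.6666666666666666
```

The query only needs to be rounded once; after that, each candidate comparison is a few integer operations instead of a pass over floating-point probabilities, which is the kind of saving that matters when the candidate list is as large as with --db=ALL.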

mfleisch commented 3 years ago

Fixed with version 4.9.0.