zmahnoor14 / MAW

Metabolome Annotation Workflow

Think I can't connect to Sirius because of the new 5.0 version #33

Open jsaintvanne opened 1 year ago

jsaintvanne commented 1 year ago

Hi,

I would like to try your tool and workflow, but when I run your test data I get a lot of output, including the following:

WARNING 10:47:05 - 4: Cannot parse retention time: 'NAs'
WARNING 10:47:05 - Could not load GrbSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: gurobi/GRBException
WARNING 10:47:05 - Could not load CPLEXSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: ilog/concert/IloNumVar
WARNING 10:47:07 - Error when try to connect to Server. Try again in 4.0s 
 Cause: Connection reset
WARNING 10:47:08 - Error when try to connect to Server. Try again in 4.0s 
 Cause: Error when querying REST service. Bad Response Code: 404 | Message: Not Found| Content: <html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

After that, it keeps retrying the connection in a loop and never finishes. I think SIRIUS shut down all connections for the 4.9 version...
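The first warning, "Cannot parse retention time: 'NAs'", suggests that at least one input .ms file carries a non-numeric retention time (an NA followed by the "s" seconds suffix). A minimal sketch to locate such entries, assuming the retention time is written on a ">rt" header line; that keyword is an assumption about the .ms files, not something confirmed from MAW-R's output:

```python
import re
from pathlib import Path

# Assumption: the retention time sits on a header line such as ">rt 123.4s".
RT_LINE = re.compile(r"^>rt\s+(\S+)", re.IGNORECASE)


def find_unparseable_rt(ms_text: str) -> list[str]:
    """Return raw retention-time values that are not numeric seconds."""
    bad = []
    for line in ms_text.splitlines():
        match = RT_LINE.match(line.strip())
        if not match:
            continue
        value = match.group(1)
        # Drop an optional trailing "s" (seconds) before parsing the number.
        number = value[:-1] if value.endswith("s") else value
        try:
            float(number)
        except ValueError:
            bad.append(value)
    return bad


def scan_directory(ms_dir: str) -> dict[str, list[str]]:
    """Map each .ms file in ms_dir to its unparseable RT values, if any."""
    report = {}
    for path in sorted(Path(ms_dir).glob("*.ms")):
        bad = find_unparseable_rt(path.read_text())
        if bad:
            report[path.name] = bad
    return report
```

Running scan_directory on the folder that holds the .ms files lists only the files that would trigger this warning; files with valid values are left out of the report.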

Can someone help me please ?

zmahnoor14 commented 1 year ago

Hello,

Yes, the error occurs because the web services linked to SIRIUS4 are no longer supported. We are currently working on adding the latest SIRIUS version (in a non-interactive way); however, the SIRIUS5 integration might take some time.

Could you let me know whether you are using MAW to run your dataset, and how big your whole dataset (all .mzML files) is? I can help you use the workflow interactively with SIRIUS5.
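To answer the dataset-size question quickly, a small Python sketch that counts the .mzML files below one folder and sums their sizes; the assumption is simply that all input files live somewhere under a single directory:

```python
from pathlib import Path


def mzml_dataset_size(data_dir: str) -> tuple[int, float]:
    """Count .mzML files under data_dir (recursively); return (count, size in MB)."""
    files = [
        p for p in Path(data_dir).rglob("*")
        if p.is_file() and p.suffix.lower() == ".mzml"  # matches .mzML and .mzml
    ]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes / 1_000_000  # decimal megabytes
```

For example, count, mb = mzml_dataset_size("data") gives both numbers asked for above.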

Mahnoor

jsaintvanne commented 1 year ago

Hello,

Thanks for your answer; it is much clearer to me now why it doesn't work.

We want to test 4 DDA positive-mode files and 4 DDA negative-mode files.

It is not much, but it is just to test your tool and see what it can do.

Could you tell me how you would approach this? First the R part, then export the data to run the SIRIUS GUI, then the R part again for the post-SIRIUS steps? Thanks!

zmahnoor14 commented 1 year ago

Alright. Here are a few points you can consider; let me know what you think:

  1. If you only plan to use SIRIUS5, you can use the MAW-R module without the spectral database dereplication function (which takes substantial time, depending on the size of your file). This option will only give you .ms input files for SIRIUS.

  2. Do you also want to run your data against spectral databases (HMDB, MassBank, GNPS)? If yes, you will need the three databases as R objects, which I can share with you. With this option you run the whole MAW-R module, and at the end you will have spectral database dereplication results AND .ms files that can be used as input to SIRIUS.

Once you have run the MAW-R module and have the .ms files, I can also write a quick script to run SIRIUS on all files instead of using the GUI, depending on your preference. The script only works if you already have the SIRIUS CLI installed on your system and have already logged in with your credentials in the terminal. Also, which OS are you using?
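A minimal sketch of such a batch script in Python, assuming the SIRIUS CLI is on the PATH as sirius, the login has already been done, and the .ms files produced by MAW-R sit in one folder. The -i/-o options and the sub-tool names follow SIRIUS 4/5 conventions as far as known here and should be checked against sirius --help for the installed version; writing one output project per input file is a choice of this sketch, not something MAW prescribes:

```python
import shlex
import subprocess
from pathlib import Path

# Assumed SIRIUS CLI layout: "sirius -i <input> -o <output> <sub-tool> ...".
# Verify these sub-tool names with `sirius --help` for your version.
DEFAULT_TOOLS = ["formula", "fingerprint", "structure", "write-summaries"]


def build_sirius_commands(ms_dir, out_root, sirius="sirius", tools=None):
    """Return one SIRIUS CLI command (as an argument list) per .ms file."""
    tools = DEFAULT_TOOLS if tools is None else tools
    commands = []
    for ms_file in sorted(Path(ms_dir).glob("*.ms")):
        project = Path(out_root) / ms_file.stem  # one output project per input file
        commands.append([sirius, "-i", str(ms_file), "-o", str(project), *tools])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the batch at the first failing file.
            subprocess.run(cmd, check=True)
```

By default run_all only prints the commands, so they can be reviewed before calling it again with dry_run=False.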

Also, the workflow currently takes only one file as input (we are working on a different route for parallelisation), so files have to be run one at a time for now.

My next question would be: Would you like a docker container for this task?

Let me know your thoughts here or write to me at mahnoor.zulfiqar@uni-jena.de

Hope this information helps.

Kind regards, Mahnoor

LiZhihua1982 commented 1 year ago

Dear Mahnoor, I have also run into this problem. Your offer, "I can also write quick script to run SIRIUS for all files instead of using the GUI......", would be very useful! Thank you very much!

Best regards

Li Zhihua

LiZhihua1982 commented 1 year ago

Hi, I have run into another problem, possibly with a similar cause, shown below:

The following object is masked from ‘package:readr’:

parse_date

Error in unserialize(node$con) :
  MultisessionFuture () failed to receive results from cluster RichSOCKnode #1 (PID 105 on ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: The total size of the 31 globals exported is 1.63 MiB. The three largest globals are ‘spec_dereplication_file’ (479.83 KiB of class ‘function’), ‘order’ (358.62 KiB of class ‘function’) and ‘sirius_param’ (114.21 KiB of class ‘function’)
Calls: future ... resolved -> resolved.ClusterFuture -> receiveMessageFromWorker
Execution halted

LiZhihua1982 commented 1 year ago

Hi, I cannot find the file spectral_results_for_xxxx.csv in the spectral_dereplication folder:

root@ed67702f821e:/opt/workdir/data/HY1/spectral_dereplication# ls
GNPS HMDB MassBank

zmahnoor14 commented 1 year ago

Hello,

Please refer to the updated sections of README.md in the "provenance" branch.

README.md link from provenance branch

I have described SIRIUS5 in a separate section from MAW-R, and following those steps to use SIRIUS5 should work now. It is important to adjust the parameters of the function run_sirius to your data. The function only takes results from MAW-R, which must be run beforehand so that the .ms input files exist in the directory /file_Name/insilico/SIRIUS. A list of the .ms files and their corresponding .json output files is written to /file_Name/insilico/MS1DATA_SiriusP.tsv.
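Because MS1DATA_SiriusP.tsv pairs each .ms file with its expected .json output, it can also show which SIRIUS jobs are still outstanding. A sketch, in which the column names ms_file and json_file are hypothetical placeholders for the file's real headers, and the listed paths are assumed to be absolute or relative to the current working directory:

```python
import csv
from pathlib import Path


def pending_sirius_jobs(tsv_path, ms_col="ms_file", json_col="json_file"):
    """Return (ms, json) pairs whose .json output does not exist yet.

    ms_col and json_col are hypothetical column names; replace them with
    the real headers of MS1DATA_SiriusP.tsv.
    """
    pending = []
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            ms_file, json_file = row[ms_col], row[json_col]
            if not Path(json_file).exists():
                pending.append((ms_file, json_file))
    return pending
```

An empty result means every listed .ms file already has its .json output.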

I hope this is clear enough to run SIRIUS5. Please let me know if you encounter any further issues.

zmahnoor14 commented 1 year ago

> Hi I have not found the file spectral_results_for_xxxx.csv in the folder spectral_dereplication root@ed67702f821e:/opt/workdir/data/HY1/spectral_dereplication# ls GNPS HMDB MassBank

I would assume the function didn't finish, because this file is only generated once the function has completed. Did you get any error message, or do you think the function was interrupted?
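A quick way to check whether the dereplication step got that far is to look for the summary CSV next to the per-database folders. The sketch below assumes the summary is written into the spectral_dereplication folder itself and matches the name pattern spectral_results_for_*.csv from the report above (the part after the prefix is left as a wildcard):

```python
from pathlib import Path


def dereplication_status(spectral_dir):
    """Summarise what exists in a spectral_dereplication folder.

    Assumes the final summary is named spectral_results_for_<something>.csv
    and lives directly in spectral_dir.
    """
    folder = Path(spectral_dir)
    summaries = sorted(p.name for p in folder.glob("spectral_results_for_*.csv"))
    db_dirs = sorted(p.name for p in folder.iterdir() if p.is_dir())
    return {"summary_files": summaries, "database_dirs": db_dirs}
```

A result with database folders (GNPS, HMDB, MassBank) but no summary files matches the situation reported here, where the step appears not to have completed.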

zmahnoor14 commented 1 year ago

> Hi I meet another maybe similar reason problem as below, The following object is masked from ‘package:readr’:
>
> parse_date
>
> Error in unserialize(node$con) : MultisessionFuture () failed to receive results from cluster RichSOCKnode #1 (PID 105 on ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: The total size of the 31 globals exported is 1.63 MiB. The three largest globals are ‘spec_dereplication_file’ (479.83 KiB of class ‘function’), ‘order’ (358.62 KiB of class ‘function’) and ‘sirius_param’ (114.21 KiB of class ‘function’) Calls: future ... resolved -> resolved.ClusterFuture -> receiveMessageFromWorker Execution halted

If you rerun the workflow, do you still get this error? Generally it should work, since this error comes from the parallelisation of the workflow. If you rerun it and the error persists, let me know.
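To automate the rerun, a hypothetical wrapper such as the following repeats a command a few times when it exits with an error. It reruns the whole script each time, and it will not help if the workers are being killed for lack of memory:

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, wait_seconds=60):
    """Run cmd, retrying when it exits non-zero; return the final exit code."""
    code = None
    for attempt in range(1, attempts + 1):
        code = subprocess.run(cmd).returncode
        if code == 0:
            return 0
        print(f"attempt {attempt}/{attempts} failed with exit code {code}")
        if attempt < attempts:
            time.sleep(wait_seconds)  # pause before the next try
    return code
```

For example, run_with_retries(["Rscript", "--no-save", "--no-restore", "Workflow_R_Script.r"]) mirrors the Rscript call used in this thread.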

LiZhihua1982 commented 1 year ago

Hi, yes, I get this error too, in both MAW-r:1.0.0 and MAW-r:1.0.1:

root@9e99c13fa386:/opt/workdir# Rscript --no-save --no-restore --verbose Workflow_R_Script.r >outputFile.txt
running '/usr/local/lib/R/bin/R --no-echo --no-restore --no-save --no-restore --file=Workflow_R_Script.r'

Loading required package: foreach
Loading required package: iterators
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which.max, which.min

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:future’:

values

The following objects are masked from ‘package:base’:

expand.grid, I, unname

Loading required package: BiocParallel
Loading required package: ProtGenerics

Attaching package: ‘ProtGenerics’

The following object is masked from ‘package:stats’:

smooth

Attaching package: ‘Spectra’

The following object is masked from ‘package:ProtGenerics’:

addProcessing

Attaching package: ‘MsCoreUtils’

The following objects are masked from ‘package:Spectra’:

bin, smooth

The following objects are masked from ‘package:ProtGenerics’:

bin, smooth

The following object is masked from ‘package:stats’:

smooth

Attaching package: ‘dplyr’

The following object is masked from ‘package:MsCoreUtils’:

between

The following objects are masked from ‘package:S4Vectors’:

first, intersect, rename, setdiff, setequal, union

The following objects are masked from ‘package:BiocGenerics’:

combine, intersect, setdiff, union

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

Attaching package: ‘rvest’

The following object is masked from ‘package:readr’:

guess_encoding

Loading required package: Rcpp
Using libcurl 7.68.0 with OpenSSL/1.1.1f

Attaching package: ‘curl’

The following object is masked from ‘package:readr’:

parse_date

Error in unserialize(node$con) :
  MultisessionFuture () failed to receive results from cluster RichSOCKnode #1 (PID 99 on ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: The total size of the 31 globals exported is 1.64 MiB. The three largest globals are ‘spec_dereplication_file’ (486.30 KiB of class ‘function’), ‘order’ (358.62 KiB of class ‘function’) and ‘sirius_param’ (114.21 KiB of class ‘function’)
Calls: future ... resolved -> resolved.ClusterFuture -> receiveMessageFromWorker
Execution halted

LiZhihua1982 commented 1 year ago

Dear Mahnoor, Maybe this link http://gforge.se/2015/02/how-to-go-parallel-in-r-basics-tips/ will be useful for fixing this problem. Thanks.

LiZhihua1982 commented 1 year ago

https://search.r-project.org/CRAN/refmans/future/html/plan.html

zmahnoor14 commented 1 year ago

Sorry for the delayed response here. I am currently trying to fix all these issues with CWL, but it is not ready yet. To work around these issues, here are some solutions:

You will get the spectral database results, and in the insilico folder you will find MetFrag parameter files (if you want to use the MetFrag CLI) and SIRIUS parameter files (which you can use to run SIRIUS or the SIRIUS CLI).

Once you have these results, ideally you would use MAW-Py to perform candidate selection. I have already prepared scripts for candidate selection with spectral DBs and MetFrag, but due to some changes in the SIRIUS results, candidate selection with SIRIUS is not currently possible. I will work on it after MAW-CWL is ready.

Let me know if this solves the problem. If the future library problem persists, could you send me an example file and tell me how much RAM your system has? I will try to recreate the error and solve it with Luiz.
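For reporting the machine specs, a short Python sketch that reads total physical RAM and the logical CPU count. It relies on os.sysconf, which is only available on Unix-like systems, so it will not work on Windows:

```python
import os


def system_summary():
    """Report total physical RAM (GiB) and logical CPU count (Unix only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    phys_pages = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return {
        "ram_gib": round(page_size * phys_pages / 2**30, 1),
        "cpu_count": os.cpu_count(),
    }
```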

Kind regards, Mahnoor

LiZhihua1982 commented 1 year ago

Dear Prof. Mahnoor, thank you very much for your explanation! I have tried to fix this problem.

  1. So far, I have replaced plan(list( tweak(multisession, workers = ((n.cores + 5) %/% 3) %/% 2), tweak(multisession, workers = ((n.cores + 5) %/% 3) %/% 2), tweak(multisession, workers = 3) )) in Workflow_R_Script.r with plan(multisession, workers=8) to reduce (balance) the load on cores and memory. This seems to work, but it takes a very long time, possibly several days (my data consists of six files of about 100 MB each): I started it two days ago and it still has not finished.
  2. I will try your suggestions when I get back to the office and let you know the results.
  3. As you know, there are several excellent software tools for processing MS data, such as MS-DIAL, XCMS, and MZmine 2 (3). So my suggestion: could MAW be designed to take the results of MS-DIAL, XCMS, and MZmine as inputs, as GNPS does? This would make things easier and reduce the memory used for downstream analysis.
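The nested plan in item 1 can start more R sessions than its worker counts suggest, because each worker at one level may open its own pool at the next level. A rough sketch of that arithmetic in Python, where n_cores stands for the script's n.cores (how the script sets it is not shown here) and the total is an upper bound that only applies if the script really nests futures three levels deep and all of them run at once:

```python
def nested_plan_workers(n_cores):
    """Workers per level for the nested plan in item 1.

    Mirrors ((n.cores + 5) %/% 3) %/% 2 for the first two levels
    (R's %/% equals Python's // for non-negative integers) and the
    fixed workers = 3 for the third level.
    """
    level = ((n_cores + 5) // 3) // 2
    return [level, level, 3]


def max_concurrent_sessions(workers_per_level):
    """Upper bound on background R sessions if every level is saturated.

    Each worker at level k can start its own pool at level k + 1, so
    level k contributes the product of the worker counts of levels 1..k.
    """
    total, product = 0, 1
    for workers in workers_per_level:
        product *= workers
        total += product
    return total
```

With n_cores = 16 the plan resolves to 3, 3 and 3 workers per level, so up to 3 + 9 + 27 = 39 background R sessions, versus 8 for plan(multisession, workers=8); with 32 cores it would be 6, 6 and 3, or up to 150 sessions. Many simultaneous sessions each handling a ~100 MB mzML file is a plausible, unconfirmed explanation for the 'error reading from connection' failures, since that message usually means a worker process died.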

Again, thank you for the idea behind MAW; I really like it.

Best regards

Li Zhihua


From: Mahnoor Zulfiqar | Sent: 1 May 2023 20:41 | To: zmahnoor14/MAW | Cc: Li Zhihua; Comment | Subject: Re: [zmahnoor14/MAW] Think I can't connect to Sirius because of the new 5.0 version (Issue #33)


LiZhihua1982 commented 1 year ago

Dear Prof. Mahnoor,

  1. With the example file Example_Tyrosine.mzML it works, but with my data (6 mzML files of about 100 MB each) it reports the error. I will send my mzML data to you when I get back to the office (maybe in 2 days).
  2. https://gnps.ucsd.edu/ProteoSAFe/static/gnps-splash.jsp (GNPS - Analyze, Connect, and Network with your Mass Spectrometry Data)

Thanks again!

Best regards

Li Zhihua



LiZhihua1982 commented 1 year ago

Dear Prof. Mahnoor,

  1. As of now, MAW is still running (for about 4-5 days); it is at the parse_data step. The computer has 64 GB RAM and runs Ubuntu 18.04.
  2. The mzML file is about 100 MB, which is too large to send by Hotmail.
  3. Because I cannot access Zenodo from China, I copied GNPs.rda, hmdb.rda... from maw-r:1.0.1, but I cannot find the COCONUT database for MetFrag. Where can I find it in maw-r:1.0.1? Shared file QC21.mzML: https://1drv.ms/u/s!An_qG-a0TjfhgVO604WuokuiQU6g

    1. Workflow_R_Script_all.r: https://github.com/zmahnoor14/MAW/blob/provenance/cwl/Workflow_R_Script_all.r
    2. All spectral databases downloaded from Zenodo: https://zenodo.org/record/7519270
    3. your one input file (.mzML)
    4. COCONUT database for MetFrag: https://zenodo.org/record/7704937

Best regards

Li Zhihua


LiZhihua1982 commented 1 year ago

Hi Mahnoor, after pulling MAW:1.0.8, I cannot find the R script Workflow_R_Script_all.r.

Best regards

Li Zhihua



From: li zhihua | Sent: Monday, May 1, 2023 9:30:17 PM | To: zmahnoor14/MAW | Subject: RE: [zmahnoor14/MAW] Think I can't connect to Sirius because of the new 5.0 version (Issue #33)


zmahnoor14 commented 1 year ago

Dear Li Zhihua,

Sorry for my late response; I am still a PhD student, currently trying to finish my projects, and I have a lot of work on my hands : )

Here is the link to the COCONUT database file: https://upload.uni-jena.de/data/6459b9bd932378.80487188/COCONUT_Jan2022.csv

I would also suggest using the updated HMDB 5.0 version; here is the download link: https://upload.uni-jena.de/data/6459ba4f540612.50374105/hmdb.rda

Regarding Workflow_R_Script_all.r, you can download the latest version from the GitHub repository: https://github.com/zmahnoor14/MAW/blob/provenance/cwl/Workflow_R_Script_all.r
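When downloading from inside a container, the GitHub page link returns HTML rather than the script itself, so the raw-file URL is usually what is needed. A sketch of that conversion, relying on GitHub's usual raw.githubusercontent.com/<owner>/<repo>/<branch>/<path> layout:

```python
from urllib.parse import urlparse


def github_blob_to_raw(url):
    """Turn a github.com/<owner>/<repo>/blob/<branch>/<path> URL into a raw-file URL."""
    parts = urlparse(url)
    owner, repo, kind, *rest = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or kind != "blob" or len(rest) < 2:
        raise ValueError(f"not a GitHub blob URL: {url}")
    # rest holds the branch followed by the file path inside the repository.
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{'/'.join(rest)}"
```

urllib.request.urlretrieve(github_blob_to_raw(url), "Workflow_R_Script_all.r") would then save the script locally.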