rpeckner-broad / Specter

Specter: linear deconvolution for targeted analysis of data-independent acquisition mass spectrometry proteomics

fails with "Shutdown hook called" #4

Closed vnaum closed 6 years ago

vnaum commented 6 years ago

Trying to get it working, but spark-submit command fails with this output:

Library loaded in 1.7 minutes
18/04/13 12:20:59 INFO ShutdownHookManager: Shutdown hook called
18/04/13 12:20:59 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-607dcee6-a287-4c4a-9d4f-a289abe235bf

Specter.sh then ignores the error (set -e would help) and goes on to call R, which of course fails, since there's no input file.

We're running Python 2.7.11, R 3.4.1 (2017-06-30, "Single Candle"), conda 4.4.10, and Spark 2.3.0 on Amazon Elastic MapReduce in a pretty much default configuration.

Maybe I missed something in the installation guide, or my data file is broken? I had to patch Specter_Spark.py to handle 'scan start time' instead of the 'scan time' key it expects (for some reason the mzML I have uses 'scan start time'). That was just a search-and-replace; without it, the run fails much sooner with:

  File "/root/Specter/Specter_Spark.py", line 350, in <module>
    res = [[spectrum.peaks,spectrum['precursors'][0]['mz'],spectrum['scan time'],i] for i,spectrum in E if spectrum['ms level'] == 2.0 and i < Index]
KeyError: 'scan time'
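
For what it's worth, a lookup that tolerates both spellings would avoid the search-and-replace. This is a hypothetical helper, not Specter's code; it only assumes the dict-style spectrum access shown in the traceback:

```python
def get_scan_time(spectrum):
    """Return the scan time under whichever key the mzML/pymzml pair uses.

    Tries the legacy 'scan time' key, the 'scan start time' spelling seen in
    our mzML, and the PSI-MS accession key 'MS:1000016'; returns None if no
    key yields a value.
    """
    for key in ('scan time', 'scan start time', 'MS:1000016'):
        try:
            value = spectrum[key]
        except KeyError:
            continue
        if value is not None:
            return value
    return None
```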

Is there a way to run the code without the cluster wrappers, to rule out a misconfigured Spark?

rpeckner-broad commented 6 years ago

Hi Vladislav, This looks like it's most likely an issue with either the mzML or the pymzml version you're using rather than Spark; otherwise you would have seen the message Loaded {} MS2 spectra from {} in {} minutes before Spark was invoked. Instead it seems that Spark is being passed an empty set of spectra. Could you let me know the version of pymzml you're using, the version of msconvert used to make the mzML, and the exact command you used at the terminal to run Specter? Thanks for pointing out the discrepancy with the 'scan time' key - I've updated this to the more universal accession key 'MS:1000016', which should work with all mzMLs.
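
For readers following along, the described fix amounts to swapping the key in the line-350 comprehension from the traceback. A sketch using stand-in data (plain dicts instead of pymzml spectrum objects, and a function wrapper around Specter's enumerated list E and cutoff Index):

```python
def collect_ms2(E, Index):
    # Filter MS2 spectra, reading the scan time via the PSI-MS accession key
    # 'MS:1000016' ("scan start time") instead of the old 'scan time' key.
    return [
        [spectrum['peaks'], spectrum['precursors'][0]['mz'],
         spectrum['MS:1000016'], i]
        for i, spectrum in E
        if spectrum['ms level'] == 2.0 and i < Index
    ]
```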

vnaum commented 6 years ago

Please disregard. For some reason I read "At least 100 GB of cluster RAM is recommended." as "the whole cluster should have 100 GB", not "each node should have 100 GB". And since the OOM happened deep inside Spark, there was zero indication of the error apart from lines in /var/log/messages. I spawned a single-node cluster with 122 GB of RAM and it... sort of works. At least it gets past this point. It fails afterwards, but now there's a log to work with. I'll open a new issue once I collect enough data (or a pull request, if there's a workaround/fix).

Thanks for replying!

> you would have seen the message Loaded {} MS2 spectra from {} in {} minutes

It failed earlier than this :-)

> the version of pymzml you're using

pymzml-0.7.8-py27_0, straight from conda. I did copy the OBO file mentioned in the PDF to where the program says it should be (/root/miniconda2/envs/SpecterEnv/lib/python2.7/site-packages/pymzml/obo/psi-ms-4.0.14.obo).

> the version of msconvert used to make the mzML

We used ProteoWizard 3.0.11676. We're also using a raw file and a blib file from the PRIDE repository for the associated publication.

> the exact command you used at the terminal to run Specter

./Specter.sh 20g /mnt/CS20170831_SV_HEK_SpikeP100_108ng_Overlap22_01 /mnt/HEKAndP100HeavyLib 100000 end 200 orbitrap 10