magland / ml_ms4alg

MountainSort v4

ms4alg processors randomly fail to run on cluster with node-js error #20

Open shashwatsridhar opened 5 years ago


Hello,

I am spike sorting datasets on our local cluster (which uses SLURM) with mountainlab-js, using the processors ms4alg.sort, ms4alg.create_label_map and ms4alg.apply_label_map. I run them as part of a Snakemake pipeline. Snakemake is a workflow management system that lets me run large parameter scans easily. Each rule in a Snakemake workflow is submitted as an individual job to the cluster's queuing system, so each job runs independently of the others.

Recently, I have been seeing the following errors at random when running the processors from the ms4alg package:

(node:7572) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'original_checksum' of undefined
    at /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/lib/node_modules/mountainlab/mlproc/prv_utils.js:202:24
    at /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/lib/node_modules/mountainlab/mlproc/prv_utils.js:192:7
(node:7572) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:7572) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:8167) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'original_checksum' of undefined
    at /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/lib/node_modules/mountainlab/mlproc/prv_utils.js:202:24
    at /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/lib/node_modules/mountainlab/mlproc/prv_utils.js:192:7
(node:8167) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:8167) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
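The last warning notes that, in this Node.js version, an unhandled promise rejection does not yet terminate the process with a non-zero exit code. `ml-run-process` may therefore exit with status 0 even though the processor did not complete. A minimal guard, assuming that a missing or empty output file is a reliable sign of failure; `run_and_check` is a hypothetical helper, not part of mountainlab:

```bash
# Hypothetical helper (not part of mountainlab): run a command, then fail
# unless the expected output file exists and is non-empty.
# Usage: run_and_check OUTPUT_FILE CMD [ARGS...]
run_and_check() {
    local output="$1"
    shift
    # If the command itself reports failure, pass its exit status through.
    "$@" || return $?
    # Catch the case where the command "succeeds" without writing its output.
    if [ ! -s "$output" ]; then
        echo "error: expected output '$output' is missing or empty" >&2
        return 1
    fi
}

# Example, wrapping the first step of the script below:
# run_and_check "$TEMPDIR/ml_label_map_out.mda" \
#     ml-run-process ms4alg.create_label_map \
#         --inputs metrics:"$TEMPDIR/ml_label_map_metrics.json" \
#         --outputs label_map_out:"$TEMPDIR/ml_label_map_out.mda" \
#         --parameters firing_rate_thresh:0.5 isolation_thresh:0.92 noise_overlap_thresh:0.2 peak_snr_thresh:0.5
```

This only detects the symptom; it does not explain why the processor failed in the first place.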

Since this is being run on the cluster, the corresponding output looks like this:

[ Getting processor spec... ]
[ Checking inputs and substituting prvs ... ]
[ Computing process signature ... ]
Process signature: cf7bd2ec46045c17c672c8bc4ddbf9075ea1d480
[ Checking outputs... ]
{"label_map_out":"/tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5/ml_label_map_out.mda"}
Processing ouput - /tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5/ml_label_map_out.mda
false
{"label_map_out":"/tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5/ml_label_map_out.mda"}
[ Checking process cache ... ]
[ Creating temporary directory ... ]
[ Creating links to input files... ]
[ Preparing temporary outputs... ]
Processing ouput - /tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5/ml_label_map_out.mda
false
[ Initializing process ... ]
[ Running ... ] /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/bin/python3 /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/etc/mountainlab/packages/ml_ms4alg/curation_spec.py.mp ms4alg.create_label_map --_tempdir=/tmp/mountainlab-tmp/tempdir_cf7bd2ec46_UfI18k --metrics=/tmp/mountainlab-tmp/tempdir_cf7bd2ec46_UfI18k/input_metrics_RMvnkPQS.json --label_map_out=/tmp/mountainlab-tmp/tempdir_cf7bd2ec46_UfI18k/output_label_map_out.mda --firing_rate_thresh=0.5 --isolation_thresh=0.92 --noise_overlap_thresh=0.2 --peak_snr_thresh=0.5
Elapsed time for processor ms4alg.create_label_map: 3.447 sec
Finalizing output label_map_out
[ Saving to process cache ... ]
[ Getting processor spec... ]
[ Checking inputs and substituting prvs ... ]
[ Computing process signature ... ]
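Because the failure is intermittent, it can help to identify which jobs hit it before rerunning them. A minimal sketch, assuming each job's log (including stderr, where Node.js prints these warnings) ends up in a file matching SLURM's default `slurm-<jobid>.out` naming; adjust the file list if the Snakemake cluster settings write logs elsewhere. The function name `find_failed_logs` is hypothetical:

```bash
# Hypothetical helper: print the names of log files that contain the
# TypeError from the Node.js trace above.
# Usage: find_failed_logs LOGFILE...
find_failed_logs() {
    local pattern="Cannot read property 'original_checksum' of undefined"
    # -l prints only matching file names; -F matches the pattern literally.
    grep -lF -- "$pattern" "$@"
}

# Example, assuming SLURM's default log names in the current directory:
# find_failed_logs slurm-*.out
```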

The shell script that Snakemake runs for this step looks like this:

TEMPDIR=
if [[ -z ${{TMPDIR+x}} ]]
then
    TEMPDIR=/tmp
else
    TEMPDIR=$TMPDIR
fi

TEMPDIR="$TEMPDIR/ml_create_label_map/{wildcards.dataset}_{wildcards.clip_size}_{wildcards.thr}_{wildcards.intvl}_{wildcards.fr}_{wildcards.iso}_{wildcards.noi}_{wildcards.snr}"
mkdir -p $TEMPDIR/

exitfunction() {{
    trap - TERM
    rm -r ${{TEMPDIR}}/
}}

trap "exitfunction" TERM

cp {input.metrics} $TEMPDIR/ml_label_map_metrics.json
cp {input.firings} $TEMPDIR/ml_label_map_firings.mda

ml-run-process ms4alg.create_label_map \
    --inputs metrics:$TEMPDIR/ml_label_map_metrics.json \
    --outputs label_map_out:$TEMPDIR/ml_label_map_out.mda \
    --parameters firing_rate_thresh:{wildcards.fr} isolation_thresh:{wildcards.iso} noise_overlap_thresh:{wildcards.noi} peak_snr_thresh:{wildcards.snr}

ml-run-process ms4alg.apply_label_map \
    --inputs firings:$TEMPDIR/ml_label_map_firings.mda label_map:$TEMPDIR/ml_label_map_out.mda \
    --outputs firings_out:$TEMPDIR/ml_label_map_curated_firings.mda \
    --parameters

python {input.script} $TEMPDIR/ml_label_map_curated_firings.npy

cp $TEMPDIR/ml_label_map_curated_firings.npy {output.npy}
cp $TEMPDIR/ml_label_map_curated_firings.mda {output.mda}

exitfunction

trap - TERM

The slightly strange formatting is due to Snakemake's wildcard system. Snakemake fills in the wildcard entries automatically for each set of values I request and runs this script once per parameter set. This is the script for the create_label_map + apply_label_map step; a very similar script is used for the sort step, and it fails with the same error.
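In a Snakemake shell command, doubled braces such as `${{TMPDIR+x}}` become single braces, and placeholders like `{wildcards.dataset}` are replaced by the wildcard values. A plain-bash sketch of how the temporary directory name is assembled, using the values visible in the job output above; the function name `label_map_tempdir` is hypothetical, and the mapping of values to wildcard names follows their order in the template:

```bash
# Hypothetical helper reproducing the TEMPDIR logic of the rule in plain bash,
# i.e. after Snakemake has turned {{ }} into { } and filled in the wildcards.
# Arguments (same order as in the template):
#   dataset clip_size thr intvl fr iso noi snr
label_map_tempdir() {
    local base
    # ${TMPDIR+x} is "x" whenever TMPDIR is set (even to an empty string),
    # so /tmp is used only when TMPDIR is unset, as in the original test.
    if [ -z "${TMPDIR+x}" ]; then
        base=/tmp
    else
        base="$TMPDIR"
    fi
    printf '%s/ml_create_label_map/%s_%s_%s_%s_%s_%s_%s_%s\n' "$base" "$@"
}
```

With `TMPDIR` unset, `label_map_tempdir i140703-001 50 6 35 0.5 0.92 0.2 0.5` prints `/tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5`, the directory that appears in the job output, and the last four values match the thresholds passed to `ms4alg.create_label_map`.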

The error appears randomly: if I rerun the script for the same configuration (the same parameter set, for instance), it does not necessarily reappear. The conda environment that Snakemake creates and uses has the following packages installed:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
alabaster                 0.7.12                     py_0    conda-forge
asn1crypto                0.24.0                py36_1003    conda-forge
attrs                     19.1.0                     py_0    conda-forge
babel                     2.7.0                      py_0    conda-forge
backcall                  0.1.0                      py_0    conda-forge
bleach                    3.1.0                      py_0    conda-forge
blosc                     1.16.3               he1b5a44_1    conda-forge
bzip2                     1.0.6             h14c3975_1002    conda-forge
ca-certificates           2019.6.16            hecc5488_0    conda-forge
certifi                   2019.6.16                py36_0    conda-forge
cffi                      1.12.3           py36h8022711_0    conda-forge
chardet                   3.0.4                 py36_1003    conda-forge
cryptography              2.7              py36h72c5cf5_0    conda-forge
cycler                    0.10.0                     py_1    conda-forge
dbus                      1.13.6               he372182_0    conda-forge
decorator                 4.4.0                      py_0    conda-forge
deepdish                  0.3.4                    py36_1    flatiron
defusedxml                0.5.0                      py_1    conda-forge
docopt                    0.6.2                      py_1    conda-forge
docutils                  0.14                  py36_1001    conda-forge
entrypoints               0.3                   py36_1000    conda-forge
expat                     2.2.5             he1b5a44_1003    conda-forge
fftw                      3.3.8           nompi_h7f3a6c3_1106    conda-forge
fontconfig                2.13.1            he4413a7_1000    conda-forge
freetype                  2.10.0               he983fc9_0    conda-forge
gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
glib                      2.58.3            h6f030ca_1002    conda-forge
gst-plugins-base          1.14.5               h0935bb2_0    conda-forge
gstreamer                 1.14.5               h36ae1b5_0    conda-forge
h5py                      2.9.0           nompi_py36hf008753_1102    conda-forge
hdf5                      1.10.4          nompi_h3c11f04_1106    conda-forge
icu                       58.2              hf484d3e_1000    conda-forge
idna                      2.8                   py36_1000    conda-forge
imagesize                 1.1.0                      py_0    conda-forge
ipykernel                 5.1.1            py36h24bf2e0_0    conda-forge
ipython                   7.6.1            py36h5ca1d4c_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.5.0                      py_0    conda-forge
isosplit5                 0.1.3            py36h6bb024c_7    flatiron
jedi                      0.14.1                   py36_0    conda-forge
jinja2                    2.10.1                     py_0    conda-forge
joblib                    0.13.2                     py_0    conda-forge
jp-proxy-widget           1.0.0                    pypi_0    pypi
jpeg                      9c                h14c3975_1001    conda-forge
jsonschema                3.0.1                    py36_0    conda-forge
jupyter_client            5.3.1                      py_0    conda-forge
jupyter_core              4.4.0                      py_0    conda-forge
kiwisolver                1.1.0            py36hc9558a2_0    conda-forge
libblas                   3.8.0               10_openblas    conda-forge
libcblas                  3.8.0               10_openblas    conda-forge
libffi                    3.2.1             he1b5a44_1006    conda-forge
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libiconv                  1.15              h516909a_1005    conda-forge
liblapack                 3.8.0               10_openblas    conda-forge
libopenblas               0.3.6                h6e990d7_4    conda-forge
libpng                    1.6.37               hed695b0_0    conda-forge
libsodium                 1.0.17               h516909a_0    conda-forge
libstdcxx-ng              9.1.0                hdf63c60_0  
libuuid                   2.32.1            h14c3975_1000    conda-forge
libxcb                    1.13              h14c3975_1002    conda-forge
libxml2                   2.9.9                h13577e0_1    conda-forge
llvm-openmp               8.0.0                hc9558a2_0    conda-forge
lzo                       2.10              h14c3975_1000    conda-forge
markupsafe                1.1.1            py36h14c3975_0    conda-forge
matplotlib                3.1.1                    py36_0    conda-forge
matplotlib-base           3.1.1            py36hfd891ef_0    conda-forge
mistune                   0.8.4           py36h14c3975_1000    conda-forge
ml_ephys                  0.2.14                   py36_2    flatiron
ml_ms3                    0.2.4                h38c1b9e_1    flatiron
ml_ms4alg                 0.2.3            py36h6bb024c_0    flatiron
ml_pyms                   0.2.3            py36h6bb024c_1    flatiron
mock                      3.0.5                    py36_0    conda-forge
mountainlab               0.15.2                        0    flatiron
mountainlab-pytools       0.7.5                    pypi_0    pypi
mountainlab_pytools       0.7.5                    py36_2    flatiron
nbconvert                 5.5.0                      py_0    conda-forge
nbformat                  4.4.0                      py_1    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
neo                       0.7.1                    pypi_0    pypi
nixio                     1.5.0b2                  pypi_0    pypi
nodejs                    11.14.0              he1b5a44_1    conda-forge
notebook                  5.7.8                    py36_1    conda-forge
numexpr                   2.6.9           py36h637b7d7_1000    conda-forge
numpy                     1.16.4           py36h95a1406_0    conda-forge
numpydoc                  0.9.1                      py_0    conda-forge
openblas                  0.3.6                h6e990d7_4    conda-forge
openmp                    8.0.0                         0    conda-forge
openssl                   1.1.1c               h516909a_0    conda-forge
packaging                 19.0                       py_0    conda-forge
pandoc                    2.7.3                         0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
parso                     0.5.0                      py_0    conda-forge
pbr                       5.4.0                    pypi_0    pypi
pcre                      8.41              hf484d3e_1003    conda-forge
pexpect                   4.7.0                    py36_0    conda-forge
pickleshare               0.7.5                 py36_1000    conda-forge
pip                       19.1.1                   py36_0    conda-forge
prometheus_client         0.7.1                      py_0    conda-forge
prompt_toolkit            2.0.9                      py_0    conda-forge
pthread-stubs             0.4               h14c3975_1001    conda-forge
ptyprocess                0.6.0                   py_1001    conda-forge
pycparser                 2.19                     py36_1    conda-forge
pygments                  2.4.2                      py_0    conda-forge
pyopenssl                 19.0.0                   py36_0    conda-forge
pyparsing                 2.4.0                      py_0    conda-forge
pyqt                      5.9.2            py36hcca6a23_0    conda-forge
pyrsistent                0.15.3           py36h516909a_0    conda-forge
pysocks                   1.7.0                    py36_0    conda-forge
pytables                  3.5.2            py36ha1aa75f_0    conda-forge
python                    3.6.7             h357f687_1005    conda-forge
python-dateutil           2.8.0                      py_0    conda-forge
pytz                      2019.1                     py_0    conda-forge
pyzmq                     18.0.2           py36hc4ba49a_1    conda-forge
qt                        5.9.7                h52cfd70_2    conda-forge
quantities                0.12.3                   pypi_0    pypi
readline                  8.0                  hf8c457e_0    conda-forge
requests                  2.22.0                   py36_1    conda-forge
scikit-learn              0.21.2           py36hcdab131_1    conda-forge
scipy                     1.3.0            py36h921218d_0    conda-forge
send2trash                1.5.0                      py_0    conda-forge
setuptools                41.0.1                   py36_0    conda-forge
sip                       4.19.8          py36hf484d3e_1000    conda-forge
six                       1.12.0                py36_1000    conda-forge
snowballstemmer           1.9.0                      py_0    conda-forge
sphinx                    2.1.2                      py_0    conda-forge
sphinxcontrib-applehelp   1.0.1                      py_0    conda-forge
sphinxcontrib-devhelp     1.0.1                      py_0    conda-forge
sphinxcontrib-htmlhelp    1.0.2                      py_0    conda-forge
sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
sphinxcontrib-qthelp      1.0.2                      py_0    conda-forge
sphinxcontrib-serializinghtml 1.1.1                      py_0    conda-forge
sqlite                    3.29.0               hcee41ef_0    conda-forge
stevedore                 1.30.1                   pypi_0    pypi
terminado                 0.8.2                    py36_0    conda-forge
testpath                  0.4.2                   py_1001    conda-forge
tk                        8.6.9             hed695b0_1002    conda-forge
tornado                   6.0.3            py36h516909a_0    conda-forge
traitlets                 4.3.2                 py36_1000    conda-forge
urllib3                   1.25.3                   py36_0    conda-forge
virtualenv                16.6.2                   pypi_0    pypi
virtualenv-clone          0.5.3                    pypi_0    pypi
virtualenvwrapper         4.8.4                    pypi_0    pypi
wcwidth                   0.1.7                      py_1    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.33.4                   py36_0    conda-forge
widgetsnbextension        3.5.0                    py36_0    conda-forge
xorg-libxau               1.0.9                h14c3975_0    conda-forge
xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
xz                        5.2.4             h14c3975_1001    conda-forge
zeromq                    4.3.2                he1b5a44_2    conda-forge
zlib                      1.2.11            h516909a_1005    conda-forge
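Until the cause is found, and because rerunning the same configuration does not necessarily reproduce the error, one stopgap is to retry a failed step a few times. A minimal sketch, assuming each step can safely be rerun from scratch and that a failure shows up as a non-zero exit status (which, per the Node.js warning above, may not always be the case); `retry` and the attempt count are hypothetical choices:

```bash
# Hypothetical helper: run a command up to MAX_ATTEMPTS (>= 1) times,
# stopping at the first success. Returns the last exit status if every
# attempt fails.
# Usage: retry MAX_ATTEMPTS CMD [ARGS...]
retry() {
    local max="$1" attempt=1 status=0
    shift
    while [ "$attempt" -le "$max" ]; do
        "$@" && return 0
        status=$?
        echo "attempt $attempt/$max failed (exit status $status): $*" >&2
        attempt=$((attempt + 1))
    done
    return "$status"
}

# Example: prefix a step of the script with "retry 3", e.g.
#   retry 3 ml-run-process ms4alg.apply_label_map \
#       --inputs firings:"$TEMPDIR/ml_label_map_firings.mda" label_map:"$TEMPDIR/ml_label_map_out.mda" \
#       --outputs firings_out:"$TEMPDIR/ml_label_map_curated_firings.mda" \
#       --parameters
```

Retrying hides the symptom rather than fixing it, so it is only useful while the underlying error is being investigated.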

Any clue what might be happening? Anything else you need for debugging this?

Thanks!

PS: I posted the same issue in the mountainlab-js repository, but I figured this was probably a better place to ask, sorry! If you think I should close or remove the issue in mountainlab-js, please let me know!