magland / mountainlab

spike sorting software

Seg fault when computing cluster isolation metrics #190

Closed peterzh closed 6 years ago

peterzh commented 6 years ago

Hi, I'm running ms3 on a large dataset (385 channels, 60 GB raw data file). I'm using geom.csv as recommended. When running the sort, however, I get the following error:

Process finished: mountainsort.mountainsort3

SCRIPT: Process already completed: mountainsort.compute_templates

SCRIPT: Process already completed: mountainsort.cluster_metrics
Launching process mountainsort.isolation_metrics

/home/carandini/mountainlab/cpp/mountainprocess/bin/mountainprocess run-process mountainsort.isolation_metrics --compute_bursting_parents=true --firings=/tmp/mountainlab/tmp_long_term/2e91389d019f2b6332f1327d5e1e201f8ffb9d13-mountainsort.mountainsort3-firings_out.tmp --metrics_out=/tmp/mountainlab/tmp_long_term/37f88c84cccdd145601ed272d7058fcfca701887-mountainsort.isolation_metrics-metrics_out.tmp --pair_metrics_out=/tmp/mountainlab/tmp_long_term/8b69e0c68029976578305f4a6c10594efa986d53-mountainsort.isolation_metrics-pair_metrics_out.tmp --timeseries=/tmp/mountainlab/tmp_long_term/33bd9e634f504fecf7d5f1f7c41dc28c95f5d787-mountainsort.whiten-timeseries_out.tmp --_working_path=/home/carandini/Desktop/sortingProject --_process_output=/tmp/mountainlab/tmp_short_term/ms.1415343874.140283555813184.1337360875.tmp.process_output

COPYING FILE: /tmp/mountainlab/tmp_long_term/2e91389d019f2b6332f1327d5e1e201f8ffb9d13-mountainsort.mountainsort3-firings_out.tmp -> /home/carandini/Desktop/sortingProject/output/ms3--proj1/firings.mda
mountainsort.isolation_metrics:: STARTING: /home/carandini/mountainlab/packages/mountainsort2/bin/mountainsort2.mp mountainsort.isolation_metrics --firings=/tmp/mountainlab/tmp_long_term/2e91389d019f2b6332f1327d5e1e201f8ffb9d13-mountainsort.mountainsort3-firings_out.tmp --timeseries=/tmp/mountainlab/tmp_long_term/33bd9e634f504fecf7d5f1f7c41dc28c95f5d787-mountainsort.whiten-timeseries_out.tmp --metrics_out=/tmp/mountainlab/tmp_long_term/37f88c84cccdd145601ed272d7058fcfca701887-mountainsort.isolation_metrics-metrics_out.tmp --pair_metrics_out=/tmp/mountainlab/tmp_long_term/8b69e0c68029976578305f4a6c10594efa986d53-mountainsort.isolation_metrics-pair_metrics_out.tmp --compute_bursting_parents=true --_tempdir=/tmp/mountainlab/tmp_short_term/tempdir_QQ9LZPGMNV .
Starting p_isolation_metrics

mountainsort.isolation_metrics:: Computing cluster metrics...

mountainsort.isolation_metrics:: "Unable to allocate Mda of size 385x50x87270x1x1x1 (total=1679947500)"

mountainsort.isolation_metrics:: /tmp/mountainlab/tmp_short_term/ms.1415343902.140515833972544.1942841066.tmp: line 13: 11292 Segmentation fault      (core dumped) /home/carandini/mountainlab/packages/mountainsort2/bin/mountainsort2.mp mountainsort.isolation_metrics --firings=/tmp/mountainlab/tmp_long_term/2e91389d019f2b6332f1327d5e1e201f8ffb9d13-mountainsort.mountainsort3-firings_out.tmp --timeseries=/tmp/mountainlab/tmp_long_term/33bd9e634f504fecf7d5f1f7c41dc28c95f5d787-mountainsort.whiten-timeseries_out.tmp --metrics_out=/tmp/mountainlab/tmp_long_term/37f88c84cccdd145601ed272d7058fcfca701887-mountainsort.isolation_metrics-metrics_out.tmp --pair_metrics_out=/tmp/mountainlab/tmp_long_term/8b69e0c68029976578305f4a6c10594efa986d53-mountainsort.isolation_metrics-pair_metrics_out.tmp --compute_bursting_parents=true --_tempdir=/tmp/mountainlab/tmp_short_term/tempdir_QQ9LZPGMNV

mountainsort.isolation_metrics:: ---------------------------------------------------------------
PROCESS COMPLETED (exit code = 139): mountainsort.isolation_metrics
ERROR: Exit code is non-zero: mountainsort.isolation_metrics
Peak RAM: 3 MB. Peak CPU: 0%. Avg CPU: 0%. Elapsed time: 1357.99 seconds.
---------------------------------------------------------------

mountainsort.isolation_metrics:: 2017-10-25:17-50-23-039    mp.main critical    Error in mountainprocessmain "Exit code is non-zero: mountainsort.isolation_metrics"

Process finished: mountainsort.isolation_metrics

2017-10-25:17-50-23-692 mp.script_controller    warning "Process returned with non-zero exit code: mountainsort.isolation_metrics"

Peak Memory (MB):
  0 (mountainsort.mountainsort3)

Peak CPU percent:
  0 (mountainsort.mountainsort3)

Avg CPU (pct):
  0 (mountainsort.mountainsort3)

Elapsed time (sec):

  0 (mountainsort.mountainsort3)

Any advice on how to proceed here? I'm running on a computer with 47 GB RAM and an Intel® Xeon(R) CPU E5640 @ 2.67GHz × 16.

Thank you, Peter

wysota commented 6 years ago

I don't think you have reached RAM limits of your machine.

total = 1,679,947,500 ≈ 1.68G cells; at 4 or 8 bytes per cell that is about 6.7 GB or 13.4 GB of RAM at most.
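
For reference, the estimate can be reproduced from the dimensions in the error message. This is only a back-of-the-envelope sketch: the helper name `mda_bytes` is made up for illustration, and the 4- and 8-byte cell sizes are the two cases assumed above, not something confirmed from the Mda code.

```python
def mda_bytes(dims, bytes_per_cell):
    """Return (number of cells, total bytes) for an array with the given dims."""
    cells = 1
    for d in dims:
        cells *= d
    return cells, cells * bytes_per_cell

# Dimensions from the log: 385x50x87270 (the trailing x1x1x1 do not change the product).
dims = (385, 50, 87270)

cells, bytes_4 = mda_bytes(dims, 4)  # assuming 4-byte cells
_, bytes_8 = mda_bytes(dims, 8)      # assuming 8-byte cells

print(f"cells: {cells:,}")
print(f"4 bytes per cell: {bytes_4 / 1e9:.1f} GB")
print(f"8 bytes per cell: {bytes_8 / 1e9:.1f} GB")
```

Either way the single allocation is well below 47 GB, but it has to be satisfied in one contiguous request.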

magland commented 6 years ago

Looking at the code it does appear to be a failure of malloc for such a large array.

To be honest, I never computed the isolation metrics for that large number of channels. We need to modify the code to operate on neighborhoods. Not sure when I can get to that...
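
To illustrate why operating on neighborhoods would help, here is a rough sketch of the idea. It is not the MountainSort implementation: the function `neighborhood`, the 100 µm radius, and the single-column 20 µm geometry are all hypothetical stand-ins for whatever geom.csv actually contains.

```python
import math

def neighborhood(geom, center, radius):
    """Indices of channels whose position lies within `radius` of channel `center`."""
    cx = geom[center]
    return [i for i, pos in enumerate(geom) if math.dist(pos, cx) <= radius]

# Hypothetical geometry: 385 sites in one column, 20 um apart.
geom = [(0.0, 20.0 * i) for i in range(385)]

# Channels near a (hypothetical) peak channel 200.
neigh = neighborhood(geom, center=200, radius=100.0)

# Clip size and event count taken from the error message above.
clip_size, num_events = 50, 87270
full_cells = len(geom) * clip_size * num_events
local_cells = len(neigh) * clip_size * num_events

print(f"channels in neighborhood: {len(neigh)}")
print(f"full array cells:  {full_cells:,}")
print(f"local array cells: {local_cells:,}")
```

Under these assumptions the per-cluster array shrinks by the ratio of neighborhood size to total channel count, which is what makes the allocation tractable for dense probes.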

wysota commented 6 years ago

The user account might also have resource limits set (for example via ulimit), or the memory might be occupied by some other process.
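
A quick way to check both possibilities on Linux is sketched below. It uses Python's standard `resource` module and reads `/proc/meminfo`; the helper names are illustrative, and the limits are only meaningful if the script runs as the same user and from the same shell as the sorting job.

```python
import resource

def describe_limit(name, limit):
    """Format the soft and hard values of one per-process resource limit."""
    def fmt(value):
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value / 1e9:.2f} GB"
    soft, hard = resource.getrlimit(limit)
    return f"{name}: soft={fmt(soft)}, hard={fmt(hard)}"

def mem_available_gb(path="/proc/meminfo"):
    """Return MemAvailable in GB, or None if it cannot be read (e.g. not Linux)."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    # The value in /proc/meminfo is reported in kB (1024 bytes).
                    return int(line.split()[1]) * 1024 / 1e9
    except OSError:
        pass
    return None

# Address-space and data-segment limits are the ones that can make malloc fail.
print(describe_limit("RLIMIT_AS", resource.RLIMIT_AS))
print(describe_limit("RLIMIT_DATA", resource.RLIMIT_DATA))

available = mem_available_gb()
print("MemAvailable:", "unknown" if available is None else f"{available:.1f} GB")
```

If either limit is below the size of the requested array, or the available memory is low while the job runs, the allocation can fail even on a machine with 47 GB installed.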