tbepler / topaz

Pipeline for particle picking in cryo-electron microscopy images using convolutional neural networks trained from positive and unlabeled examples. Also featuring micrograph and tomogram denoising with DNNs.
GNU General Public License v3.0

Topaz Train/Cross-Validation failed with AssertionError: Subprocess exited with status -9. #160

Closed PeterXTH closed 3 months ago

PeterXTH commented 1 year ago

Here is the error message.

Topaz Cross Validation:

Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 93, in cryosparc_compute.run.main
  File "/data/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 794, in run_topaz_wrapper_cross_validation
    assert len(tables) > 0, "All subsidiary training jobs failed or were killed."
AssertionError: All subsidiary training jobs failed or were killed.

Topaz Train:

Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 93, in cryosparc_compute.run.main
  File "/data/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 360, in run_topaz_wrapper_train
    utils.run_process(train_command)
  File "/data/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 98, in run_process
    assert process.returncode == 0, f"Subprocess exited with status {process.returncode} ({str_command})"
AssertionError: Subprocess exited with status 1 (/primary/vari/software/topaz/default-cryosparc/topaz train --num-particles 200 --k-fold 2 --fold 0 --learning-rate 0.0002 --minibatch-size 128 --num-epochs 10 --method GE-binomial --slack -1 --autoencoder 0 --l2 0.0 --minibatch-balance 0.0625 --epoch-size …)

The full log shows:

MemoryError: Unable to allocate 144. GiB for an array with shape (19273333732,) and data type [('image', '<u4'), ('coord', '<u4')]

Does anyone know how to solve this problem? Thank you so much.

Tinghai
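For anyone reading the MemoryError above, the 144 GiB figure can be checked by hand from the shape and data type in the message. The sketch below only redoes that arithmetic; it does not call Topaz or NumPy, and the variable names are illustrative rather than taken from the Topaz source. Each array element holds two unsigned 4-byte integers (`image` and `coord`), so one element takes 8 bytes.

```python
# Numbers copied from the MemoryError message above.
N_ELEMENTS = 19273333732   # array shape (19273333732,)
BYTES_PER_U4 = 4           # '<u4' is a little-endian unsigned 32-bit integer
FIELDS = ("image", "coord")

# One structured element stores both fields back to back.
bytes_per_element = BYTES_PER_U4 * len(FIELDS)

total_bytes = N_ELEMENTS * bytes_per_element
total_gib = total_bytes / 2**30

print(f"{bytes_per_element} bytes per element")
print(f"{total_bytes} bytes total, about {total_gib:.1f} GiB")
```

The result is about 143.6 GiB, which matches the "144. GiB" in the log once rounded. The request is for a single index array in host RAM, so the job fails whenever the worker cannot provide that much memory, regardless of the training flags passed on the command line.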

PeterXTH commented 3 months ago

The problem was solved by reducing the number of micrographs.
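A rough way to estimate how far to reduce the input, assuming the failing array grows linearly with the number of micrographs (one 8-byte `image`/`coord` entry per candidate position, which is an assumption based on the data type in the error and not confirmed from the Topaz source). The function names and the 64 GiB budget are hypothetical, chosen only for illustration.

```python
ENTRY_BYTES = 8  # two '<u4' fields per entry, as reported in the MemoryError


def entries_that_fit(budget_gib: float) -> int:
    """Number of 8-byte entries that fit in the given memory budget."""
    return int(budget_gib * 2**30) // ENTRY_BYTES


def fraction_to_keep(total_entries: int, budget_gib: float) -> float:
    """Fraction of micrographs to keep, assuming entries scale linearly
    with micrograph count (an assumption, not a documented Topaz guarantee)."""
    return min(1.0, entries_that_fit(budget_gib) / total_entries)


# Entry count from the error message; 64 GiB is a hypothetical budget.
frac = fraction_to_keep(19273333732, budget_gib=64)
print(f"keep at most {frac:.0%} of the micrographs")
```

This covers the index array only. Training needs additional memory for the micrographs themselves and for the model, so the fraction is an upper bound and a smaller subset is the safer choice.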