ml-struct-bio / drgnai

GNU General Public License v3.0

Tcl_AsyncDelete: async handler deleted by the wrong thread #4

Closed papaig closed 2 months ago

papaig commented 3 months ago

Hi,

I am trying to run DRGN-AI on a set of 149243 particle images, but it fails with the message: Tcl_AsyncDelete: async handler deleted by the wrong thread. I made the pkl files with cryoDRGN. I would greatly appreciate your help! Please find the details below.

My config.yaml:

particles: /mnt/storage/teams/some_complex/cryosparc_projects/P27/J79_particles_128.mrcs
ctf: /mnt/storage/teams/some_complex/cryosparc_projects/P27/J79_ctf.pkl
pose: null
quick_config:
  capture_setup: spa
  reconstruction_type: het
  pose_estimation: abinit
  conf_estimation: autodecoder

The log:

(INFO) (reconstruct.py) (27-Jun-24 14:35:42) Number of available gpus: 2
(INFO) (reconstruct.py) (27-Jun-24 14:35:43) Use cuda True
(INFO) (reconstruct.py) (27-Jun-24 14:35:43) Will write tensorboard summaries in drgnai/J79/out/summaries
(INFO) (reconstruct.py) (27-Jun-24 14:35:43) Creating dataset
(INFO) (dataset.py) (27-Jun-24 14:36:21) Loaded 149243 128x128 images
(INFO) (dataset.py) (27-Jun-24 14:36:21) Windowing images with radius 0.85
(INFO) (dataset.py) (27-Jun-24 14:36:22) Computing FFT
(INFO) (dataset.py) (27-Jun-24 14:36:22) Spawning 16 processes
(INFO) (dataset.py) (27-Jun-24 14:37:37) Symmetrizing image data
(INFO) (dataset.py) (27-Jun-24 14:38:46) Normalized HT by 0 +/- 419.2730407714844
(INFO) (dataset.py) (27-Jun-24 14:40:05) Normalized real space images by 0.10456375777721405 +/- 3.27588152885437
(INFO) (reconstruct.py) (27-Jun-24 14:40:14) Loading ctf params from /mnt/storage/teams/some_complex/cryosparc_projects/P27/J79_ctf.pkl
(INFO) (ctf.py) (27-Jun-24 14:40:14) Image size (pix) : 128
(INFO) (ctf.py) (27-Jun-24 14:40:14) A/pix : 2.733750104904175
(INFO) (ctf.py) (27-Jun-24 14:40:14) DefocusU (A) : 29496.822265625
(INFO) (ctf.py) (27-Jun-24 14:40:14) DefocusV (A) : 29020.39453125
(INFO) (ctf.py) (27-Jun-24 14:40:14) Dfang (deg) : 22.268369674682617
(INFO) (ctf.py) (27-Jun-24 14:40:14) voltage (kV) : 300.0
(INFO) (ctf.py) (27-Jun-24 14:40:14) cs (mm) : 2.700000047683716
(INFO) (ctf.py) (27-Jun-24 14:40:14) w : 0.10000000149011612
(INFO) (ctf.py) (27-Jun-24 14:40:14) Phase shift (deg) : 0.0
(INFO) (reconstruct.py) (27-Jun-24 14:40:14) Building lattice
(INFO) (reconstruct.py) (27-Jun-24 14:40:15) Heterogeneous reconstruction with z_dim = 4
(INFO) (reconstruct.py) (27-Jun-24 14:40:15) Initializing model...
(INFO) (reconstruct.py) (27-Jun-24 14:40:15) DrgnAI(
  (pose_table): PoseTable()
  (conf_table): ConfTable()
  (hypervolume): HyperVolume(
    (mlp): ResidualLinearMLP(
      (main): Sequential(
        (0): Linear(in_features=388, out_features=256, bias=True)
        (1): ReLU()
        (2): ResidualLinear(
          (linear): Linear(in_features=256, out_features=256, bias=True)
        )
        (3): ReLU()
        (4): ResidualLinear(
          (linear): Linear(in_features=256, out_features=256, bias=True)
        )
        (5): ReLU()
        (6): ResidualLinear(
          (linear): Linear(in_features=256, out_features=256, bias=True)
        )
        (7): ReLU()
        (8): MyLinear(in_features=256, out_features=1, bias=True)
      )
    )
  )
)
(INFO) (reconstruct.py) (27-Jun-24 14:40:15) 2088133 parameters in model
(INFO) (reconstruct.py) (27-Jun-24 14:40:15) Model initialized. Moving to GPU...
(INFO) (reconstruct.py) (27-Jun-24 14:40:15) --- Training Starts Now ---
(INFO) (reconstruct.py) (27-Jun-24 14:40:15) Will pretrain on 10000 particles
(INFO) (reconstruct.py) (27-Jun-24 14:40:15) Will make a full summary at the end of this epoch
(INFO) (reconstruct.py) (27-Jun-24 14:41:28) # [Train Epoch: -1/103] [10048/149243 particles]
(INFO) (reconstruct.py) (27-Jun-24 14:41:30) # =====> SGD Epoch: -1 finished in 0:01:14.951658; total loss = 1.032471
(INFO) (analysis.py) (27-Jun-24 14:41:32) Explained variance ratio:
(INFO) (analysis.py) (27-Jun-24 14:41:32) [0.26207975 0.24745756 0.24616494 0.24429776]
(INFO) (reconstruct.py) (27-Jun-24 14:41:32) Will use pose search on 149243 particles
(INFO) (reconstruct.py) (27-Jun-24 14:41:32) Will make a full summary at the end of this epoch
Exception ignored in: <function Image.__del__ at 0x7f5e1bfc4ee0>
Traceback (most recent call last):
  File "/software/miniconda3/envs/drgnai-env/lib/python3.9/tkinter/__init__.py", line 4017, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f5e1bfaa5e0>
Traceback (most recent call last):
  File "/software/miniconda3/envs/drgnai-env/lib/python3.9/tkinter/__init__.py", line 363, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f5e1bfaa5e0>
Traceback (most recent call last):
  File "/software/miniconda3/envs/drgnai-env/lib/python3.9/tkinter/__init__.py", line 363, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f5e1bfaa5e0>
Traceback (most recent call last):
  File "/software/miniconda3/envs/drgnai-env/lib/python3.9/tkinter/__init__.py", line 363, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f5e1bfaa5e0>
Traceback (most recent call last):
  File "/software/miniconda3/envs/drgnai-env/lib/python3.9/tkinter/__init__.py", line 363, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Tcl_AsyncDelete: async handler deleted by the wrong thread
Aborted

Thank you in advance for your help!

Best, Gabor

WenZhng commented 2 months ago

I ran into the same problem. After 3 epochs the run fails with "RuntimeError: main thread is not in main loop". My Python version is 3.10, and drgnai does not seem to support 3.11 and above.

(INFO) (reconstruct.py) (07-Jul-24 21:25:09) # =====> SGD Epoch: 3 finished in 0:01:09.041667; total loss = 1.056778
(INFO) (reconstruct.py) (07-Jul-24 21:25:20) Will use pose search on 3562 particles
(INFO) (reconstruct.py) (07-Jul-24 21:25:20) Will make a full summary at the end of this epoch
Exception ignored in: <function Image.__del__ at 0x7f28cf4aaa60>
Traceback (most recent call last):
  File "/spshared/apps/miniconda3/envs/drgnai/lib/python3.9/tkinter/__init__.py", line 4017, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f28cf497160>
Traceback (most recent call last):
  File "/spshared/apps/miniconda3/envs/drgnai/lib/python3.9/tkinter/__init__.py", line 363, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f28cf497160>
Traceback (most recent call last):
  File "/spshared/apps/miniconda3/envs/drgnai/lib/python3.9/tkinter/__init__.py", line 363, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f28cf497160>
Traceback (most recent call last):
  File "/spshared/apps/miniconda3/envs/drgnai/lib/python3.9/tkinter/__init__.py", line 363, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f28cf497160>
Traceback (most recent call last):
  File "/spshared/apps/miniconda3/envs/drgnai/lib/python3.9/tkinter/__init__.py", line 363, in __del__
    if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Tcl_AsyncDelete: async handler deleted by the wrong thread
Aborted (core dumped)

michal-g commented 2 months ago

Closing this as it seems we are dealing with the same underlying problem as in #2 — will discuss more in that thread!
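
Both tracebacks in this thread show tkinter objects (Image, Variable) being finalized outside the main thread, right after "Will make a full summary at the end of this epoch". A minimal sketch of one commonly used workaround for this class of error follows. It assumes the summary plots are produced with matplotlib and that matplotlib selected a Tk-based backend, which the thread does not confirm. The idea is to force the non-interactive Agg backend through matplotlib's MPLBACKEND environment variable before the training process starts. The helper name headless_env is hypothetical and not part of drgnai.

```python
import os


def headless_env(base_env=None):
    """Return a copy of an environment mapping with matplotlib forced to Agg.

    Hypothetical helper, not part of drgnai. MPLBACKEND is the environment
    variable matplotlib reads to pick its default backend; Agg renders to
    files only and never creates Tk objects.
    """
    # Copy so the caller's mapping (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["MPLBACKEND"] = "Agg"
    return env


if __name__ == "__main__":
    env = headless_env()
    print("MPLBACKEND =", env["MPLBACKEND"])
```

The returned mapping can be passed as env= to subprocess.run when launching the training command. The shell equivalent is to run export MPLBACKEND=Agg before starting the job.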