ml-struct-bio / drgnai

drgnai train fails after initial epoch #9

Open bdcook opened 2 months ago

bdcook commented 2 months ago

Hello,

I'm trying to run `drgnai train` on a particle stack exported from cryoSPARC. Unfortunately, training fails right after the pretraining epoch with the following error:

> `(INFO) (reconstruct.py) (11-Sep-24 11:31:59) Use cuda False
> (INFO) (reconstruct.py) (11-Sep-24 11:31:59) Will write tensorboard summaries in dir/out/summaries
> (INFO) (reconstruct.py) (11-Sep-24 11:31:59) Creating dataset
> (INFO) (dataset.py) (11-Sep-24 11:32:22) Lazy loaded 183500 132x132 images
> (INFO) (dataset.py) (11-Sep-24 11:32:22) Windowing images with radius 0.7
> (INFO) (dataset.py) (11-Sep-24 11:32:22) Spawning 16 processes
> /home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/numpy/core/_methods.py:246: RuntimeWarning: overflow encountered in reduce
>   ret = umr_sum(x, axis, dtype, out, keepdims=keepdims, where=where)
> (INFO) (reconstruct.py) (11-Sep-24 11:32:24) Loading ctf params from /home/bcook/dir/Analysis92024/Job1CTF2.pkl
> (INFO) (ctf.py) (11-Sep-24 11:32:24) Image size (pix)  : 132
> (INFO) (ctf.py) (11-Sep-24 11:32:24) A/pix             : 1.9600000381469727
> (INFO) (ctf.py) (11-Sep-24 11:32:24) DefocusU (A)      : 37246.48046875
> (INFO) (ctf.py) (11-Sep-24 11:32:24) DefocusV (A)      : 35244.90234375
> (INFO) (ctf.py) (11-Sep-24 11:32:24) Dfang (deg)       : 63.41419982910156
> (INFO) (ctf.py) (11-Sep-24 11:32:24) voltage (kV)      : 300.0
> (INFO) (ctf.py) (11-Sep-24 11:32:24) cs (mm)           : 2.700000047683716
> (INFO) (ctf.py) (11-Sep-24 11:32:24) w                 : 0.10000000149011612
> (INFO) (ctf.py) (11-Sep-24 11:32:24) Phase shift (deg) : 0.0
> (INFO) (reconstruct.py) (11-Sep-24 11:32:24) Building lattice
> (INFO) (reconstruct.py) (11-Sep-24 11:32:24) Heterogeneous reconstruction with z_dim = 4
> (INFO) (reconstruct.py) (11-Sep-24 11:32:24) Initializing model...
> (INFO) (reconstruct.py) (11-Sep-24 11:32:24) DrgnAI(
>   (pose_table): PoseTable()
>   (conf_table): ConfTable()
>   (hypervolume): HyperVolume(
>     (mlp): ResidualLinearMLP(
>       (main): Sequential(
>         (0): Linear(in_features=388, out_features=256, bias=True)
>         (1): ReLU()
>         (2): ResidualLinear(
>           (linear): Linear(in_features=256, out_features=256, bias=True)
>         )
>         (3): ReLU()
>         (4): ResidualLinear(
>           (linear): Linear(in_features=256, out_features=256, bias=True)
>         )
>         (5): ReLU()
>         (6): ResidualLinear(
>           (linear): Linear(in_features=256, out_features=256, bias=True)
>         )
>         (7): ReLU()
>         (8): MyLinear(in_features=256, out_features=1, bias=True)
>       )
>     )
>   )
> )
> (INFO) (reconstruct.py) (11-Sep-24 11:32:24) 2499217 parameters in model
> (INFO) (reconstruct.py) (11-Sep-24 11:32:24) Model initialized. Moving to GPU...
> (INFO) (reconstruct.py) (11-Sep-24 11:32:25) --- Training Starts Now ---
> (INFO) (reconstruct.py) (11-Sep-24 11:32:25) Will pretrain on 10000 particles
> (INFO) (reconstruct.py) (11-Sep-24 11:32:25) Will make a full summary at the end of this epoch
> /home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/cryodrgnai/dataset.py:227: RuntimeWarning: invalid value encountered in multiply
>   particle_real *= window_mask(particle_real.shape[-1], self.window_r, .99)
> /home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/cryodrgnai/dataset.py:227: RuntimeWarning: invalid value encountered in multiply
>   particle_real *= window_mask(particle_real.shape[-1], self.window_r, .99)
> /home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/cryodrgnai/dataset.py:240: RuntimeWarning: invalid value encountered in true_divide
>   particle_real = (particle_real - self.norm_real[0]) / self.norm_real[1]
> /home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/cryodrgnai/dataset.py:240: RuntimeWarning: invalid value encountered in true_divide
>   particle_real = (particle_real - self.norm_real[0]) / self.norm_real[1]
> (INFO) (reconstruct.py) (11-Sep-24 11:46:13) # [Train Epoch: -1/102] [10048/183500 particles]
> (INFO) (reconstruct.py) (11-Sep-24 11:46:13) # =====> SGD Epoch: -1 finished in 0:13:47.514218; total loss = nan
> Traceback (most recent call last):
>   File "/home/bcook/.conda/envs/drgnai/bin/drgnai", line 8, in <module>
>     sys.exit(run_cryodrgn_ai())
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/cryodrgnai/command_line.py", line 168, in run_cryodrgn_ai
>     args.func(args)
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/cryodrgnai/command_line.py", line 322, in train_experiment
>     trainer.train()
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/cryodrgnai/reconstruct.py", line 770, in train
>     self.make_heavy_summary()
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/cryodrgnai/reconstruct.py", line 1096, in make_heavy_summary
>     summary.make_img_summary(self.writer, self.in_dict_last,
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/cryodrgnai/summary.py", line 148, in make_img_summary
>     plt.colorbar()
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/pyplot.py", line 2073, in colorbar
>     ret = gcf().colorbar(mappable, cax=cax, ax=ax, **kwargs)
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/figure.py", line 1281, in colorbar
>     cb = cbar.Colorbar(cax, mappable, **cb_kw)
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/_api/deprecation.py", line 384, in wrapper
>     return func(*inner_args, **inner_kwargs)
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/colorbar.py", line 380, in __init__
>     self._reset_locator_formatter_scale()
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/colorbar.py", line 1165, in _reset_locator_formatter_scale
>     self._process_values()
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/colorbar.py", line 1099, in _process_values
>     self.norm.vmin, self.norm.vmax = mtransforms.nonsingular(
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/colors.py", line 1249, in vmin
>     self._changed()
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/colors.py", line 1277, in _changed
>     self.callbacks.process('changed')
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/cbook/__init__.py", line 312, in process
>     self.exception_handler(exc)
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/cbook/__init__.py", line 96, in _exception_printer
>     raise exc
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/cbook/__init__.py", line 307, in process
>     func(*args, **kwargs)
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/image.py", line 326, in changed
>     cm.ScalarMappable.changed(self)
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/cm.py", line 683, in changed
>     self.callbacks.process('changed', self)
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/cbook/__init__.py", line 312, in process
>     self.exception_handler(exc)
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/cbook/__init__.py", line 96, in _exception_printer
>     raise exc
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/cbook/__init__.py", line 307, in process
>     func(*args, **kwargs)
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/colorbar.py", line 495, in update_normal
>     self._draw_all()
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/colorbar.py", line 530, in _draw_all
>     self._process_values()
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/colorbar.py", line 1103, in _process_values
>     b = self.norm.inverse(b)
>   File "/home/bcook/.conda/envs/drgnai/lib/python3.9/site-packages/matplotlib/colors.py", line 1707, in inverse
>     raise ValueError("Invalid vmin or vmax")
> ValueError: Invalid vmin or vmax`

This is my config file:

> particles: /home/bcook/dir/Analysis92024/Job1_RefinDrgn.star
> datadir: /home/bcook/dir/Analysis92024
> ctf: /home/bcook/dir/Analysis92024/Job1CTF2.pkl
> pose: null
> window_radius_gt_real: 0.7
> lazy: true
> quick_config:
>   capture_setup: spa
>   reconstruction_type: het
>   pose_estimation: abinit
>   conf_estimation: autodecoder

I suspect my particle stack is the issue, but so far trying different particle files (a .star file vs. a .txt file pointing to the .mrc files) hasn't helped.
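One way to test the particle-stack hypothesis is to look for non-finite pixel values directly, since the `overflow encountered in reduce` and `invalid value encountered in multiply` warnings in the log would be consistent with NaN or inf values in the images (that link is an assumption, not a confirmed diagnosis). The sketch below is a minimal check on an in-memory array; the helper name `summarize_nonfinite` is hypothetical and not part of drgnai. For real data, the array would have to be loaded first, for example with the `mrcfile` package (`mrcfile.open(path, permissive=True).data`), assuming the stack fits in memory.

```python
import numpy as np


def summarize_nonfinite(stack):
    """Report which images in an (N, D, D) particle stack contain NaN or inf.

    Hypothetical helper for illustration; not part of drgnai.
    """
    stack = np.asarray(stack)
    # Flag every pixel that is NaN, +inf, or -inf.
    bad_pixels = ~np.isfinite(stack)
    # Collapse the image axes so there is one boolean per particle.
    bad_images = bad_pixels.reshape(stack.shape[0], -1).any(axis=1)
    # Value range of the remaining (finite) pixels, useful for spotting
    # extreme values that could overflow a low-precision sum.
    finite = stack[~bad_pixels]
    return {
        "dtype": str(stack.dtype),
        "n_images": int(stack.shape[0]),
        "bad_image_indices": np.flatnonzero(bad_images).tolist(),
        "finite_min": float(finite.min()) if finite.size else None,
        "finite_max": float(finite.max()) if finite.size else None,
    }


# Small synthetic stack standing in for real particle data.
demo = np.zeros((4, 3, 3), dtype=np.float32)
demo[1, 0, 0] = np.nan
demo[3, 2, 2] = np.inf
report = summarize_nonfinite(demo)
print(report["bad_image_indices"])
```

If `bad_image_indices` is non-empty for the real stack, the problem would sit in the input data rather than in the training configuration. The reported `dtype` is also worth a look, because a half-precision stack can overflow when many pixel values are summed.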

Thanks!