ml-struct-bio / cryodrgn

Neural networks for cryo-EM reconstruction
http://cryodrgn.cs.princeton.edu
GNU General Public License v3.0

Error with analyze_convergence.py #172

Status: Open. olibclarke opened this issue 1 year ago.

olibclarke commented 1 year ago

Hi,

I ran analyze_convergence.py on one run of train_vae and it worked as expected.

When I ran it on a second run, it crashed with the appended output.

The only differences I can see between the run that worked and the one that failed: the working train_vae run had trained for 18 iterations on 128 px preprocessed particles, while the failing one had trained for 31 iterations on 64 px preprocessed particles. Running cryodrgn analyze on the final iteration works fine and generates UMAP plots, cluster volumes, etc.

```
(cryodrgn2) python /home/exx/software/cryodrgn/utils/analyze_convergence.py train_vae_full_64 31
/home/exx/software/cryodrgn/utils/analyze_convergence.py:16: DeprecationWarning: Please use `maximum_filter` from the `scipy.ndimage` namespace, the `scipy.ndimage.filters` namespace is deprecated.
  from scipy.ndimage.filters import maximum_filter, gaussian_filter
/home/exx/software/cryodrgn/utils/analyze_convergence.py:16: DeprecationWarning: Please use `gaussian_filter` from the `scipy.ndimage` namespace, the `scipy.ndimage.filters` namespace is deprecated.
  from scipy.ndimage.filters import maximum_filter, gaussian_filter
/home/exx/software/cryodrgn/utils/analyze_convergence.py:17: DeprecationWarning: Please use `distance_transform_edt` from the `scipy.ndimage` namespace, the `scipy.ndimage.morphology` namespace is deprecated.
  from scipy.ndimage.morphology import distance_transform_edt, binary_dilation
/home/exx/software/cryodrgn/utils/analyze_convergence.py:17: DeprecationWarning: Please use `binary_dilation` from the `scipy.ndimage` namespace, the `scipy.ndimage.morphology` namespace is deprecated.
  from scipy.ndimage.morphology import distance_transform_edt, binary_dilation
2022-10-27 07:27:12     Namespace(workdir='/data/processing/cryodrgn_test/train_vae_full_64', epoch=31, outdir=None, epoch_interval=5, force_umap_cpu=False, subset=50000, random_seed=None, random_state=42, n_epochs_umap=25000, skip_umap=False, n_bins=30, smooth=True, smooth_width=1.0, pruned_maxima=12, radius=5.0, final_maxima=10, Apix=1.0, flip=False, invert=False, downsample=None, cuda=None, skip_volgen=False, max_threads=8, thresh=None, dilate=3, dist=10)
2022-10-27 07:27:12     Saving all results to /data/processing/cryodrgn_test/train_vae_full_64/convergence.31
2022-10-27 07:27:12     Convergence 1: plotting total loss curve ...
Traceback (most recent call last):
  File "/home/exx/software/cryodrgn/utils/analyze_convergence.py", line 840, in <module>
    main(parser.parse_args())
  File "/home/exx/software/cryodrgn/utils/analyze_convergence.py", line 767, in main
    plot_loss(logfile, outdir, E, LOG)
  File "/home/exx/software/cryodrgn/utils/analyze_convergence.py", line 79, in plot_loss
    loss = analysis.parse_loss(logfile)
  File "/usr/local/envs/cryodrgn2/lib/python3.9/site-packages/cryodrgn/analysis.py", line 24, in parse_loss
    loss = [re.search(regex, x).group(1) for x in lines]
  File "/usr/local/envs/cryodrgn2/lib/python3.9/site-packages/cryodrgn/analysis.py", line 24, in <listcomp>
    loss = [re.search(regex, x).group(1) for x in lines]
AttributeError: 'NoneType' object has no attribute 'group'
(cryodrgn2)
```
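The last frame of the traceback pins down the failure: `re.search` returns `None` for any run.log line its pattern does not match, and calling `.group(1)` on `None` raises exactly this `AttributeError`. The sketch below is a hedged illustration, not cryodrgn's code: the pattern `LOSS_RE`, the helper `parse_loss_lenient` and the sample log lines are all hypothetical, chosen so that a loss printed as `nan` or `inf` does not match.

```python
import math
import re

# Hypothetical pattern for illustration only; cryodrgn's own regex may differ.
# It accepts plain decimals such as "0.0123" but not "nan" or "inf".
LOSS_RE = re.compile(r"total loss = (\d+\.\d+)")


def parse_loss_lenient(lines, pattern=LOSS_RE):
    """Return one float per line, using NaN where the pattern does not match."""
    losses = []
    for line in lines:
        match = pattern.search(line)
        # re.search() returns None when nothing matches; calling .group() on
        # None is what raises "'NoneType' object has no attribute 'group'".
        losses.append(float(match.group(1)) if match else math.nan)
    return losses


if __name__ == "__main__":
    # Made-up log lines, not real cryodrgn output.
    demo = [
        "epoch 17: total loss = 0.0123",
        "epoch 18: total loss = nan",
    ]
    print(parse_loss_lenient(demo))  # [0.0123, nan]
```

Returning NaN instead of skipping the line keeps one entry per epoch, so a plotting step can still line values up with epoch numbers. Another option would be to widen the pattern to also accept `nan`/`inf`, since Python's `float()` parses both strings.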
olibclarke commented 1 year ago

Looking at run.log, it seems something went a little pear-shaped after ~iteration 18: even though the UMAP plots and volumes look fine, the KLD and loss values don't look quite right. Possibly related? These particles were preprocessed by binning to Apix=5.84; is that too much downsampling?

EDIT: Not sure if this is the reason, as analyze_convergence.py still fails if I specify iteration 17...

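On the downsampling question, the finest detail a particle stack can represent is its Nyquist limit, twice the pixel size, so 64 px particles at Apix=5.84 Å/px carry nothing finer than about 11.7 Å. A small sketch of that arithmetic; the helper names are hypothetical, and the 128 px comparison assumes both stacks were cut from the same original images with the same field of view, which the thread does not state.

```python
def nyquist_resolution(apix):
    """Finest resolvable detail in Å for a pixel size of apix Å/px (2 * apix)."""
    return 2.0 * apix


def downsampled_apix(orig_apix, orig_box, new_box):
    """Pixel size after resampling a box of orig_box px to new_box px,
    keeping the same physical field of view."""
    return orig_apix * orig_box / new_box


if __name__ == "__main__":
    print(nyquist_resolution(5.84))  # 11.68 (Å) for the 64 px stack
    # Hypothetical: the same field of view sampled at 128 px.
    print(downsampled_apix(5.84, 64, 128))  # 2.92 (Å/px)
```

Whether ~11.7 Å is sufficient depends on the heterogeneity being studied; in this thread the instability was ultimately traced to outlier particles rather than to the binning.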
olibclarke commented 1 year ago

(This was definitely caused by the NaNs and infs in run.log. A second round of train_vae, after excluding a small (0.2%!) population of outlier particles, had no such stability issues, and analyze_convergence.py worked fine.)
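The thread does not say how the 0.2% of outlier particles were identified. One hedged way to flag such a small tail, assuming you have some per-particle scalar score (for example a per-particle error or the norm of each particle's latent embedding; the choice of metric is an assumption here), is a robust median/MAD cutoff. The function `inlier_indices` and its loose default `k=10.0` are hypothetical.

```python
import math
import statistics


def inlier_indices(scores, k=10.0):
    """Indices of particles whose score is finite and lies within k robust
    standard deviations (scaled median absolute deviation) of the median.

    Raises statistics.StatisticsError if no score is finite.
    """
    finite = [s for s in scores if math.isfinite(s)]
    med = statistics.median(finite)
    # 1.4826 * MAD estimates the standard deviation for normally distributed data.
    mad = 1.4826 * statistics.median(abs(s - med) for s in finite)
    keep = []
    for i, s in enumerate(scores):
        if not math.isfinite(s):
            continue  # NaN/inf scores are always excluded
        # If mad == 0, only scores exactly equal to the median pass this test.
        if abs(s - med) <= k * mad:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Made-up scores: one extreme value and one NaN.
    demo = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0, float("nan")]
    print(inlier_indices(demo))  # [0, 1, 2, 3, 4]
```

A large `k` removes only extreme outliers, in the spirit of the small fraction dropped here. The kept indices could then be pickled and passed back to train_vae through its particle-index filter (`--ind` in the versions we are aware of; check `cryodrgn train_vae -h` for yours).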