ml-struct-bio / cryodrgn

Neural networks for cryo-EM reconstruction
http://cryodrgn.cs.princeton.edu
GNU General Public License v3.0

High resolution analysis ValueError #406

Open Alana-Cowell opened 2 weeks ago

Alana-Cowell commented 2 weeks ago

Heya,

When I run cryodrgn analyze on my higher-resolution dataset (256), I get the error below.

2024-10-02 12:10:55 Perfoming principal component analysis...
Traceback (most recent call last):
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/bin/cryodrgn", line 33, in <module>
    sys.exit(load_entry_point('cryodrgn==0.3.3', 'console_scripts', 'cryodrgn')())
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/cryodrgn-0.3.3-py3.7.egg/cryodrgn/main.py", line 54, in main
    args.func(args)
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/cryodrgn-0.3.3-py3.7.egg/cryodrgn/commands/analyze.py", line 190, in main
    analyze_zN(z, outdir, vg, skip_umap=args.skip_umap, num_pcs=args.pc, num_ksamples=args.ksample)
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/cryodrgn-0.3.3-py3.7.egg/cryodrgn/commands/analyze.py", line 64, in analyze_zN
    pc, pca = analysis.run_pca(z)
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/cryodrgn-0.3.3-py3.7.egg/cryodrgn/analysis.py", line 34, in run_pca
    pca.fit(z)
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/sklearn/decomposition/_pca.py", line 382, in fit
    self._fit(X)
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/sklearn/decomposition/_pca.py", line 431, in _fit
    X, dtype=[np.float64, np.float32], ensure_2d=True, copy=self.copy
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/sklearn/base.py", line 561, in _validate_data
    X = check_array(X, **check_params)
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 792, in check_array
    _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 116, in _assert_all_finite
    type_err, msg_dtype if msg_dtype is not None else X.dtype
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Command used is cryodrgn analyze 02_256_8D_1024 49 --Apix 1.08

I have over 500,000 particles so I'm wondering if that might be the issue? Or whether this might be related to the Assertion error people have been mentioning. Any suggestions on the cause and fix would be appreciated.

Thanks Alana

michal-g commented 2 weeks ago

Hi Alana, can you double-check which version of cryoDRGN you have installed (using the command cryodrgn --version) and which version of Python you are using? It looks like you may be running older versions of both (cryoDRGN v0.3.3 and Python 3.7).

Another thing to double-check is if there are indeed degenerate values in your latent space matrix, which you can look at using something like the following:

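import numpy as np
from cryodrgn.utils import load_pkl

# Load the latent embeddings saved for epoch 49
z = load_pkl("02_256_8D_1024/z.49.pkl")
# Count NaN entries; a nonzero count would explain the PCA failure
print(np.isnan(z).sum())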

Generally the number of particles shouldn't be a problem in and of itself if the reconstruction already ran to completion, but the model might have had trouble coming up with a coherent representation of the heterogeneity landscape characterizing your input, leading to missing/null values in the model output.
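If the count is nonzero, it can help to see whether the bad values are confined to a few particles or spread across the whole matrix. Here is a minimal sketch of that check, using a small synthetic array in place of the real z.49.pkl; summarize_nonfinite is a hypothetical helper written for illustration, not part of cryoDRGN:

```python
import numpy as np


def summarize_nonfinite(z):
    """Report which rows (particles) and columns (latent dims) hold NaN or inf."""
    z = np.asarray(z, dtype=np.float32)
    bad = ~np.isfinite(z)  # True wherever a value is NaN or +/-inf
    return {
        "n_nan": int(np.isnan(z).sum()),
        "n_inf": int(np.isinf(z).sum()),
        "bad_rows": np.flatnonzero(bad.any(axis=1)),
        "bad_cols": np.flatnonzero(bad.any(axis=0)),
    }


# Synthetic stand-in for the latent matrix: 6 particles, 8 latent dimensions
rng = np.random.default_rng(0)
z_demo = rng.normal(size=(6, 8)).astype(np.float32)
z_demo[2, :] = np.nan  # one particle with an entirely NaN embedding
z_demo[4, 3] = np.inf  # one isolated infinite value

report = summarize_nonfinite(z_demo)
print(report["n_nan"], report["n_inf"], report["bad_rows"], report["bad_cols"])
```

To run this on the real data, pass the array returned by load_pkl instead of z_demo. A handful of affected rows would point at specific particles, whereas a matrix that is NaN throughout would suggest that training itself diverged.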

Best, Michal

Alana-Cowell commented 2 days ago

Hi Michal,

Thank you for the information. We updated cryoDRGN to the latest version, and the error has changed slightly:

  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/bin/cryodrgn", line 8, in <module>
    sys.exit(main_commands())
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/lib/python3.10/site-packages/cryodrgn/command_line.py", line 81, in main_commands
    _get_commands(
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/lib/python3.10/site-packages/cryodrgn/command_line.py", line 76, in _get_commands
    args.func(args)
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/lib/python3.10/site-packages/cryodrgn/commands/analyze.py", line 485, in main
    analyze_zN(
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/lib/python3.10/site-packages/cryodrgn/commands/analyze.py", line 119, in analyze_zN
    pc, pca = analysis.run_pca(z)
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/lib/python3.10/site-packages/cryodrgn/analysis.py", line 41, in run_pca
    pca.fit(z)
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/lib/python3.10/site-packages/sklearn/base.py", line 1473, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 448, in fit
    self._fit(X)
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 511, in _fit
    X = self._validate_data(
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/lib/python3.10/site-packages/sklearn/base.py", line 633, in _validate_data
    out = check_array(X, input_name="X", **check_params)
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/lib/python3.10/site-packages/sklearn/utils/validation.py", line 1064, in check_array
    _assert_all_finite(
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/lib/python3.10/site-packages/sklearn/utils/validation.py", line 123, in _assert_all_finite
    _assert_all_finite_element_wise(
  File "/s/ems/s/anaconda/v4.8.4/envs/cryodrgn_v3.4.1/lib/python3.10/site-packages/sklearn/utils/validation.py", line 172, in _assert_all_finite_element_wise
    raise ValueError(msg_err)
ValueError: Input X contains NaN.

I'm not sure whether this is something I can fix. Would it be worth re-running the training with the newer version of cryoDRGN?