ml-struct-bio / cryodrgn

Neural networks for cryo-EM reconstruction
http://cryodrgn.cs.princeton.edu
GNU General Public License v3.0
307 stars 76 forks source link

Support idealized imaging settings / toy datasets? #253

Open zhonge opened 1 year ago

zhonge commented 1 year ago
          > I've tested the notebooks too and they're working correctly w.r.t the rest of the `config` changes introduced in this PR. However, the notebooks are failing both in the `master` branch as well as this branch on an unrelated error, on the line below:
# Load poses
if config['dataset_args']['do_pose_sgd']:
    pose_pkl = f'{WORKDIR}/pose.{EPOCH}.pkl'
    with open(pose_pkl,'rb') as f:
        rot, trans = pickle.load(f)
else:
    pose_pkl = config['dataset_args']['poses']
    rot, trans = utils.load_pkl(pose_pkl)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[17], line 8
      6 else:
      7     pose_pkl = config['dataset_args']['poses']
----> 8     rot, trans = utils.load_pkl(pose_pkl)

ValueError: too many values to unpack (expected 2)

My understanding is that poses can be rotations, translations, or just translations, and the toy dataset that I'm testing out this notebook on (the hand dataset) just has rotations, no translations. I think the rest of cryodrgn code is written so as to take care of this situation, but perhaps the notebooks are not?

I'm not sure how big an issue this is for the real usage of notebooks. If it is, perhaps we can open up a new issue on this? In any case it is unrelated to this PR so I'm confident that this can be merged, and there are no obvious places that we're overlooking here for bugs.

Ah yes, I think we should open a new ticket for updating the toy datasets. Right now the toy dataset doesn't have any translations (just rotations since the images are perfectly centered). There is also no CTF applied on this dataset. I'm pretty sure this idealized setting breaks some of the assumptions in the notebooks, and we can discuss whether we need to support these idealized settings in the code and in the jupyter notebooks.

Originally posted by @zhonge in https://github.com/zhonge/cryodrgn/issues/240#issuecomment-1484144739

zhonge commented 1 year ago

Summarizing the above discussion on a different pull request. Some cells in the jupyter notebooks assume poses include both rotations and translations, and that there are CTF parameters... which is obviously the case for real datasets, but our toy dataset in testing currently only has rotations and no CTF. The software still runs, but the jupyter notebooks currently assume real data (which just prevents testing at the moment).

I think we should support both idealized and real imaging, and

michal-g commented 4 months ago

Based on #355 we have users who get stuck on trying to run the notebooks after trying training without a CTF — +1 from me for fixing the notebooks to support cases without a CTF, and to clean up and document the test case datasets a bit too, see e.g. #375.