Latent variable modeling theta

gghanim commented 1 week ago

I have a dataset where cryoDRGN is modeling pose parameters rather than heterogeneity. I am using the workflow parameters recommended in the tutorial. I observe this behavior with different architectures and downsampling sizes, and after further classification of the dataset down to 250K particles. Are there ways around this?

I imagine this can affect the capacity to learn structural heterogeneity. This may be the case because the complex has a flexible domain that I cannot classify with cryoDRGN. I can, however, 'find' this domain in one conformation by other methods (3D classification, etc.) in about a third of the particle images (~0.4M/1.2M).

I'm including plots below.

[Plot: UMAP1 vs. UMAP2, colored by theta]

[Plot: theta vs. phi, colored by UMAP1]
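
The dependence visible in these plots could also be put into a single number. Below is a minimal sketch, not part of cryoDRGN itself, that measures how much of the variance in cos(theta) a linear fit on the latent coordinates explains. It assumes that theta is the tilt angle of a ZYZ Euler convention, so that cos(theta) is the `[2, 2]` entry of each rotation matrix, and that the latent embeddings and rotations are already loaded as NumPy arrays of shape `(N, D)` and `(N, 3, 3)`. The function names and the synthetic demo data are hypothetical.

```python
import numpy as np


def tilt_from_rotations(rots):
    """Tilt angle in degrees for each 3x3 rotation (assumes ZYZ Euler angles)."""
    cos_tilt = np.clip(rots[:, 2, 2], -1.0, 1.0)
    return np.degrees(np.arccos(cos_tilt))


def r_squared_from_latent(z, target):
    """Fraction of the variance of `target` explained by a linear fit on z."""
    design = np.column_stack([z, np.ones(len(z))])  # latent dims + intercept
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return 1.0 - resid.var() / target.var()


# Synthetic stand-in data (hypothetical): rotations about the y axis only.
rng = np.random.default_rng(0)
n, d = 2000, 8
cos_t = rng.uniform(-1.0, 1.0, n)
sin_t = np.sqrt(1.0 - cos_t**2)
rots = np.zeros((n, 3, 3))
rots[:, 0, 0] = cos_t
rots[:, 0, 2] = sin_t
rots[:, 1, 1] = 1.0
rots[:, 2, 0] = -sin_t
rots[:, 2, 2] = cos_t

# One latent space that leaks the pose, one that is independent of it.
z_leaky = rng.normal(size=(n, d))
z_leaky[:, 0] = cos_t + 0.05 * rng.normal(size=n)
z_clean = rng.normal(size=(n, d))

tilt = tilt_from_rotations(rots)
r2_leaky = r_squared_from_latent(z_leaky, np.cos(np.radians(tilt)))
r2_clean = r_squared_from_latent(z_clean, np.cos(np.radians(tilt)))
print(f"R^2 (pose-leaking latent): {r2_leaky:.3f}")
print(f"R^2 (independent latent):  {r2_clean:.3f}")
```

A value close to 1 would indicate that the latent space largely encodes the tilt, while a value close to 0 would indicate no linear dependence. A linear fit can miss nonlinear dependence, so a low value alone does not rule out pose leakage.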

Thanks for any help with this!

zhonge commented 1 week ago

Interesting! Do the volumes from the different regions reflect the pose heterogeneity? I think it would look similar to preferred orientation artifacts.
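
Alongside inspecting the volumes, one rough way to look for this is to compare the distribution of viewing directions between latent regions. The sketch below is an illustration under assumptions, not a cryoDRGN function: `cos_tilt` is taken to be the cosine of the tilt angle per particle, and `labels` is a hypothetical integer region assignment (for example from clustering or a manual selection in the UMAP embedding). For viewing directions spread uniformly over the sphere, cos(tilt) is uniformly distributed, so a flat histogram is the reference case.

```python
import numpy as np


def tilt_histograms_by_region(cos_tilt, labels, n_bins=10):
    """Normalized histogram of cos(tilt) for each latent region label."""
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    hists = {}
    for lab in np.unique(labels):
        counts, _ = np.histogram(cos_tilt[labels == lab], bins=edges)
        hists[int(lab)] = counts / counts.sum()
    return hists


# Synthetic stand-in data (hypothetical): the region label is fully
# determined by the viewing direction, mimicking a pose-driven latent space.
rng = np.random.default_rng(1)
cos_tilt = rng.uniform(-1.0, 1.0, 5000)
labels = (cos_tilt > 0).astype(int)

hists = tilt_histograms_by_region(cos_tilt, labels)
for lab, frac in hists.items():
    print(lab, np.round(frac, 2))
```

If each region covers only part of the cos(tilt) range, as in this synthetic case, the regions are separating particles by view rather than by structure.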

How good do you think the input alignments are, and what is the resolution of the consensus reconstruction?

gghanim commented 1 week ago

The maps from these different regions look isotropic. I do not see any artifacts at reasonable thresholds. I think the input alignments are very good. I have tried this on particle stacks that refine to 2.2-3 Å, depending on the subset.

ml-struct-bio / cryodrgn

Latent variable modeling theta #384