ml-struct-bio / cryodrgn

Neural networks for cryo-EM reconstruction
http://cryodrgn.cs.princeton.edu
GNU General Public License v3.0
316 stars 76 forks source link

1-based indexing of output volumes instead of 0-based #151

Open Guillawme opened 2 years ago

Guillawme commented 2 years ago

Describe the bug This is not a bug report, only a proposal for a little improvement in user experience.

CryoDRGN numbers all the maps it generates from 0, but ChimeraX numbers all the maps it opens from 1, and this trips me up every single time I look at maps from cryoDRGN.

To Reproduce

cd /path/to/kmeans20
chimerax vol_???.mrc

Then, vol_000.mrc has model ID 1 in ChimeraX, vol_001.mrc has model ID 2, and so on. I always look at the model ID column in the model panel in ChimeraX, but the number in this column doesn't match the number in the UMAP plot.

Expected behavior It would be a lot easier if the numbering from cryoDRGN started at vol_001.mrc. Or if ChimeraX numbered its open models from 0, if you can convince them that they should be the ones changing their software.

Additional context It is definitely possible to look at the filename column, where the correct vol_???.mrc are listed, instead of the model ID column. But it is confusing when one got into the habit of ignoring the filename column (I got into this habit because output files from two RELION jobs have the same names, so in this situation only the model ID column is informative to remember which file comes from which job).

zhonge commented 2 years ago

Maybe we could have a flag in cryodrgn analyze for the start index of the volume numbering.

vineetbansal commented 2 years ago

@Guillawme, @zhonge - ok, so I'm seeing 3 places where volumes are generated by cryoDRGN and this option (start index for volume numbering) would take effect:

cryodrgn analyze command:

cryodrgn eval_vol command:

Let me know if there are any more.

EDIT - I'm seeing one more place where this functionality would need to be added to keep things consistent: cryodrgn analyze_landscape command - this is essentially similar to analyze in that it generates many volumes in kMeans<k> and pc<1> to pc<N> folders.

Guillawme commented 2 years ago

This is all I can think of too.

Now another question: since this is becoming an option, what should be the default value? 0, to keep behavior consistent with previous versions of cryoDRGN? Or 1, to reduce friction by default?

vineetbansal commented 2 years ago

I'd say we keep it at 0 for now. I'll add the rationale for this flag in the documentation so you (and other users like you who're using chimera) and benefit from it by overriding it. At some future point we can modify the default to be 1.

vineetbansal commented 1 year ago

@Guillawme - we added a --vol-start-index flag (default value 0) to cryodrgn analyze command. Can you try this out and see if it addresses your use case? If so, I'll close this issue.

Guillawme commented 1 year ago

Hello!

Very sorry it took me so long to get back to this.

I have finally tested it, and it works nicely. This is so much easier to read now:

Screenshot from 2023-07-06 14-09-14

I think it will be beneficial if the default value becomes 1 in a future version.

Guillawme commented 1 year ago

Re-opening this issue because the numbering is still off in the UMAP and PCA plots found in the kmeans directory after running cryodrgn analyze. This is apparent in the very small cluster on the left in this UMAP plot:

umap

And also in this PCA plot:

z_pca

While the volumes from this job start at vol_001.mrc because I used --vol-start-index 1.