ml-struct-bio / cryodrgn

Neural networks for cryo-EM reconstruction
http://cryodrgn.cs.princeton.edu
GNU General Public License v3.0

Zero-pad epoch numbers in output file names #298

Open Guillawme opened 1 year ago

Guillawme commented 1 year ago

Describe the bug: This is not a bug report, but a request for a small improvement in user experience.

To Reproduce: Run cryodrgn train_vae or any other reconstruction command (abinit_het, etc.). The output directory contains files named weights.*.pkl, z.*.pkl and sometimes pose.*.pkl, where * is the epoch number. Because this number is not zero-padded, listing the files shows the epochs out of order:

$ ls -1
analyze.29
config.yaml
run.log
weights.0.pkl
weights.10.pkl
weights.11.pkl
weights.12.pkl
weights.13.pkl
weights.14.pkl
weights.15.pkl
weights.16.pkl
weights.17.pkl
weights.18.pkl
weights.19.pkl
weights.1.pkl
weights.20.pkl
weights.21.pkl
weights.22.pkl
weights.23.pkl
weights.24.pkl
weights.25.pkl
weights.26.pkl
weights.27.pkl
weights.28.pkl
weights.29.pkl
weights.2.pkl
weights.3.pkl
weights.4.pkl
weights.5.pkl
weights.6.pkl
weights.7.pkl
weights.8.pkl
weights.9.pkl
weights.pkl
z.0.pkl
z.10.pkl
z.11.pkl
z.12.pkl
z.13.pkl
z.14.pkl
z.15.pkl
z.16.pkl
z.17.pkl
z.18.pkl
z.19.pkl
z.1.pkl
z.20.pkl
z.21.pkl
z.22.pkl
z.23.pkl
z.24.pkl
z.25.pkl
z.26.pkl
z.27.pkl
z.28.pkl
z.29.pkl
z.2.pkl
z.3.pkl
z.4.pkl
z.5.pkl
z.6.pkl
z.7.pkl
z.8.pkl
z.9.pkl
z.pkl

Expected behavior: Navigating the output files would be easier if epochs 0-9 were zero-padded to 2 digits (or 3? would anybody ever run more than 99 epochs?), giving 00-09. With this numbering, output files would sort more naturally when listed:

$ ls -1
analyze.29
config.yaml
run.log
weights.00.pkl
weights.01.pkl
weights.02.pkl
weights.03.pkl
weights.04.pkl
weights.05.pkl
weights.06.pkl
weights.07.pkl
weights.08.pkl
weights.09.pkl
weights.10.pkl
weights.11.pkl
weights.12.pkl
weights.13.pkl
weights.14.pkl
weights.15.pkl
weights.16.pkl
weights.17.pkl
weights.18.pkl
weights.19.pkl
weights.20.pkl
weights.21.pkl
weights.22.pkl
weights.23.pkl
weights.24.pkl
weights.25.pkl
weights.26.pkl
weights.27.pkl
weights.28.pkl
weights.29.pkl
weights.pkl
z.00.pkl
z.01.pkl
z.02.pkl
z.03.pkl
z.04.pkl
z.05.pkl
z.06.pkl
z.07.pkl
z.08.pkl
z.09.pkl
z.10.pkl
z.11.pkl
z.12.pkl
z.13.pkl
z.14.pkl
z.15.pkl
z.16.pkl
z.17.pkl
z.18.pkl
z.19.pkl
z.20.pkl
z.21.pkl
z.22.pkl
z.23.pkl
z.24.pkl
z.25.pkl
z.26.pkl
z.27.pkl
z.28.pkl
z.29.pkl
z.pkl

Additional context: Output files from cryodrgn analyze and other analysis commands already have zero-padded indices (for example the volume files vol_012.mrc, etc.).
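
To make the request concrete, here is a minimal Python sketch of how zero-padded names could be generated. The helper `epoch_filename` and its parameters are hypothetical and are not part of cryoDRGN's code. It picks the padding width from the total number of epochs, with a minimum of two digits, so the "2 digits or 3?" question would not need a fixed answer:

```python
def epoch_filename(prefix: str, epoch: int, num_epochs: int, ext: str = "pkl") -> str:
    """Build a name such as 'weights.07.pkl'.

    Hypothetical helper for illustration only; not cryoDRGN's API.
    """
    # Width of the largest epoch index (num_epochs - 1 with 0-based numbering),
    # but never narrower than two digits.
    width = max(2, len(str(max(num_epochs - 1, 0))))
    return f"{prefix}.{epoch:0{width}d}.{ext}"


names = [epoch_filename("weights", e, 30) for e in range(30)]

# With padding, plain lexicographic sorting matches epoch order.
assert sorted(names) == names
print(names[0], names[9], names[29])
```

With 30 epochs the names run from weights.00.pkl to weights.29.pkl; a run of 150 epochs would get three digits under the same rule.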

michal-g commented 7 months ago

Related to #151: would there be support for making the output epoch numbers start at 1 instead of 0 on top of this? That way epoch "0" could be reserved, e.g., for pre-training, and the numbers would be more intuitive, with the last epoch having the same number as the total number of epochs.

Either way, zero-padding will be added in an upcoming release!

Guillawme commented 7 months ago

I agree: it would also be convenient as a user if the final epoch number matched the total number of epochs requested in the job (this kind of offset absolutely always trips me up when I look at results).
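
The numbering scheme discussed in the two comments above can be sketched as follows. This is only an illustration of the idea, assuming a 0-based training loop; `epoch_label` is a hypothetical name and does not reflect how cryoDRGN implements or will implement it:

```python
def epoch_label(loop_index: int, pretrain: bool = False) -> int:
    """Map a 0-based training-loop index to a 1-based epoch label.

    Label 0 is reserved for an optional pre-training step.
    Hypothetical helper for illustration only.
    """
    if pretrain:
        return 0
    return loop_index + 1


num_epochs = 30
labels = [epoch_label(i) for i in range(num_epochs)]

# The first training epoch is labeled 1, and the last label equals
# the number of epochs requested.
assert labels[0] == 1
assert labels[-1] == num_epochs
```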

michal-g commented 1 month ago

We are still planning to update the epoch numbering along these lines for v4.0.0, which is due out by the end of the summer; in the meantime, it's occurred to me that you can also use sort -t \. -k 2 -g to get the proper ordering in the situation @Guillawme discussed above:

$ ls -1
reconstruct.-1.mrc
reconstruct.0.mrc
reconstruct.1.mrc
reconstruct.10.mrc
reconstruct.11.mrc
reconstruct.2.mrc
reconstruct.3.mrc
reconstruct.4.mrc
reconstruct.5.mrc
reconstruct.6.mrc
reconstruct.7.mrc
reconstruct.8.mrc
reconstruct.9.mrc
$ ls | sort -t \. -k 2 -g
reconstruct.-1.mrc
reconstruct.0.mrc
reconstruct.1.mrc
reconstruct.2.mrc
reconstruct.3.mrc
reconstruct.4.mrc
reconstruct.5.mrc
reconstruct.6.mrc
reconstruct.7.mrc
reconstruct.8.mrc
reconstruct.9.mrc
reconstruct.10.mrc
reconstruct.11.mrc
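
For scripts that process these files, the same ordering can be obtained in Python. The sketch below assumes names of the form prefix.EPOCH.ext, as in the listing above; `epoch_key` is a hypothetical helper, and it raises an error for names without an epoch field (such as weights.pkl), so those would need to be filtered out first:

```python
import re


def epoch_key(name: str) -> int:
    """Return the integer between the first two dots of a file name,
    e.g. -1 for 'reconstruct.-1.mrc'. Hypothetical helper for illustration."""
    match = re.match(r"^[^.]+\.(-?\d+)\.", name)
    if match is None:
        raise ValueError(f"no epoch number found in {name!r}")
    return int(match.group(1))


# File names in the order a plain `ls -1` returns them.
files = [
    "reconstruct.-1.mrc", "reconstruct.0.mrc", "reconstruct.1.mrc",
    "reconstruct.10.mrc", "reconstruct.11.mrc", "reconstruct.2.mrc",
    "reconstruct.3.mrc", "reconstruct.4.mrc", "reconstruct.5.mrc",
    "reconstruct.6.mrc", "reconstruct.7.mrc", "reconstruct.8.mrc",
    "reconstruct.9.mrc",
]

# Sort numerically by epoch rather than lexicographically by name.
ordered = sorted(files, key=epoch_key)
for name in ordered:
    print(name)
```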