Open Guillawme opened 3 weeks ago
I have to say that CryoSPARC's data model is quite messy, and I cannot exclude that this is done on purpose :/
So a few workarounds to try to remove 3D classification metadata:
1) use the particle stack from before 3D classification; cryoDRGN only needs the particle images and CTF information, not the class assignments
2) run a CryoSPARC restacking job on the output of 3D classification and see if the *.cs files from restacking work
3) convert to RELION STAR files and then import those into cryoDRGN
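A further option along the same lines is to strip the unwanted fields from the metadata directly. The sketch below is untested against cryoDRGN and rests on one assumption: that a .cs file is a NumPy structured array stored in .npy format, so it can be read with np.load. The helper name drop_fields_by_prefix, the demo field names, and the file names in the comments are made up for illustration; only the alignments_class3D_ prefix comes from this issue.

```python
import numpy as np

def drop_fields_by_prefix(records, prefix="alignments_class3D_"):
    """Return a copy of a structured array without fields whose names start with prefix."""
    keep = [name for name in records.dtype.names if not name.startswith(prefix)]
    # Build a packed dtype that contains only the fields we keep
    new_dtype = np.dtype([(name, records.dtype[name]) for name in keep])
    cleaned = np.empty(records.shape, dtype=new_dtype)
    for name in keep:
        cleaned[name] = records[name]
    return cleaned

# Small synthetic stand-in for particle metadata (field names are illustrative)
demo = np.zeros(3, dtype=[
    ("uid", "<u8"),
    ("ctf/df1_A", "<f4"),
    ("alignments3D/pose", "<f4", (3,)),
    ("alignments_class3D_0/class", "<u4"),
    ("alignments_class3D_1/class", "<u4"),
])
cleaned = drop_fields_by_prefix(demo)
print(cleaned.dtype.names)

# With a real file (hypothetical paths), the round trip would look like:
#   records = np.load("particles.cs")
#   with open("particles_clean.cs", "wb") as fh:  # a file handle avoids an added .npy suffix
#       np.save(fh, drop_fields_by_prefix(records))
```

Copying field by field into a new packed array keeps the written header limited to the retained fields. Whether cryoDRGN then reads the file correctly would still need to be checked.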
I would also recommend starting with a smaller subset, say 100-200k particles. This way you can try different cryoDRGN settings relatively quickly.
Yes, I could also come up with some workarounds (haven't had time to test any of them yet). But regardless, it seems to me like a .cs file smaller than 1 MB crashing three commands is a bug worth fixing.
Describe the bug
With CryoDRGN 3.4.1, all of these commands crash when trying to read a .cs file with a large header:
cryodrgn downsample
cryodrgn parse_pose_csparc
cryodrgn parse_ctf_csparc
To Reproduce
Run any of these three commands on a .cs file with 100 alignments_class3D_* entries in the header. These entries result from having run a 3D classification with 100 classes earlier in the job graph. They are no longer relevant to the final refinement job, but CryoSPARC keeps them in the particle metadata, and I don't think it provides a way to remove these entries.
I don't know what threshold of header size triggers this behavior, and can't test it easily.
Expected behavior
This shouldn't cause a crash. The irrelevant header entries should be dropped and the command should complete normally.
Additional context
Initially, this happened with a very large .cs file (~6.5 GB) because it's a large particle set. I thought the total file size was the problem, so I repeated this on a random subset of 100 000 particles, which should be workable and gave a much smaller .cs file (~850 MB), but I got the same error.
I prepared a .cs file with only 100 particles (~950 KB) that also causes the crash because it has a full header. I attached this .cs file in case it is helpful for debugging (I needed to add a .txt extension for GitHub to accept the upload; you might need to rename the file to remove this extension).
J156_split_0_exported.cs.txt
Error for downsample:
Error for parse_pose_csparc:
Error for parse_ctf_csparc: