ml-struct-bio / cryodrgn

Neural networks for cryo-EM reconstruction
http://cryodrgn.cs.princeton.edu
GNU General Public License v3.0

Ordering of the star file matters ?! #387

Open Gabriel-Ducrocq opened 1 month ago

Gabriel-Ducrocq commented 1 month ago

Hello,

My lab ran experiments and obtained a cryo-EM dataset. When I tried to run cryoDRGN, I obtained pure noise. I then ran the back projection algorithm and also obtained pure noise, with or without --uninvert_data. However, I noticed that if I follow the procedure described here step by step (parsing the star file for the poses, then for the CTF, and finally running back projection/cryoDRGN), the index of each image given in the star file is discarded. In the source code, I cannot find where this information is passed to cryoDRGN (I could not find it in the ImageDataSet object, nor in the scripts for parsing the poses and the CTF). This means that if the images in my mrcs file are not in the same order as in my star file, I lose the correspondence between the images and their CTF and pose information. To debug, I reordered my star file so that the image indexes run from 0 to 200 000, and recovered meaningful volumes with both back projection and cryoDRGN.
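
The ordering check described above can be sketched as follows. This is only an illustration, not cryoDRGN code: it assumes the RELION-style image name convention `index@path` with 1-based indices, and the helper names `parse_image_name` and `is_stack_ordered` as well as the file name `particles.mrcs` are hypothetical.

```python
def parse_image_name(entry):
    """Split an 'index@path' entry into (zero-based index, path).

    Assumes the RELION-style convention where the index is 1-based.
    """
    index, path = entry.split("@", 1)
    return int(index) - 1, path


def is_stack_ordered(image_names):
    """Return True if entry i of the star file points at image i of the stack."""
    indices = [parse_image_name(name)[0] for name in image_names]
    return indices == list(range(len(indices)))


# Hypothetical star file entries, listed out of stack order.
names = [
    "000003@particles.mrcs",
    "000001@particles.mrcs",
    "000002@particles.mrcs",
]

# Sort by the parsed numeric index rather than by string comparison.
reordered = sorted(names, key=lambda name: parse_image_name(name)[0])

print(is_stack_ordered(names))
print(is_stack_ordered(reordered))
```

If the first check fails for a star file, poses and CTF parameters parsed row by row would not line up with the images read sequentially from the stack.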

Prior to using cryoDRGN, the dataset was processed with cryoSPARC. This produced a single mrc file with 200k images, and we converted the cs file into a star file using pyem (in order to use it with other software).

I am attaching the results of the back projection algorithm with the unordered (yellow) and ordered (grey) star files. The mrc file containing the images remains unchanged. I applied no downsampling; the images are 400x400 pixels, with an apix of 0.828 Å.

I have used cryoDRGN on EMPIAR datasets before and never experienced such a problem...

Am I missing something?

Thank you, Gabriel.

[Attached images: back projection with the unordered star file (yellow) and the ordered star file (grey)]

zhonge commented 1 month ago

Can you tell us your commands? Unless your dataset is all in a single continuous .mrcs file, the input to cryodrgn backproject_voxel, cryodrgn train_vae, etc. should be the .star or .cs file describing your particle stack. The preprocessing steps that parse CTF and pose information only extract these fields for the particles listed in the .star file, so the particles given to cryodrgn have to be in the same order.
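
To illustrate the ordering requirement, here is a minimal sketch of putting per-particle metadata back into stack order. It is not part of cryodrgn: the function `reorder_to_stack` and the example values are hypothetical, and it assumes a single stack file in which every image is listed exactly once with a 1-based `index@path` name.

```python
def reorder_to_stack(image_names, rows):
    """Return rows rearranged so that row i describes image i of the stack.

    image_names: 'index@path' strings in star file order (1-based index assumed).
    rows: per-particle metadata (e.g. pose or CTF values) in the same order.
    """
    if len(image_names) != len(rows):
        raise ValueError("image_names and rows must have the same length")

    indices = [int(name.split("@", 1)[0]) - 1 for name in image_names]
    if sorted(indices) != list(range(len(indices))):
        raise ValueError("each stack image must be listed exactly once")

    ordered = [None] * len(rows)
    for index, row in zip(indices, rows):
        ordered[index] = row
    return ordered


# Hypothetical example: three particles listed out of stack order.
names = ["000002@stack.mrcs", "000003@stack.mrcs", "000001@stack.mrcs"]
defocus = [15000.0, 18000.0, 12000.0]

print(reorder_to_stack(names, defocus))
```

The reverse approach also works: keep the metadata in star file order and let the .star file itself describe which image each row refers to, which is what passing the .star file as the particle input does.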

Side note: I would recommend downsampling your images from D=400 to D=128 before trying cryodrgn.