tbepler / topaz

Pipeline for particle picking in cryo-electron microscopy images using convolutional neural networks trained from positive and unlabeled examples. Also featuring micrograph and tomogram denoising with DNNs.
GNU General Public License v3.0

Determining the downsampling factor & radius #186

Closed: ojasvijain closed this issue 5 months ago

ojasvijain commented 5 months ago

Hi,

I wanted to understand how I can determine the correct downsampling factor.

Also, I was wondering how I can set the radius. I am training on different proteins which have different radii.

Thanks!

Guillawme commented 5 months ago

Hello,

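The downsampling factor you need depends on three things: the pixel size of your micrographs, the size of your particle, and the neural network architecture you train with.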

This is unfortunately very poorly documented. I wrote about it some time ago on the cryoSPARC forum, and I'm copying the relevant part of that message below:

The micrograph downscaling factor (--scale option to topaz preprocess) is one of the most important parameters for successful training, because a particle must have a certain diameter (or longest dimension) in pixels for training to work optimally. That required size differs depending on which neural net architecture is used, and this is poorly documented. The best place to find it is the Topaz GUI (actually just a command builder; you can launch it locally from your topaz installation with the command topaz gui): go to the “Preprocess” section and hover the mouse over the “Scale factor” blue box. The help bubble then says:

Rescaling factor for image downsampling (e.g. a 4k x 4k image downsampled by 4 results in a 1k x 1k image) (type: even integer).

Recommended: Downsample such that the resulting pixelsize is about 8 angstroms; usually downsample by 4, 8, or 16 depending on pixelsize and particle size.

Note: Your particle must have a diameter (longest dimension) after downsampling of at most:

- 70 pixels or less for resnet8
- 30 pixels or less for conv31
- 62 pixels or less for conv63
- 126 pixels or less for conv127
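The help text above boils down to two pieces of arithmetic: the downsampled pixel size is the original pixel size times the scale factor, and the particle diameter in pixels is its diameter in Å divided by that downsampled pixel size. The sketch below is illustrative only; suggest_scale, diameter_in_pixels and fits_model are hypothetical helpers, not part of Topaz. The per-architecture limits are copied from the help text. The example pixel size (1.06 Å) and particle sizes are made-up values.

```python
from __future__ import annotations

# Maximum particle diameter (longest dimension) in pixels after downsampling,
# copied from the Topaz GUI help text quoted above.
MAX_DIAMETER_PX = {"resnet8": 70, "conv31": 30, "conv63": 62, "conv127": 126}


def suggest_scale(pixel_size_A: float, target_A: float = 8.0, max_scale: int = 64) -> int:
    """Hypothetical helper: even factor whose downsampled pixel size is closest to target_A."""
    if pixel_size_A <= 0:
        raise ValueError("pixel size must be positive")
    return min(range(2, max_scale + 1, 2), key=lambda s: abs(s * pixel_size_A - target_A))


def downsampled_shape(width: int, height: int, scale: int) -> tuple[int, int]:
    """Micrograph dimensions after downsampling by an integer factor."""
    return width // scale, height // scale


def diameter_in_pixels(diameter_A: float, pixel_size_A: float, scale: int) -> float:
    """Particle diameter in pixels after downsampling by `scale`."""
    return diameter_A / (pixel_size_A * scale)


def fits_model(diameter_A: float, pixel_size_A: float, scale: int, model: str = "resnet8") -> bool:
    """True if the downsampled particle is within the architecture's size limit."""
    return diameter_in_pixels(diameter_A, pixel_size_A, scale) <= MAX_DIAMETER_PX[model]


if __name__ == "__main__":
    scale = suggest_scale(1.06)                     # 8, i.e. about 8.48 Å/pixel
    print(scale, downsampled_shape(4096, 4096, 4))  # 8 (1024, 1024)
    print(fits_model(200, 1.06, scale))             # True (about 23.6 px)
    print(fits_model(400, 1.06, 4))                 # False (about 94.3 px > 70)
```

Restricting candidates to even factors follows the "type: even integer" note in the help text. If a particle does not fit, either a larger factor or an architecture with a larger limit (for example conv127) is needed.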

Relion-4 chose not to expose this downscaling factor to the user. Instead, it calculates it automatically from the known pixel size of the micrographs and from the estimated particle diameter in Å entered by the user. That diameter is relatively easy to measure with a manual picking job, and one typically has a good sense of the expected particle size after working on the same sample for a while. Relion-4 also chose not to expose the neural net architecture to the user and always uses resnet8 by default, but it lets one override these defaults by passing options explicitly.

I think this is a really good default, very user-friendly. If cryoSPARC could do the same, that would make setting up Topaz trainings much easier.
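The automatic choice described above for Relion-4 can be approximated from the same two inputs (pixel size and particle diameter in Å). The following is a hypothetical sketch under our own assumptions, not Relion-4's actual code or formula. It starts from the even factor that gives a pixel size closest to 8 Å and increases the factor only as far as needed for the particle to fit the chosen architecture. The function name auto_scale and its parameters are illustrative.

```python
# Hypothetical sketch, NOT Relion-4's implementation: pick an even
# downsampling factor from the pixel size and particle diameter (both in Å).

# Size limits in pixels after downsampling, from the Topaz GUI help text.
MAX_DIAMETER_PX = {"resnet8": 70, "conv31": 30, "conv63": 62, "conv127": 126}


def auto_scale(pixel_size_A: float, diameter_A: float, model: str = "resnet8",
               target_A: float = 8.0, max_scale: int = 64) -> int:
    limit = MAX_DIAMETER_PX[model]
    evens = list(range(2, max_scale + 1, 2))
    # Start from the even factor giving a pixel size closest to target_A.
    start = min(evens, key=lambda s: abs(s * pixel_size_A - target_A))
    for scale in evens[evens.index(start):]:
        # Increase the factor only until the particle fits the model's limit.
        if diameter_A / (pixel_size_A * scale) <= limit:
            return scale
    raise ValueError("particle does not fit at max_scale; try a model with a larger limit")


if __name__ == "__main__":
    print(auto_scale(1.06, 200))                   # 8
    print(auto_scale(1.06, 800))                   # 12 (about 62.9 px)
    print(auto_scale(1.06, 800, model="conv127"))  # 8 (about 94.3 px fits 126)
```

Growing the factor instead of switching architectures mirrors the resnet8-by-default behaviour described above; passing model="conv127" corresponds to overriding that default.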

I hope this helps!

ojasvijain commented 5 months ago

This is really helpful. However, I'd like to understand one more thing: what if I am training on different particles? Should I use the median diameter?

tbepler commented 5 months ago

Take a look at "A note on downsampling" here.

TL;DR: if you're using the pre-trained models, downsample your micrographs to the 4-8 Å/pixel range. Very large particles that don't fit in the model's receptive field at 8 Å/pixel may need additional downsampling, as @Guillawme mentioned.
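For the question about mixing particle sizes, note that the limits quoted earlier are maxima. Our reading (an assumption, not stated explicitly in the thread) is that the largest particle, not the median, decides whether a given downsampling is sufficient. The hypothetical helper below checks a chosen factor against the 4-8 Å/pixel range and flags the species that exceed a pixel limit. It defaults to the 70-pixel resnet8 limit; adjust limit_px if the model uses another architecture. The diameters and the 1.06 Å pixel size are made-up example values.

```python
import statistics

# Hypothetical check for a training set with several particle species.
# Assumption: the size limit applies to every particle, so the largest
# diameter (not the median) decides whether the downsampling is sufficient.


def check_mixed_set(diameters_A, pixel_size_A, scale, limit_px=70):
    """limit_px defaults to the resnet8 limit from the Topaz GUI help text."""
    new_pixel_A = pixel_size_A * scale
    return {
        "pixel_size_A": new_pixel_A,
        "in_4_to_8_range": 4.0 <= new_pixel_A <= 8.0,
        "median_px": statistics.median(diameters_A) / new_pixel_A,
        "largest_px": max(diameters_A) / new_pixel_A,
        "too_big": [d for d in diameters_A if d / new_pixel_A > limit_px],
    }


if __name__ == "__main__":
    report = check_mixed_set([120, 250, 600], pixel_size_A=1.06, scale=6)
    print(report["in_4_to_8_range"], report["too_big"])  # True [600]
```

In this example the median particle is about 39 pixels and would appear to fit comfortably, while the 600 Å particle is about 94 pixels and exceeds the limit. That is the case where additional downsampling (or a larger-field architecture) would be needed.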

ojasvijain commented 5 months ago

Thank you!