Determining the downsampling factor & radius

ojasvijain commented 5 months ago

Hi,

I wanted to understand how I can determine the correct downsampling factor.

If I am using the default pretrained model mentioned in the extract documentation, which is resnet16, what should my downscale factor be?
If I run topaz train to train on my own dataset, the default model is resnet8. What should my downscale factor be for that? (https://github.com/tbepler/topaz/blob/master/topaz/commands/train.py)

Also, I was wondering how I can set the radius. I am training on different proteins which have different radii.

Is the radius that I need to specify in the train and extract steps measured before or after downsampling? For instance, if my radius is 20 and I am downsampling by a factor of 5, should I specify the radius in the train and extract steps as 20 or 4?

Thanks!

Guillawme commented 5 months ago

Hello,

The downsampling factor you need depends on three things:

the original pixel size of your micrographs,
the size of the particles you want to pick,
the model you are using.

This is unfortunately very poorly documented. I wrote about this some time ago on the cryoSPARC forum. Copying the relevant part of this message below:

The micrograph downscaling factor (--scale option to topaz preprocess) is one of the most important parameters for a successful training, because a particle must have a certain diameter (or longest dimension) in pixels for the training to work optimally. The required size differs depending on which neural net architecture is used, and this is poorly documented. The best place to find out is the Topaz GUI (actually simply a command builder; you can get it locally from your topaz installation with the command topaz gui): go to the “Preprocess” section and hover the mouse over the “Scale factor” blue box. The help bubble then says:

Rescaling factor for image downsampling (e.g. a 4k x 4k image downsampled by 4 results in a 1k x 1k image) (type: even integer).

Recommended: Downsample such that the resulting pixelsize is about 8 angstroms; usually downsample by 4, 8, or 16 depending on pixelsize and particle size.

Note: Your particle must have a diameter (longest dimension) after downsampling of:

70 pixels or less for resnet8
30 pixels or less for conv31
62 pixels or less for conv63
126 pixels or less for conv127
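To make the arithmetic behind that note concrete, here is a minimal Python sketch. It is not part of Topaz: the function names and the example numbers (a 200 Å particle on 1.0 Å/pixel micrographs) are made up for illustration, and the only values taken from the help text are the per-model pixel limits.

```python
# Maximum particle diameter in pixels after downsampling,
# copied from the Topaz GUI help text quoted above.
MAX_DIAMETER_PX = {"resnet8": 70, "conv31": 30, "conv63": 62, "conv127": 126}


def downsampled_diameter_px(diameter_angstrom, pixel_size_angstrom, scale):
    """Particle diameter in pixels once the micrograph is downsampled by `scale`.

    Hypothetical helper: downsampling by `scale` multiplies the pixel size
    by `scale`, so the same particle spans `scale` times fewer pixels.
    """
    return diameter_angstrom / (pixel_size_angstrom * scale)


def fits_model(diameter_angstrom, pixel_size_angstrom, scale, model):
    """True if the downsampled particle is within the quoted limit for `model`."""
    diameter_px = downsampled_diameter_px(diameter_angstrom, pixel_size_angstrom, scale)
    return diameter_px <= MAX_DIAMETER_PX[model]


if __name__ == "__main__":
    # Assumed example: 200 Å particle, 1.0 Å/pixel micrographs.
    for scale in (4, 8):
        for model in ("resnet8", "conv31"):
            print(scale, model, fits_model(200.0, 1.0, scale, model))
```

With these assumed numbers, downsampling by 4 gives a 50-pixel particle, which is within the resnet8 limit but too large for conv31; downsampling by 8 gives 25 pixels, which is within both.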

Relion-4 chose not to expose this downscaling factor to the user. Instead, it calculates it automatically from the known pixel size of the micrographs and from the estimated particle diameter in Å input by the user (which is relatively easy to measure with a manual picking job, but typically one has a good sense of the expected particle size after working on the same thing for a while). Relion-4 also chose not to expose the neural net architecture to the user, and always uses resnet8 by default. But it lets one override these defaults by passing options explicitly.

I think this is a really good default, very user friendly. If cryosparc could do the same, that would make setting up topaz trainings much easier.

I hope this helps!

ojasvijain commented 5 months ago

This is really helpful. Although I want to understand - what if I am training on different particles? Should I consider the median diameter?

tbepler commented 5 months ago

Take a look at "A note on downsampling" here.

TL;DR: if you're using the pre-trained models, downsample your micrographs to the 4-8 Å/pixel range. Very large particles that don't fit in the receptive field of the model at 8 Å/pixel may need additional downsampling as @Guillawme mentioned.
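A rough Python sketch of that rule of thumb, in case it helps. It is not a Topaz API: the function names, the candidate factors, and the example numbers are assumptions for illustration. For a dataset with several particle sizes, the sketch assumes the largest particle is the limiting one, which the advice above suggests but does not state outright. The pixel limit is left as an input because it depends on the model.

```python
import math


def scales_in_range(pixel_size_angstrom, low=4.0, high=8.0, candidates=(2, 4, 8, 16)):
    """Candidate downsampling factors that put the pixel size in [low, high] Å/pixel.

    The candidate list is an assumption; the 4-8 Å/pixel range is from the advice above.
    """
    return [s for s in candidates if low <= pixel_size_angstrom * s <= high]


def min_scale_to_fit(diameters_angstrom, pixel_size_angstrom, max_diameter_px):
    """Smallest integer factor at which the largest particle spans at most
    `max_diameter_px` pixels. `max_diameter_px` must come from the model in use.
    """
    largest = max(diameters_angstrom)
    return max(1, math.ceil(largest / (pixel_size_angstrom * max_diameter_px)))


if __name__ == "__main__":
    # Assumed example: 1.0 Å/pixel micrographs, particles of 150, 300 and 700 Å,
    # and a 70-pixel limit (the value quoted earlier in the thread for resnet8).
    print(scales_in_range(1.0))
    print(min_scale_to_fit([150.0, 300.0, 700.0], 1.0, 70))
```

If the second number is larger than every factor in the first list, the largest particle is one of the "very large" cases that needs more downsampling than the 4-8 Å/pixel range alone would give.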

ojasvijain commented 5 months ago

Thank you!

tbepler / topaz

Determining the downsampling factor & radius #186