mjo22 / cryojax

Cryo electron microscopy image simulation and analysis built on JAX.
https://mjo22.github.io/cryojax/
GNU Lesser General Public License v2.1
30 stars 13 forks source link

Modeling the specimen phase shift distribution #167

Closed mjo22 closed 6 months ago

mjo22 commented 8 months ago

Need to make sure that rasterized voxel grid scattering potentials have necessary factors out front so that their 2D projection images represent phase shifts in the exit plane. See function ReturnScatteringPotentialOfAVoxel here: https://github.com/timothygrant80/cisTEM/blob/master/src/core/scattering_potential.h

Also, there is a question in my mind of whether or not we should be using the error function to evaluate voxel grids at a given pixel. For reasons, see here: https://journals.iucr.org/m/issues/2021/06/00/rq5007/index.html

ehthiede commented 8 months ago

I feel like there's multiple factors missing. That said, it's a bit tricky to make sure we have them all without going into the math you are using for the detector model (I'm afraid I haven't been staying very caught up there). Moreover I'm afraid the linked CisTEM code wasn't helpful for me (maybe I was looking in the wrong places). Do you have a mathematical reference for your model you could share? If it helps to have something to refer to, I've been looking at Chapter 4.3 of the IUCr International Tables for Crystallography, Volume C, Chapter 4.3.

The error function is an interesting question. I think in the end it comes down to how finely we need to parameterize our volumes (in real space) due to other factors, e.g. to avoid spurious correlations as a function of rotation. I feel like we should get some numerical tests on the effect of resolution before we commit one way or the other.

mjo22 commented 8 months ago

The reference used for the cisTEM scattering potential seems to be the appendix of Vulovic et al 2013 (https://pubmed.ncbi.nlm.nih.gov/23711417/). I believe that the object we are after is $\sigma V_z(x, y, z)$, given in Eq. 22. We should carefully work this out...

Re, the error function and resolution: The reason it is important for cisTEM may be for the purpose of high-resolution 2D template matching. If this is true, this may become important for me. But as you say, I think it makes sense to make a small correction to the current code, run some tests, and then build from there!

Regardless, a parameter we may need to include is some kind of downsample factor. Namely, let's say we ask for a voxel size of 1.0 A but specify a downsample factor of 2. Then, create a voxel grid at 0.5 A and then downsample to 1.0 A. People seem to do this regardless of if they use the error function or a gaussian.

jkawamura8 commented 7 months ago

Hello, Kawamura here.

Just looking into this.

geoffwoollard commented 7 months ago

Regarding using the error function(s) instead a gaussian, I have looked into this numerically.

It's related to how small the pixel size is versus the decay length of the Gaussian. Think area-under-the-curve from integral calculus.

The Gaussian drops off to exp(-x^2/2) when we are x standard deviations away. That's exp(-4.5)=1% at 3 std!

So if the pixel size is too large, then the Gaussian decays to near zero. Hence we need to integrate the Gaussian density inside the pixel. Hence the (difference of) error function(s). You get a difference in the x and y direction, so err_1_x - err_2_x and err_1_y - err_2_y, so 4 error function evaluations.

So either you have very tiny pixels and

  1. evaluate one gaussian, or
  2. larger pixel size, and evaluate 4 error functions / gaussian
geoffwoollard commented 7 months ago

Here are some notes on Cowley and Peng https://github.com/mjo22/cryojax/issues/179

geoffwoollard commented 7 months ago

@jkawamura8 @ehthiede and @mjo22 After a few hours of making notes on the refs in #179, I have a concrete procedure to getting the units of a_i and b_i right:

  1. multiply s^2 to b_i with the_right_factors.

  2. sum up f_atom = sum_{i=1}^5 a_i exp(the_right_factors*b_i*s^2) to get f^B for the right atom.

  3. The potential distribution is defined from 2. in equation 4.3.1.32. This equation says the Fourier transform of the real spaced potential is \sum_atom f_atom exp(2 pi i h r_atom).

  4. Consider the function evaluate_3d_real_space_gaussian:https://github.com/mjo22/cryojax/blob/df1407634d546574fdac242e596eb2077cd29cbb/src/cryojax/simulator/_potential/_voxel_potential.py#L564 https://github.com/mjo22/cryojax/blob/df1407634d546574fdac242e596eb2077cd29cbb/src/cryojax/simulator/_potential/_voxel_potential.py#L589-L593 It takes the real spaced Fourier transform of 3. The factors needs to match with a. The convention of the Fourier transform used b. How you Fourier transform a Gaussian.

  5. The absolute value of a_i factors can be determined by matching to some standard, as in Himes and Grigorieff (2021) Figures 2,3,4 (see below).

  6. A suggested test for evaluate_3d_real_space_gaussian is to take its Fourier transform (numerically, with the convention in cryojax) and then compare this to a hard coded 4.3.1.32 for a few atoms, with values lifted from table 4.3.1.32 at a specific resolution bin, i.e. sin(theta) or as a Gaussian in Fourier space from the gemmi (a_i,b_i) values.

    Screen Shot 2024-03-14 at 4 28 20 PM
Screen Shot 2024-03-14 at 4 28 24 PM Screen Shot 2024-03-14 at 4 28 29 PM
mjo22 commented 7 months ago

I was reading more on this. To be clear, this issue is about implementing the right physical model of the object phase shift distribution, given its potential. Section 69.2.3 in the Hawkes and Casper book (see here) describes exactly this. Renaming this issue appropriately.

Usually in cryo-EM scattering theory, the phase shift distribution is taken to be proportional to the projected potential up to dimensional factors. This is equation 69.34(b) in the book and is exactly what cisTEM and other cryo-EM simulators use. We should analyze if this is indeed the right approximation to make.

There are two things to do to I think.

  1. The equation typically used is a simple approximation derived from what is called the eikonal approximation. The full phase shift distribution in the eikonal approximation is very easy to compute. Should we be doing this instead? It would be helpful to do a back-of-the-envelope calculation for the approximation made between equations 69.34(a) and 39.34(b). I expect that we are well into the regime where it is a good simplification, but it would be good to work it out.
  2. In the book, they discuss that this approximation typically used in cryo-EM has aphysical aspects to it. Namely, if we were to use a potential that diverges at the origin (e.g. a Yukawa potential), we will get a divergence in the phase shift distribution. This of course cannot happen. They describe a simple way of removing the singularity, culminating in equation 69.35. Should we be using their singularity-corrected expression for the phase shifts? I think this actually may come up, depending on what scattering factors we use! For example, the parameterization in the following paper diverges at the origin: https://scripts.iucr.org/cgi-bin/paper?mq5024).
mjo22 commented 7 months ago

In PR #200 I've worked out a refactor that should allow us to implement a fix to this issue!

The big idea of it is that we need to make the wavelength a parameter visible to many parts of the modeling pipeline (see #196 for some discussion on this). This involved moving the wavelength upstream to the Instrument.

mjo22 commented 6 months ago

216 closes this issue