nanovis / DiffFit

DiffFit: Visually-Guided Differentiable Fitting of Molecule Structures to Cryo-EM Map
Other
6 stars 0 forks source link

Zeroing out the already fitted density | Find the API for `subtracting the density` #4

Closed RodenLuo closed 4 months ago

RodenLuo commented 7 months ago

"subtracting the density" in https://www.cgl.ucsf.edu/chimerax/docs/user/commands/fitmap.html#sequence

Code starts here:

https://github.com/RBVI/ChimeraX/blob/26c993ba96c5456a6f3468e2099cd8f0397f1152/src/bundles/map_fit/src/fitcmd.py#L47

RodenLuo commented 7 months ago

ChimeraX is subtracting rather than zeroing out.

https://www.cgl.ucsf.edu/chimerax/docs/user/commands/volume.html#subtract

https://github.com/RBVI/ChimeraX/blob/26c993ba96c5456a6f3468e2099cd8f0397f1152/src/bundles/map_fit/src/sequence.py#L51-L62

Need to implement zeroing out by ourselves.

RodenLuo commented 7 months ago

Useful APIs:

from chimerax.map.molmap import molecule_map
v = molecule_map(session, atoms, res)
v.display = False

from chimerax.map.volume import volume_list
vlist = volume_list(session)

from . import fitmap as F
v = F.simulated_map(atoms, resolution, session)

from .fitmap import simulated_map
vlist.append(simulated_map(matoms, resolution, session))
RodenLuo commented 7 months ago

Partially solved in 108ea62e76cfc6a1f7c0665f1f580d1332fbd510

Need to see how to put the new numpy array to the volume.

Check what is behind surface dust #2 size 8.1

https://www.cgl.ucsf.edu/chimerax/docs/user/commands/surface.html#dust

RodenLuo commented 7 months ago

Solved. API:

zero_cluster_density(vol, e_sqd_clusters_ordered, mol_folder, cluster_idx, session, res=4.0, zero_iter=0)

# TODO: res should be specified by the user for once
# TODO: zero_iter should go up in later iterations, this is only for better log/info, does not matter the computation

With in the following function, there is another # TODO: Manually change the surface threshold

https://github.com/RodenLuo/DiffFitViewer/blob/0bfb4d0292319603b811a64deb545c658ecc32ac/src/parse_log.py#L147-L192

RodenLuo commented 7 months ago

Solved by ChimeraX command: volume subtract #1 #4 scaleFactors 1.0,100.0

RodenLuo commented 6 months ago

Say I have EMD-29900 open in ChimeraX and run the following Python code in the interactive shell, I will get the output as printed.

from chimerax.map.volume import volume_list
v = volume_list(session)[0]

v.matrix().shape
Out[44]: (140, 140, 140)

v.data_origin_and_step()
Out[45]:
(array([0., 0., 0.]),
 [2.8113998413085937, 2.8113998413085937, 2.8113998413085937])

v.region
Out[46]: ((0, 0, 0), (279, 279, 279), [2, 2, 2])

v_smooth = run(session, "volume gaussian #1 sdev 2")

v_smooth.matrix().shape
Out[48]: (280, 280, 280)

v_smooth.data_origin_and_step()
Out[49]:
(array([0., 0., 0.]),
 [1.4056999206542968, 1.4056999206542968, 1.4056999206542968])

v_smooth.region
Out[50]: ((0, 0, 0), (279, 279, 279), [1, 1, 1])

My understanding is that each voxel in v is of size 2.81 Angstroms on each dimension, while in v_smooth, it is 1.405. What are the benefits of having a v.region[2][0] bigger than 1, except from data compression?

I once tried zeroing out the densities for the voxels occupied by fitted atoms by simulated maps. The pipeline worked well for some cases but then did not behave as expected in a few cases, I noticed for those cases, v.region[2][0] are all bigger than 1. (Code at https://github.com/nanovis/DiffFitViewer/blob/8e7d2db992c156a0a962ad82c0dcee8ea7664fee/src/parse_log.py#L167-L193, briefly, the simulated map mol_vol is thresholded to get eligible_indices, which are than converted to xyz, which are then converted to indices in relation to the target vol, which are then used to zero the target voxels. If I do not force step = [1, 1, 1] at Line 183, I got dim not match error at Line 185. For cases when this pipeline works, I have to set the simulated map's resolution about 1/3 to the original resolution.)

I then use the following command to achieve the zeroing. Please kindly comment.

volume subtract #{vol.id[0]} #{mol_vol.id[0]} scaleFactors 1.0,1000.0

Then set the voxels with density values smaller than 0 to 0

Full code:

https://github.com/nanovis/DiffFitViewer/blob/6db199d16a7d5551c97df0669eec0e30e9f12f68/src/parse_log.py#L185-L193

RodenLuo commented 6 months ago

The following reply comes from the ChimeraX developer Tom Goddard

The volume gaussian command does not change the voxel size. None of the ChimeraX volume operations change the voxel size unless you give it options to explicitly do that. The step 2, step 4, ... is to allow handling large multi-gigabyte datasets interatively without having the user wait a long time for the data to load from a slow disk (was more important before SSD drives), and not have to wait for surface calculations on very large data.

I'd suggest looking that the fitmap command sequential fitting code if you want to see how to subtract already fit molecules from a map -- it does exactly that. The main trick is to get the right intensity scaling of the simulated map values so they match the experimental map and there are options for that, the minRMS option of the volume subtract command as described in the documentation.

https://www.cgl.ucsf.edu/chimerax/docs/user/commands/volume.html#subtract

RodenLuo commented 5 months ago

The ChimeraX code that is responsible for subtracting in sequential fitting is here, cropped below.

                scale = minimum_rms_scale(values, data_array.ravel(), level)
                multiply(values, scale, values)
                add(d, values.reshape(d.shape), d)

volume subtract #1 #2 minRms true drills into this line.

Essentially, both of them use def minimum_rms_scale(v, u, level). Then, I should be able to just use the command, volume subtract #1 #2 minRms true.

RodenLuo commented 5 months ago

Hi Tom @tomgoddard, I really could not get the volume subtracted in an expected way. Could you please help here? The following command should give all the necessary context. Thanks!

open 8GAM
open 29900 from emdb
volume #2 level 0.2
molmap #1 3.46 gridSpacing 1.41; # emd-29900 is with resolution as 3.46, pixel as 1.41; I also tried without specifying the gridSpacing value
volume #3 level 0.2
volume subtract #2 #3 minRms true

Below shows in sequence volumes #2, #3, #4 and the Models and the Volume Viewer panels. I expect a near-empty volume for #4.

image image image image

tomgoddard commented 5 months ago

This is working as expected and is a problem primarily with the EMDB map. Most of the side chains of the atomic model are sticking completely out of the density as if they masked the map too tightly. Also there is density with no atoms. In general the molmap simulated map by placing Gaussians at atoms is not going to match very well even to an EMDB map without special problems. The mismatch is for many reasons. A typical EM map has highly variable resolution, so the molmap using a single resolution can't possibly match it well. Side chain density is often missing due to disorder for many residues. At any rate, this example is an especially bad case.

8gam_emdb_29900
RodenLuo commented 5 months ago

Thanks, Tom! I adapted the subtracting and formed a new way, zeroing out. Briefly, the voxel positions from the original map are used to sample the density values from the simulated map; if the sampled density is higher than a user-specified threshold, the original voxel's density is set to zero. The threshold is defaulted to 0. The result of using 0, 0.001, 0.01, 0.1 as the threshold for the above example is shown below.

The responsible code is:

https://github.com/nanovis/DiffFitViewer/blob/23684810355b4e6f1e2ad8b17ea84d0c82aaf40e/src/parse_log.py#L191-L199

Using 0 as the threshold:

image image

image

Using 0.001 as the threshold:

image image

image

Using 0.01 as the threshold:

image image

image

Using 0.1 as the threshold:

image image

image

tomgoddard commented 5 months ago

Yeah, the problem with masking to replace the map values with zero inside the simulated map is that the experimental map will extend in places outside the simulated map and those don't get masked. So you end up with a chopped up map that is zero in some places and is full intensity right next to those zeros. So I don't think your solution fixes the basic issue that if you don't have a simulated map that matches the experimental map then subtracting gives a poor result. You might be able to do a better job by making a lower resolution simulated map so the side-chains aren't visible and then masking to that so you don't turn the experimental map into swiss cheese.