nanovis / DiffFit

DiffFit: Visually-Guided Differentiable Fitting of Molecule Structures to Cryo-EM Map
Other
6 stars 0 forks source link

Volume matrix accessing and modifying #16

Open RodenLuo opened 6 months ago

RodenLuo commented 6 months ago

Running the following code in Tools > General > Shell, I can access the numpy matrix of a volume opened in ChimeraX; how can I access the filtered (by the "Level") matrix?

from chimerax.map.volume import volume_list
v = volume_list(session)[0]
v.matrix()

If I change the v.matrix() through the Shell interactively, how can I get a new drawing? I tried the following, but it is not working.

v.matrix().sum()
Out[62]: -3153.0344
v.matrix()[v.matrix() < 0.1] = 0
v.matrix().sum()
Out[64]: 4556.745
v.update_drawings()

Is it possible to access the matrix after dust removal, as in the following command?

surface dust #2 size 8.1

RodenLuo commented 6 months ago

The following reply comes from the ChimeraX developer Tom Goddard

There is no API call to get the filtered by level matrix for a volume. This is trivial to do with numpy as you show in your example code

masked_matrix = matrix[matrix >= level]

You should not change the values of a volume matrix unless you have asked for a writable (in-memory) copy

volume_copy = v.writable_copy()

The reason is that ChimeraX is reading the volume values from a file on disk and caches those values. If you change the numpy array you are changing the cached copy and it may be overwritten later. Why would it get overwritten? If you open a large volume file it may initially display at step 2 (displaying every other grid point) so it can display it quickly without reading the whole file. Then you may switch to step 1 and it will re-read the file to read the full resolution data. The cached volume values will be updated. In fact if you are not displaying the volume, all the cached data may be flushed if you have many maps open and are low on memory. So use a writable copy which reads all the data into memory and will never get overwritten by rereading a file. If you change the writable copy numpy matrix you have to tell the Volume instance it has changed -- otherwise it has no way to know you changed the data values. The call you make after changing the values is

v.matrix_changed()

If you look at the ChimeraX Map Eraser tool code you will see all these details.

Hiding dust does not change the volume matrix it only hides parts of the surface. This is described in the surface dust documentation. You can mask away the dust but I don't recommend that since setting just the dust to 0 while surround voxels are non-zero creates very ugly artifacts in the map, looks horrible if you then lower the displayed threshold, and can mess up algorithmic code.

RodenLuo commented 6 months ago

The only outstanding from this thread is what Tom thinks if we perform a similar dust removal by filtering out the voxels instead of the surface.

We have created the functionfilter_volume for this:

https://github.com/nanovis/DiffFitViewer/blob/6db199d16a7d5551c97df0669eec0e30e9f12f68/src/DiffAtomComp.py#L416

It first identifies the connected voxels. Let's call them islands. The islands are then filtered by the number of containing voxels. The only issue I see is we need to let the user understand the relationship between the number of voxels and the actual size in Angstroms, which involves the grid spacing attribute for that volume.

tomgoddard commented 6 months ago

From the previous comment: "The only outstanding from this thread is what Tom thinks if we perform a similar dust removal by filtering out the voxels instead of the surface."

What is the purpose of doing anything to the dust? Is it to avoid placing a structure to fit on the dust? I also don't know what you mean by "filtering out the voxels". Does that mean changing the map values to 0? Or does it just mean removing them from the list of voxels where you will consider placing a structure for fitting?

RodenLuo commented 6 months ago

It originates from the need to avoid initial placements on the dust. My intention is to set the map value to 0. I understand it then has a side effect for the fitting, but I am not sure whether it is a positive or negative effect and how strong or severe that effect is. 

tomgoddard commented 6 months ago

I would not expect setting map values to zero to effect the DiffFit fitting, although I guess it could if the dust is within parts of the structure -- that happens if the contour level is set high. The problem is you are creating a messed up map pockmarked with zeros, and if the user ends up doing things with that map (like changing the display threshold) they are going to be confused. But if it is just internally you are doing this in DiffFit and not visible to the user because that is the simplest way to make your algorithm sample positions correctly, then that seems ok. In general I avoid making copies of maps in ChimeraX unless it is necessary since it reduces performance (slow and uses of memory on very large maps).

I don't understand the details of why you want to avoid initial placements on dust. If it is a tiny volume of dust then almost no placements will go there and of course they won't fit well so will not score high -- that would note effect the speed. If there is a lot of dust then the user perhaps didn't realize they need to set a reasonable threshold level to define the envelope of the structure. In that case the level may be so low that it is not just dust but huge blobs that won't be filtered out as dust anyways.

RodenLuo commented 6 months ago

You are right. I was a bit too perfectionism. The effect on the placement initialization is neglectable.

The other consideration might also be too perfectionism. It's about the smoothed volume array. Currently, DiffFit only thresholds the original volume and doesn't threshold the smoothed ones. I don't know how the dust will behave after iterative/progressive smoothing and how that affects the fitting. For single proteins fitting into entire maps, this does not matter much. For compositing or searching from a large pool, I am now thinking of allowing the user to interactively create and examine the volumes in the array.

Anyhow, as it is already in place, I think it will be safe for me to keep this option and set the dust to, say, 3 voxels as the default.

RodenLuo commented 5 months ago

In an interactive shell session, after calling matrix_changed(), the drawing in the viewport is changed immediately, but the intensity histogram plot in the Volume Viewer will only reflect the change after hiding and unhiding the corresponding volume.

from chimerax.map.volume import volume_list
vol = volume_list(session)[0]
vol_copy = vol.writable_copy()
vol_copy_matrix = vol_copy.data.matrix()
vol_copy_matrix[vol_copy_matrix < vol.maximum_surface_level] = 0
vol_copy.matrix_changed() 
tomgoddard commented 5 months ago

When I use the ChimeraX "Map Eraser" tool (menu Tools / Volume Data / Map Eraser) it creates a copy of the map and the histogram updates automatically about a second after part of the map is changed. I think there is a delay because in general the volume viewer panel delays updates because they slow down interaction in some circumstances. The histogram update was hard to notice unless I erased a largish part of the map. I don't know if the Map Eraser tool is doing some other call besides matrix_changed() to get the update. You might look at its code if your case is not working as expected. If you have trouble figuring it out from the working Map Eraser code I can look.

RodenLuo commented 5 months ago

Thanks, Tom! I was more logging to take a note. But your input did help to figure out that calling vol_copy.data.values_changed() updates both the viewport and the histogram. (For the histogram only, vol_copy.call_change_callbacks('data values changed') can invoke tp.update_histograms(v) which ultimately updates the histogram.)

from chimerax.map.volume import volume_list
vol = volume_list(session)[0]
vol_copy = vol.writable_copy()
vol_copy_matrix = vol_copy.data.matrix()
vol_copy_matrix[vol_copy_matrix < vol.maximum_surface_level] = 0
vol_copy.data.values_changed()
RodenLuo commented 5 months ago
from chimerax.map.volume_viewer import VolumeViewer
session.tools.find_by_class(VolumeViewer)[0].thresholds_panel.update_histograms(vol_copy)

The above can also update the histogram solely. It is a bit awkward. But it demonstrates how to refer to a tool's instance via the session variable. The other handy function is session.tools.list().

tomgoddard commented 5 months ago

The right approach is calling

volume.data.values_changed()

That triggers callbacks that any tool such as the volume viewer gui can listen to to update themselves. You would think that volume.matrix_changed() would do this but looking at the code that only effects redrawing the volume in the graphics.

I would suggest not getting the VolumeViewer instance and manually updating histograms, or using any of its methods since none of those are public APIs and if you use them a future change to VolumeViewer may break your code.