project-gemmi / gemmi

macromolecular crystallography library and utilities
https://project-gemmi.github.io/
Mozilla Public License 2.0
220 stars 44 forks source link

Interpolating many points in Python #323

Closed jrobsontull closed 1 week ago

jrobsontull commented 3 months ago

We rely on Gemmi for a lot of operations involving cryo-EM and crystallography maps and often have to interpolate many points fast. We are happy with the interpolate_value but we are confused about usage of the interpolate_values function. We often have a numpy array of 3D coordinates that we want to interpolate values for and a slow approach of doing this would be:

import gemmi
import numpy as np

dmap = gemmi.read_ccp4_map('map.mrc')
coords = np.array([[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]])
interpolated = [dmap.grid.interpolate_value(gemmi.Position(*coord)) for coord in coords]

We couldn't understand how to use interpolate_values in this context for faster interpolation of an array of points. These points represent positions of atoms in a structure. My understanding is that interpolate_values will modify the input array to interpolate points at each position of the input: https://github.com/project-gemmi/gemmi/blob/fd70dfb1e0ed39f559964630b3c5f099320856b9/python/grid.cpp#L167 However, we don't neccessarily want to interpolate over a grid that is the same shape as the map grid. Rather, we only want to interpolate an array of atomic coordinates. Is there a way to do this with interpolate_values?

Instead we ended up writing our own interpolation function to transform coordinates into the correct system but this doesn't handle NCS and crystallographic symmetry, like Gemmi's interpolate_value does:

import gemmi
import numpy as np

# Use numpy broadcasting to transform the points and interpolate all the values
def interpolate_values(arr: np.ndarray, trans_mat: np.ndarray) -> np.ndarray:
  # do work...
  return arr

# Transformation matrix for moving to different coordinate systems e.g. fractionalization
trans_mat = np.array([
  [1.0, 0.0, 0.0],
  [0.0, 1.0, 0.0],
  [0.0, 0.0, 1.0]
])

dmap = gemmi.read_ccp4_map('map.mrc')
coords = np.array([[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]])
interpolated = interpolate_values(coords, trans_mat)

However, we're reluctant to build all the logic needed for crystallographic and non-crystallographic symmetry. We would love to use Gemmi more throughout our pipelines and were wondering if there is already a good solution. Also happy to look into opening a pull request if this doesn't exist.

wojdyr commented 1 week ago

I just added function interpolate_position_array, equivalent to the function from your PR #324 but in nanobind. Switching to nanobind decreased the overhead of calling functions – calling interpolate_value is now 3x faster, but it's still much slower than a new function, even for small arrays.

For now, I'm leaving the PR open, because I still need to add documentation and tests.

jrobsontull commented 1 week ago

Thanks for this @wojdyr. Do let me know if I can contribute to anything else. We make good use of this package at our company for working with volume data and look forward to new features!