Differentiate between isolated Cavities, Tunnels, and Pockets

jmaglic commented 3 years ago

Current status

As of now, all are treated the same and referred to as "cavities".

Issue

It may be of interest to be able to distinguish between isolated cavities, tunnels, and pockets. Isolated cavities are cavities that are entirely located inside the structure. Tunnels are cavities with at least two openings to the "outside". Pockets are cavities with a single opening to the "outside", for instance depressions on the surface of a macromolecule that are large enough to contain the probe.

Proposed change

To differentiate between these kinds of cavities, it's necessary to define an "outside". In two probe mode this is fairly easy, as the outside can simply be defined as the space accessible by the large probe (orange and yellow).

To differentiate cavities, it's possible to determine whether the small probe core type voxels (blue) of a given cavity ever come into contact with the outside. If they don't come into contact, i.e. the core type voxels never neighbour a large probe voxel, then the cavity is an isolated cavity. If they do come into contact, then it is either a pocket or a tunnel. It would be necessary to implement a way to count the number of interfaces between cavity and outside in order to distinguish between pocket and tunnel, but I believe this may be possible by slightly modifying the current flood fill algorithm used during the "Identify cavities" step.
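A minimal sketch of the interface counting described above, not the program's actual flood fill. It assumes a dense `grid[x][y][z]` of voxel labels, face-neighbour (6-connected) adjacency, and that each connected patch of outside voxels touching the cavity's core voxels counts as one interface. The labels `ATOM`, `CORE`, `OUTSIDE` and all function names are hypothetical.

```python
from collections import deque

# Hypothetical voxel labels; the program's real type codes may differ.
ATOM, CORE, OUTSIDE = "A", "C", "O"

# Face neighbours only (6-connectivity); an assumption of this sketch.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def face_neighbours(grid, voxel):
    """Yield the in-bounds face neighbours of a voxel in grid[x][y][z]."""
    x, y, z = voxel
    for dx, dy, dz in STEPS:
        q = (x + dx, y + dy, z + dz)
        if (0 <= q[0] < len(grid) and 0 <= q[1] < len(grid[0])
                and 0 <= q[2] < len(grid[0][0])):
            yield q


def flood_fill(grid, seed, accept):
    """Return all voxels reachable from seed through voxels accepted by accept()."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        for q in face_neighbours(grid, queue.popleft()):
            if q not in seen and accept(q):
                seen.add(q)
                queue.append(q)
    return seen


def count_core_outside_interfaces(grid, seed):
    """Count separate contact patches between one cavity's core and the outside."""
    # 1. Collect the core voxels of the cavity that contains the seed.
    cavity = flood_fill(grid, seed, lambda q: grid[q[0]][q[1]][q[2]] == CORE)
    # 2. Collect the outside voxels that are face neighbours of those core voxels.
    contact = {q for v in cavity for q in face_neighbours(grid, v)
               if grid[q[0]][q[1]][q[2]] == OUTSIDE}
    # 3. Group the contact voxels into connected patches; one patch = one interface.
    interfaces = 0
    remaining = set(contact)
    while remaining:
        patch = flood_fill(grid, next(iter(remaining)), lambda q: q in contact)
        remaining -= patch
        interfaces += 1
    return interfaces


def grid_from_rows(rows):
    """Build a one-voxel-thick grid[x][y][z] from strings (one string per x)."""
    return [[[ch] for ch in row] for row in rows]


tunnel = grid_from_rows(["OAAAO", "OCCCO", "OAAAO"])
pocket = grid_from_rows(["OAAAA", "OCCCA", "OAAAA"])
isolated = grid_from_rows(["AAAAA", "ACCCA", "AAAAA"])
print(count_core_outside_interfaces(tunnel, (1, 2, 0)))    # 2
print(count_core_outside_interfaces(pocket, (1, 2, 0)))    # 1
print(count_core_outside_interfaces(isolated, (1, 2, 0)))  # 0
```

A count of 0 would correspond to an isolated cavity, 1 to a pocket, and 2 or more to a tunnel. With face-neighbour adjacency, a single opening whose contact voxels only touch diagonally would be split into several patches, so the choice of neighbourhood matters.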

I am not entirely sure whether this distinction would make sense in single probe mode, as the "outside" is not differentiated from the pockets and tunnels.

Improvement

The user could be informed about the type of cavity, which could help identify targeted cavities without having to look at each one.

jmaglic commented 3 years ago

I have been thinking about this feature, and I now think that regarding single-probe mode, it may be best to identify and remove the "cavity" that touches the outside of the space when not analysing the unit cell. My reasoning is that that volume value has no real physical significance. Its value is largely determined by the imaginary cell that we put the molecule in during calculation. Also, the value doesn't obey depth invariance, i.e. it's not constant with respect to the octree depth.

As for volumes that touch the cell edges when in unit cell analysis mode, I think it would be best to assign them to one of the cavity types mentioned (real cavity, tunnel, pocket) but also introduce another type, "pore". A pore would be defined as a probe accessible volume that is continuous with regard to the repeating unit cell.

Evaluating these unit cell volumes would require evaluating whether probe volumes "touch" other probe volumes on the opposite side of the unit cell. As I haven't spent too much time with the unit cell section of the code, I'm not sure how easy this is currently.

rlavendomme commented 3 years ago

> I have been thinking about this feature, and I now think that regarding single-probe mode, it may be best to identify and remove the "cavity" that touches the outside of the space when not analysing the unit cell. My reasoning is that that volume value has no real physical significance. Its value is largely determined by the imaginary cell that we put the molecule in during calculation. Also, the value doesn't obey depth invariance, i.e. it's not constant with respect to the octree depth.

I agree that this volume has no meaning and should be removed, but the line should stay because the surface area has meaning and the corresponding surface area is impossible to get from two-probe mode. So the volume values should be replaced with "outside" if using single probe mode AND not analyzing the unit cell. This question should be discussed in its own separate issue.

> As for volumes that touch the cell edges when in unit cell analysis mode, I think it would be best to assign them to one of the cavity types mentioned (real cavity, tunnel, pocket) but also introduce another type, "pore". A pore would be defined as a probe accessible volume that is continuous with regard to the repeating unit cell.

I agree that this would be great to identify continuous pores within a porous material.

> Evaluating these unit cell volumes would require evaluating whether probe volumes "touch" other probe volumes on the opposite side of the unit cell. As I haven't spent too much time with the unit cell section of the code, I'm not sure how easy this is currently.

I believe this would be relatively simple for orthogonal unit cells, because two opposite sides connect in a straight line, but it could be a bit more complicated for non-orthogonal unit cells, where an offset needs to be taken into account. The offset is not too difficult to calculate. Since you made the flood fill function, do you think it would be simple to create loops on the boundaries of the unit cell?

If we make such loops for one function, we could also directly analyze the initial unit cell shape that might be any parallelepiped with such boundary loops. It would be faster than the current algorithm but would require a lot of work to adapt the atom binary tree and other functions.
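A minimal sketch of what looping the flood fill at the boundaries could look like for an orthogonal unit cell. It is an illustration under assumptions, not the existing implementation: probe accessible voxels are given as a set of `(x, y, z)` indices inside the cell, adjacency is by faces, and the names `is_pore`, `open_voxels`, and `shape` are hypothetical. The fill wraps indices at the cell faces and remembers in which periodic copy of the cell each voxel was first reached. If the same voxel is reached again through a different copy, the volume connects to its own periodic image and would qualify as a pore. The offset needed for non-orthogonal cells is not handled here.

```python
from collections import deque

# Face neighbours only (6-connectivity); an assumption of this sketch.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def is_pore(open_voxels, shape, seed):
    """Return True if the volume containing seed is continuous across the unit cell.

    open_voxels: set of (x, y, z) probe accessible voxels inside the cell
    shape:       (nx, ny, nz) number of voxels along each orthogonal cell axis
    seed:        an element of open_voxels to start the fill from
    """
    nx, ny, nz = shape
    # image[v] = periodic copy of the cell in which voxel v was first reached
    image = {seed: (0, 0, 0)}
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        ix, iy, iz = image[v]
        for dx, dy, dz in STEPS:
            # divmod wraps the index and reports how many cells were crossed,
            # e.g. divmod(-1, 4) == (-1, 3) and divmod(4, 4) == (1, 0).
            cx, x = divmod(v[0] + dx, nx)
            cy, y = divmod(v[1] + dy, ny)
            cz, z = divmod(v[2] + dz, nz)
            w = (x, y, z)
            if w not in open_voxels:
                continue
            reached_in = (ix + cx, iy + cy, iz + cz)
            if w not in image:
                image[w] = reached_in
                queue.append(w)
            elif image[w] != reached_in:
                # Same voxel reached through two different cell copies:
                # the volume loops around the periodic boundary.
                return True
    return False


shape = (4, 3, 3)
channel = {(x, 1, 1) for x in range(4)}  # spans the whole cell along x
closed = {(1, 1, 1), (2, 1, 1)}          # never reaches a cell face
dead_end = {(0, 1, 1), (1, 1, 1)}        # touches a face, nothing opposite
print(is_pore(channel, shape, (0, 1, 1)))   # True
print(is_pore(closed, shape, (1, 1, 1)))    # False
print(is_pore(dead_end, shape, (0, 1, 1)))  # False
```

Tracking the cell copy rather than only checking whether a volume touches two opposite faces avoids counting a volume that touches both faces at positions that do not actually line up.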

jmaglic commented 3 years ago

You mean loop the flood fill at the unit cell boundaries? I believe that should be possible, but I'll have to look into how the unit cell analysis is actually implemented.

rlavendomme commented 3 years ago

The unit cell analysis simply generates an extended cell, so any atoms of nearby cells that can affect the probes in the starting unit cell are added to the structure. When calculating volumes and surface areas, only the voxels inside the unit cell are checked, so the voxel indexes for the boundaries of the unit cell are already calculated and present in the program.

jmaglic commented 3 years ago

[Image: 20210703_cavity-types]

I wanted to add this picture for later reference.

Red: Atom type
Green: Small probe shell type
Blue: Small probe core type
Yellow: Large probe core/shell type
Inaccessible type has been omitted

Here the different cavities can be characterised by the number of entrances and the types of interfaces. An entrance is any opening in the structure where the small probe can protrude or pass through. Each entrance forms an interface which basically just describes what's "on the other side".

For instance cavity G has two entrances. The left one forms an interface between small probe shell/core type voxels and large probe type voxels ("core/outside" for short).

We could characterise each of the cavities above but at the cost of clarity. So the current state of affairs is that we use three categories to group the cavities. The groups are entirely characterised by the number of core/outside interfaces.

Isolated cavities: 0 core/outside interfaces (A,C,D,E,H,K)
Pockets: 1 core/outside interface (B,G,J)
Tunnels: 2 or more core/outside interfaces (F)
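The grouping above can be written as a small lookup. This is only a sketch: the function name `classify_cavity` is hypothetical, and treating any count of two or more as a tunnel is an assumption based on the "at least two openings" definition in the issue description. The per-cavity counts are read off the grouping listed above, with 2 taken for F.

```python
def classify_cavity(n_core_outside_interfaces: int) -> str:
    """Map the number of core/outside interfaces to a cavity category."""
    if n_core_outside_interfaces < 0:
        raise ValueError("interface count cannot be negative")
    if n_core_outside_interfaces == 0:
        return "isolated cavity"
    if n_core_outside_interfaces == 1:
        return "pocket"
    # Assumption: two or more interfaces are all reported as a tunnel.
    return "tunnel"


# Core/outside interface counts for the cavities labelled in the picture.
counts = {"A": 0, "B": 1, "C": 0, "D": 0, "E": 0,
          "F": 2, "G": 1, "H": 0, "J": 1, "K": 0}
types = {label: classify_cavity(n) for label, n in counts.items()}
print(types["G"])  # pocket
print(types["F"])  # tunnel
```

Cavity G illustrates why only core/outside interfaces are counted: it has two entrances, but only one of them is a core/outside interface, so it is grouped with the pockets.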

molovol / MoloVol