theochem / PyCI

A flexible ab-initio quantum chemistry library for (parameterized) configuration interaction calculations.
http://pyci.qcdevs.org/
GNU General Public License v3.0
13 stars 9 forks source link

Automatic Determination of Number of Orbital Nodes (for GKCI) #71

Open PaulWAyers opened 1 month ago

PaulWAyers commented 1 month ago

The idea is to leverage QTAIM technology to get the number of nodes in orbitals.

This is complicated (more than I thought) as pointed out by @FarnazH . So this is mostly storage of an ideas/discussions for future consideration.

Isosurface-Driven Approach

  1. Compute the orbital, $\phi_k(\mathbf{r})$ on a cubic grid. This can be done using gbasis, grid, and/or cugbasis.
  2. Construct all nodal isosurfaces using marching cubes, $\phi_k(\mathbf{r})=0$. (Alternatively, one could identify the "watershed points" in $|\phi_k(\mathbf{r})|^2$.)
  3. Let each nodal surface/watershed point be a vertex; if its neighbors are also nodal/watershed points connect them with an edge.
  4. Find the number of connected components of the graph. The number of connected components of the graph is the number of nodal surfaces. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csgraph.connected_components.html.

Note: Using sci-kit image, we immediately get vertices (verts) and edges (faces connects each vertex to three other vertices), making this pretty easy. It is easy to put this into coo format for the purpose of using scipy.sparse.csgraph.connected_components to identify the connected components of the graph. Each pair of elements in the faces is a nonzero element of the adjacency matrix, corresponding to a coo entry. So if a face has the three integer-indexed vertices $v_1,v_2,v_3$, then one has three coo entries, $v_1,v_2,1.0$, $v_1,v_3,1.0$, and $v_2,v_3,1.0$

@FarnazH pointed out that this is more complicated, because nodal surfaces can (and often do) interesect. Intersecting nodal surfaces are connected, however.

By identifying isosurfaces, we can identify closed isosurfaces (do not contain any boundary points) and also open isosurfaces (that stretch to infinity, or the boundary of the region). Closed isosurfaces always count; I do not think they can intersect except at a catastrophe point, where the number of nodal surfaces changes.

For the open isosurfaces, one stores the points at the boundary, and defines a new graph that contains only the boundary points. If this graph has just one cycle, then the open isosurface does not interesect any others. If two open isosurfaces intersect, it seems there are twelve cycles. For three or more interesecting open surfaces it depends on the topology of the intersection.

One can also characterize the number of regions. The number of regions is obtained by:

  1. Compute the orbital, $\phi_k(\mathbf{r})$ on a cubic grid. This can be done using gbasis, grid, and/or cugbasis.
  2. Let each point in the cube be a vertex; $v_i$, connect it only to its neighbor, $v_j$, if $\phi_k(\mathbf{r}_j)$ has the sign as $\phi_k(\mathbf{r}_i)$. This is a very sparse graph. Do not connect if the magnitude of the neighbor is very small; indeed, it is good to eliminate (not label) all points where the value of the orbital is sufficiently small to avoid accidentally joining, for example, the two distinct nodes of a 3d orbital (which "intersect" at a point).
  3. Find the number of connected components of the graph. The number of connected components of the graph is the number of regions surrounded by nodes.

In one dimension, the number of nodal regions is one greater than the number of nodes. In two dimensions, there is already the potential for intersecting nodes. So, for example, four nodal regions could correspond to four interesecting nodes (like a typical $3d$ orbital) or three nonintersecting nodes (like a $3p$ or $4s$ orbital). Three nodal surfaces can give anywhere from 4 nodal regions to 8 nodal regions, depending on how (and whether) the nodal surfaces intersect. For example, a typical $4d$ orbital has 8 nodal regions. Four nodal surfaces can give anywhere from 5 nodal regions to 16 nodal regions. In three dimensions, it should be similar: for $n$ nodes you can have anywhere between $n+1$ and $2^n$ nodal regions.

Trying to count them is hard. There is probably a formula in terms of the number of cycles at infinity, the number of connected nodal surfaces, and the number of nodal regions. It may be possible to make a complete taxonomy up to a relatively small number of nodes (~5), but beyond that one needs to use one of the asymptotic formulas that we have cooked up elsewhere. There would be better ways to do this, but it would require a detailed analysis of the way the nodal hyperplanes interested.

Old Bad Idea

I thought that by constructing the function, $d(\mathbf{r})$ that was the distance to the nearest nodal surface we would have a function that would have one maximum for every nodal region. This doesn't work because there could be dumbell-shaped regions, which have two maximum for $d(\mathbf{r})$.

  1. For each orbital, $\phi_k(\mathbf{r})$, compute it at a set of grid points, $\mathbf{r}_g$.
  2. For each grid point $\mathbf{r}_g$, find the closest grid point $\mathbf{r}_h$ such that $\phi_k(\mathbf{r}_g)\phi_k(\mathbf{r}_h) < 0$. This is the closest grid point for which the orbital $\phi_k(\mathbf{r})$ has a different sign and, if the grid is dense enough, approximates the distance of $\mathbf{r}_g$ to the nearest nodal surface of $\phi_k(\mathbf{r})$ . Label this distance $d_k(\mathbf{r}_g)$.
  3. We can then find the basins of the function $d_k(\mathbf{r})$; the number of basins is one greater than the number of nodal surfaces. If a cubic grid is used, the Henkelmann algorithm is very well suited. Otherwise, alternative methods (e.g., from ChemTools) are better. It may be convenient to exponentiate the distance, as normal QTAIM algorithms are well-suited to exponentially decaying functions. So something like $d_k(\mathbf{r}) e^{d_k(\mathbf{r})}$ might work well.

Notes: A brute strength implementation, with cost $\mathcal{O}(N_{grid}^2)$ is quite easy. However, using k-d trees one could make a more efficient implementation. One would set a threshhold $\sigma$, loop over all grid points within $\sigma$ of $\mathbf{r}_g$, and if $\sigma$ is not big enough, increase it.

PaulWAyers commented 1 month ago

Given that it is practical to compute the number of nodal regions but not the number of nodes, we could

  1. compute the number of nodal regions
  2. compute the number of nodes in a spherical/cubic system that has that number of nodal regions
  3. use this function $\text{number of nodes}(\text{number of nodal regions})$ as a proxy variable. Since a given number of nodes has many different number of nodal regions, one needs to use an average.
For spherical systems we have: # nodes # regions
0 1
1 2
2 (3+3 4+4 4+1 * 3z^2)/9
3 (4+3 6+4 8+6z^2+6 6+1z^3 4)/16
4 (5+3 8+4 12+9z^2+6 12+1 8z^3+8 * 8+5z^4)/25
- -

We would need to build a spline for this function and use its inverse to compute the number of nodes as a function of the number of regions.

For Cartesian systems, we have: # nodes # regions
0 1
1 2
2 (3 3 + 4 3)/10
3 (3 4 + 6 6 + 8)/11
- -

James pointed out that the formula for the cube (or any rectangular prism) is that the number of regions is for quantum numbers $n_x, n_y, n_z$ = $(n_x+1)(n_y+1)(n_z+1)$. Then one can fill in the rest of this table relatively simply (just looping through all the ways to add up 3 integers to form $n_x + n_y + n_z = \text{number of nodes} - 3$