Motivation

In the properties framework, we recently added classes like the ElectronicIntegrals and VibrationalIntegrals which are essentially coefficient containers, tailored to two specific use cases. As was already brought up in #460, the former of these is tailored so extensively, rendering it unusable in other settings like for example the QuadraticHamiltonian.

Proposal

Thus, I would like to suggest an implementation of a generalized coefficient container, which can be used for arbitrary n-body coefficients and, furthermore, is agnostic w.r.t. the kind of operator which can later be constructed from it (meaning it can be used for electronic and vibrational coefficients alike).

Pseudo-code

I would propose a class similar to the following:

from qiskit.opflow.mixins import StarAlgebraMixin
from qiskit.quantum_info.operators.mixins import TolerancesMixin

class PolynomialTensor(StarAlgebraMixin, TolerancesMixin):

    def __init__(self, data: Dict[str, np.ndarray]):
        self._data = data

    # TODO: implement mathematical operations supported by StarAlgebraMixin

Structure

They key detail here, would be the structure of _data. I propose, that it be structured as following:

each key is a string with the following properties:
- the length of the string determines the number of dimensions of the matrix stored as its value
- the content of the string does not affect any behavior in this class itself, but indicates the kind of operator which gets encoded along the corresponding axis of the stored matrix (see examples below)
the values in this dictionary, are simply numpy arrays whose dimension is given by the string length (previous point)

In the future, adding support for sparse matrix structures will be trivial by simply supporting the union type in this container. If the stack is built on this PolynomialTensor class without any assumptions on the underlying matrix format (dense or sparse), this will immediately enable the entire stack to benefit from sparse matrix support :+1:

Operator construction

The key use case of this PolynomialTensor, is efficient storage and mathematical manipulation of coefficients. In many cases, we obtain such coefficients in a matrix format compared to the ravel-style operator representation which we have in our SecondQuantizedOp classes.

By this I mean that in those classes we store dictionaries of string labels mapping to single coefficients (i.e. a raveled matrix). More on this in issue (TBD; writing that next)

Implementation

The implementation of the PolynomialTensor -> SecondQuantizedOp translation, will be left to each of the operator classes themselves:

FermionicOp.from_polynomial_tensor(tensor): #666

[CLICK ME] Data structure proposal for fermionic case

### `FermionicOp` Currently, coefficients that will later become a `FermionicOp` are stored in most cases in the `ElectronicIntegrals`. Those are tailored to the electronic structure problems allowing some memory reduction in scenarios of identical coefficients for spin-up and -down orbitals, rendering them not generally applicable. Instead, I propose a structure of `data` as follows: ``` { "+-": [ [1, 2], [3, 4] ], "++--": [ [[ [1, 2], [3, 4] ], [ [5, 6], [7, 8] ]], [[ [9, 10], [11, 12] ], [ [13, 14], [15, 16] ]] ] } ``` Here `+-` stores a one-body term where the first axis of the stored matrix will be mapped to `+` operators, and the second axis to `-`. I.e. this will result in a `FermionicOp`: ``` 1.0 * "+_0 -_0" + 2.0 * "+_0 -_1" + 3.0 * "+_1 -_0" + 4.0 * "+_1 -_1" ``` Accordingly, `++--` indicates a two-body term, and, even more importantly, unambiguously stores the two-body terms in the setting of electronic structure problems, in the physicist notation: $$ g_{pqrs} a^{\dagger}_p a^{\dagger}_q a_r a_s $$ For brevity, I will not expand the 16 coefficients into the operator but will only show a few examples: - `data["++--"][0, 0, 0, 0] -> 1.0 * "+_0 +_0 -_0 -_0"` - `data["++--"][0, 1, 0, 1] -> 6.0 * "+_0 +_1 -_0 -_1"` - `data["++--"][1, 0, 0, 0] -> 9.0 * "+_1 +_0 -_0 -_0"` For the specific case of electronic structure problems, we can implement a specialized variant of the `PolynomialTensor` which will allow simplified construction from chemistry-notation two body integrals (using the utilities developed in #520), can implement memory-reduction methods to exploit spin symmetries, and (in the future) could also exploit further symmetry information in chemical systems (like 8-fold symmetry of AO ERIs, for example).

SpinOp.from_polynomial_tensor(tensor): #667

[CLICK ME] Data structure proposal for spin case

### `SpinOp` This class could also be used to store coefficients for the construction of `SpinOp`s. ``` { "XYZ": [ [ [1, 2], [3, 4] ], [ [5, 6], [7, 8] ] ], "XX": [ [1, 2], [3, 4] ], } ``` This would result in terms like this: - `data["XYZ"][0, 0, 0] -> 1.0 * "X_0 Y_0 Z_0"` - `data["XX"][0, 1] -> 2.0 * "X_0^2"` This is not a construct we currently have in Qiskit Nature, but it may prove to be useful in the future. In any case, supporting it in the `PolynomialTensor` is trivial.

VibrationalOp.from_polynomial_tensor(tensor): #668

[CLICK ME] Data structure proposal for vibrational case

### `VibrationalOp` Finally, the vibrational structure use case is a bit more special. Let me first explain briefly, what the vibrational operator does: - it has `m` modes - each mode can have an individual number of modals, $$ n_i \forall i \in m $$ - this means, the total length of the `VibrationalOp` is given by $$ \sum_{i \in m} n_m $$ - the splitting of each mode into its modals is determined by the basis (e.g. `HarmonicBasis`) which we use to map to the second-quantization space - thus, the coefficients which we extract from a classical code (e.g. via the `GaussianLogResult`) actually only index the *modes* and, thus, have no concept of _modals_ What this means, is that the coefficients stored in a `PolynomialTensor` must be combined with a basis in order to produce a full `VibrationalOp`. We see the same in the current stack, because the `VibrationalEnergy` requires a basis in combination with the `VibrationalIntegrals`. That said, what should we actually store in a `PolynomialTensor` in the vibrational scenario? I would argue, that the splitting into n-body terms which we do in the `VibrationalIntegrals` is not very suitable here, because of the following: ``` 1-Body Terms: (2, 2) = 352.3005875 (-2, -2) = -352.3005875 (1, 1) = 631.6153975 (-1, -1) = -631.6153975 (4, 4) = 115.653915 (-4, -4) = -115.653915 (3, 3) = 115.653915 (-3, -3) = -115.653915 (2, 2, 2) = -15.341901966295344 (2, 2, 2, 2) = 0.4207357291666667 (1, 1, 1, 1) = 1.6122932291666665 (4, 4, 4, 4) = 2.2973803125 (3, 3, 3, 3) = 2.2973803125 ``` A simple 1-body term, can have very different lengths. The same holds for n-body terms in general. Instead, I suggest we store the coefficients in the same format as done in the `GaussianLogResult`, where we split them into quadratic, cubic and quartic force terms. These all have length 2, 3 and 4, respectively. The kinetic terms (negative indices in the example above) can be inferred from the quadratic terms by a simple negation of the coefficient, so they do not need to be stored separately. This results in a structure like this: ``` { "??": [[...]], # quadratic "???": [[[...]]], # cubic "????": [[[[...]]]], # quartic } ``` The strings here do not carry useful information in their characters, as only their length is relevant. I am open to suggestions on what to do here.

mrossinek commented 2 years ago

Some more expected features:

configurability of float vs complex as the data type of the internal numpy arrays
a "relabeling" mechanism as suggested in #542 (maybe something along the lines of https://stackoverflow.com/a/10937086?)

kevinsung commented 2 years ago

I don't think we should bother to implement StarAlgebraMixin since the user can just convert to a SecondQuantizedOp, which is more suitable for the star algebra operations and already has them implemented.

qiskit-community / qiskit-nature

Add a `PolynomialTensor` class #665

What should we add?