Broombridge Schema to support open-shell molecules

Is your feature request related to a problem? Please describe.

Currently, the Broombridge schema only supports a restricted wave function reference, which limits its application to closed-shell molecules. We should consider upgrading the schema to support an unrestricted/spin-orbital wave function reference for open-shell molecules.

Describe the solution you'd like

To support open-shell systems in general, we first need entries for the spin multiplicity that tell us how many alpha and beta electrons are in the system (as they are no longer paired). Then we need separate sets of alpha and beta orbitals. This also means that we need to divide the whole two-electron Hamiltonian into different spin blocks. Finally, in the case of spin-orbitals, the integral values can be complex rather than real.
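To make the first point concrete, the alpha and beta electron counts follow from the total electron count and the spin multiplicity 2S + 1. The sketch below assumes the convention n_alpha >= n_beta; the function name `spin_counts` is hypothetical and is not part of Broombridge or the QDK.

```python
from typing import Tuple


def spin_counts(n_electrons: int, multiplicity: int) -> Tuple[int, int]:
    """Return (n_alpha, n_beta) for a given electron count and multiplicity 2S + 1."""
    n_unpaired = multiplicity - 1  # 2S = n_alpha - n_beta
    if n_unpaired < 0 or n_unpaired > n_electrons:
        raise ValueError("multiplicity is out of range for this electron count")
    if (n_electrons - n_unpaired) % 2 != 0:
        raise ValueError("electron count and multiplicity have inconsistent parity")
    n_beta = (n_electrons - n_unpaired) // 2
    n_alpha = n_beta + n_unpaired
    return n_alpha, n_beta
```

For example, two electrons with multiplicity 3 (a triplet) give two alpha electrons and no beta electrons, while multiplicity 1 gives one of each.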

Thanks for opening this, @dgwvfxn! Exciting to consider extending Broombridge to allow representing fermionic Hamiltonians for open-shell molecules as well. From our offline discussion, I'd suggest a couple different functional requirements for how to extend the schema:

Allow specifying ↑/↓ or α/β as part of indices in two-electron integral terms.
Allow specifying "off-diagonal" terms between different spins.
Specify n↑ and n↓ instead of just n_electrons.
Possibly allow specifying restrictions on allowed Hamiltonians (e.g.: closed-shell) that can be checked at schema validation or deserialization time.

I could imagine either adding ↑ and ↓ to indices directly (e.g.: [[1, 'a'], [1, 'a'], [1, 'b'], [1, 'a']]) or having distinct keys to factor out common parts of indices as per your offline suggestion:

two_electron_integrals:
    # Use a different format to denote that values are grouped by common
    # indices.
    format: urf-sparse 
    index_convention: mulliken
    units: hartree
    values:
        aaaa:
        - [1, 1, 1, 1, 1.6586341297]
        # ...
        aabb:
        - [1, 1, 1, 1, 1.6586341297]
        # ...
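
The two representations carry the same information, so converting between them is mechanical. The following is a minimal sketch of grouping per-index spin labels into blocks keyed by their spin pattern. The input layout (a list of `(labelled_indices, value)` pairs) and the name `group_by_spin` are assumptions made for illustration, not part of any existing schema or library.

```python
from collections import defaultdict


def group_by_spin(terms):
    """Group terms such as ([[1, 'a'], [1, 'a'], [1, 'b'], [1, 'b']], value) by spin pattern."""
    blocks = defaultdict(list)
    for labelled_indices, value in terms:
        # The block key is the concatenated spin labels, e.g. "aabb".
        key = "".join(spin for _, spin in labelled_indices)
        orbitals = [orbital for orbital, _ in labelled_indices]
        blocks[key].append(orbitals + [value])
    return dict(blocks)
```

Each resulting block then holds rows of the form `[i, j, k, l, value]`, matching the grouped layout sketched above.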

I am in favor of having distinct keys to factor out common parts (aaaa, aabb, bbbb, etc.). These blocks are easy to print out and easier to inspect and parse. This approach also has added benefits, as I describe below.
As suggested, I would add keywords for RHF/closed-shell vs ROHF/UHF/open-shell identification of the integrals. Currently, when we print out the closed-shell integrals from NWChem, we print a subset of the spin-orbital integrals because it is equivalent to the orbital form. We can choose to always print the full set of common parts (aaaa, aabb, etc), but with such added keywords there are additional benefits and checks one can perform. For example, you can choose to just read the aabb terms in closed-shell cases as we already do and operate with just those terms.
For the formatting, there ought to be keywords for 'full' vs. 'compact' notation of the integrals. The two-electron integrals have an 8-fold symmetry [(ij|kl) = (kl|ij) = (ji|lk) = (lk|ji) = (ji|kl) = (lk|ij) = (ij|lk) = (kl|ji)], so we only write the unique term (ij|kl) to the YAML files, which is what I mean by 'compact' notation. However, certain classes of Hamiltonians do not have this 8-fold symmetry, so in that case it would be safer to print the 'full' form (all terms) of the Hamiltonian.
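
To illustrate what a reader of 'compact' files has to do, here is a minimal sketch that expands unique (ij|kl) terms into all index tuples related by the 8-fold symmetry listed above. It assumes real-valued integrals within a same-spin block (in a mixed block such as aabb, exchanging the two index pairs would land in a bbaa block instead). The function name `expand_eightfold` is hypothetical.

```python
def expand_eightfold(compact_terms):
    """Expand unique (ij|kl) terms into every symmetry-equivalent index tuple."""
    full = {}
    for i, j, k, l, value in compact_terms:
        # The eight permutations from the symmetry relation; a set removes
        # duplicates that arise when some indices coincide.
        equivalents = {
            (i, j, k, l), (k, l, i, j), (j, i, l, k), (l, k, j, i),
            (j, i, k, l), (l, k, i, j), (i, j, l, k), (k, l, j, i),
        }
        for indices in equivalents:
            full[indices] = value
    return full
```

A Hamiltonian without this symmetry cannot be recovered this way, which is why a 'full' format that lists every term explicitly is needed for those cases.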

Here is an example file for the proposed update. For all cases (rhf/uhf/ducc, etc) the spin blocks should be defined under the one/two-electron integrals.

"$schema": https://raw.githubusercontent.com/Microsoft/Quantum/main/Chemistry/Schema/broombridge-0.3.schema.json

bibliography:
- {url: 'https://www.nwchem-sw.org'}
format: {version: '0.3'}
generator: {source: nwchem, version: '6.8'}
problem_description:
- basis_set: {name: sto-3g, type: gaussian}
  coulomb_repulsion: {units: hartree, value: 0.2645886}
  energy_offset: {units: hartree, value: 0.0}
  fci_energy: {lower: 0.0, units: hartree, upper: 0.0, value: 0.0}
  geometry:
    atoms:
    - coords: [1.0, 0.0, 0.0]
      name: H
    - coords: [-1.0, 0.0, 0.0]
      name: H
    coordinate_system: cartesian
    symmetry: c1
    units: angstrom
  hamiltonian: # if rhf, need the aa and aabb blocks; if uhf, need the aa, bb, aaaa, bbbb, and aabb blocks.
    one_electron_integrals:
      format: full # full or sparse(compact), this corresponds to the symmetry discussion. 
      units: hartree
      values:
        aa:
          - [1, 1, -0.778922036]
          - [2, 2, -0.670266672]
        bb:
          - [1, 1, -0.778922036]
          - [2, 2, -0.670266672]
    two_electron_integrals:
      format: full 
      index_convention: mulliken
      units: hartree
      values:
        aaaa:
          - [1,1,1,1,0.50946281]
          - [1,1,2,2,0.51920126]
          - [2,1,2,1,0.25913847]
          - [2,2,1,1,0.51920126]
          - [2,2,2,2,0.53466412]
        bbbb:
          - [1,1,1,1,0.50946281]
          - [1,1,2,2,0.51920126]
          - [2,1,2,1,0.25913847]
          - [2,2,1,1,0.51920126]
          - [2,2,2,2,0.53466412]
        aabb:
          - [1,1,1,1,0.50946281]
          - [1,1,2,2,0.51920126]
          - [2,1,2,1,0.25913847]
          - [2,2,1,1,0.51920126]
          - [2,2,2,2,0.53466412]
  initial_state_suggestions:
  - energy: {units: hartree, value: -1.13727}
    method: sparse_multi_configurational
    label: '|G>'
    superposition:
      - [1.0, (1a)+, (1b)+, '|vacuum>']
  - label: "UCCSD |G>"
    method: unitary_coupled_cluster
    cluster_operator: # Initial state that cluster operator is applied to.
        reference_state: [1.0, (1a)+, (1b)+, '|vacuum>']
        # A one-body cluster term is t^{q}_{p} a^\dag_q a_p
        # A one-body unitary cluster term is t^{q}_{p}(a^\dag_q a_p- a^\dag_p a_q)
        one_body_amplitudes: # t^{q}_{p} p q
            - [-1.97094587e-06, "(2a)+", "(1a)"]
            - [1.52745368e-07, "(2b)+", "(1b)"]
        # If this is a PQQR term, the middle two indices must coincide.
        two_body_amplitudes: # t^{pq}_{rs} p q r s
            - [1.13070239e-01, "(2a)+", "(2b)+", "(1a)", "(1b)"]
  metadata: {molecule_name: H2}
  n_electrons_alpha: 2 # now we distinguish alpha and beta electrons, alpha >= beta
  n_electrons_beta: 0
  wavefunction_type: "uhf" # other options: rhf, rohf, ghf, ducc, etc.
  n_orbitals: 2 # Number of alpha or beta orbitals, this value bounds the Hamiltonian index. 
  scf_energy: {units: hartree, value: -0.9245373192292724}
  scf_energy_offset: {units: hartree, value: 0.0}
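
The block requirements noted in the comment on the `hamiltonian` key could be checked at deserialization time. Below is a minimal sketch operating on an already-parsed `problem_description` entry (a plain dict). The table `REQUIRED_BLOCKS` and the function `missing_blocks` are hypothetical names, and only the rhf and uhf cases mentioned in that comment are covered.

```python
# Required spin blocks per wavefunction type, taken from the comment on the
# `hamiltonian` key in the example file. Other types (rohf, ghf, ducc) are
# not covered by this sketch.
REQUIRED_BLOCKS = {
    "rhf": {"one": {"aa"}, "two": {"aabb"}},
    "uhf": {"one": {"aa", "bb"}, "two": {"aaaa", "bbbb", "aabb"}},
}


def missing_blocks(problem):
    """Return the sorted spin blocks that `problem` lacks for its wavefunction_type."""
    required = REQUIRED_BLOCKS[problem["wavefunction_type"]]
    hamiltonian = problem["hamiltonian"]
    one = set(hamiltonian["one_electron_integrals"]["values"])
    two = set(hamiltonian["two_electron_integrals"]["values"])
    return sorted((required["one"] - one) | (required["two"] - two))
```

An empty result means every required block is present; a reader could reject the file otherwise.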

When enumerating the alpha and beta spin orbitals, do we want the indices for alpha and beta to be separate or combined? Do we want:

1) alpha spin orbitals to have their own set of labels (1, 2, ..., [# occupied alpha], ..., [total # alpha spin orbitals]) and beta spin orbitals to have their own set of indices (1, 2, ..., [# occupied beta], ..., [total # beta spin orbitals]). Since we decided to settle on splitting the alpha/beta subsets in Broombridge, the labels are inferred from the particular group they are under. For example, in the aaaa block, [2,1,2,1] means 2-alpha, 1-alpha, 2-alpha, 1-alpha, and therefore the pair of "2" indices refers to the same spin-orbital, while in the aabb block, [2,1,2,1] means 2-alpha, 1-alpha, 2-beta, 1-beta, and the pair of "2" indices refers to different spin-orbitals.

2) the indices to be continuous, i.e. (1, 2, ..., [# occupied alpha], [# occupied alpha + 1], ..., [# occupied alpha + beta], [# occupied alpha + beta + 1], ..., [# occupied + # virtual alpha], [# occupied + # virtual alpha + 1], ..., [# occupied + # virtual alpha + beta]). If you consider the triplet state of a minimal-basis LiH calculation with 3 alpha occupied, 1 beta occupied, 3 alpha virtual, and 5 beta virtual orbitals, then indices 1-3 would be alpha occupied, 4 would be beta occupied, 5-7 would be alpha virtual, and 8-12 would be beta virtual.
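
The two options are related by a simple index map, so either can be derived from the other. As a minimal sketch, assuming 1-based indices and the ordering described in option 2 (alpha occupied, beta occupied, alpha virtual, beta virtual), a per-spin index can be converted to a combined one as follows. The function `combined_index` and its parameter names are hypothetical.

```python
def combined_index(spin, index, n_occ_a, n_occ_b, n_virt_a):
    """Map a 1-based per-spin index (option 1) to a 1-based combined index (option 2)."""
    if spin == "a":
        if index <= n_occ_a:
            return index  # alpha occupied orbitals come first
        return n_occ_b + index  # alpha virtuals follow all occupied orbitals
    if spin == "b":
        if index <= n_occ_b:
            return n_occ_a + index  # beta occupied follow alpha occupied
        return n_occ_a + n_virt_a + index  # beta virtuals come last
    raise ValueError("spin must be 'a' or 'b'")
```

For the LiH triplet example (3 alpha occupied, 1 beta occupied, 3 alpha virtual), the first alpha virtual (per-spin index 4) maps to combined index 5, and the first beta virtual (per-spin index 2) maps to combined index 8.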

Personally, I lean toward option 1 since it is easier to visually inspect. However, if it is more convenient to implement option 2 for use in parts of QDK, then let me know.

microsoft / QuantumLibraries

Broombridge Schema to support open-shell molecules #432