Optimize Bandstructure for access

pablosanjose / Quantica.jl

Simulation of quantum systems on a lattice

Other

70 stars 8 forks source link

Optimize Bandstructure for access #122

Closed pablosanjose closed 3 years ago

pablosanjose commented 4 years ago

Followup of https://github.com/pablosanjose/Quantica.jl/issues/120#issuecomment-714619981

A D-dimensional Bandstructure is currently just a collection of meshes (bands) in D+1 dimensions (k+energy vertices), a matrix of states for each vertex (flattened into a vector) and a vector of simplex pointers into vertices.

struct Mesh{D,T<:Number,V<:AbstractArray{SVector{D,T}}} <: AbstractMesh{D}   # D is dimension of parameter space
    vertices::V                         # Iterable vertex container with SVector{D,T} eltype
    adjmat::SparseMatrixCSC{Bool,Int}   # Undirected graph: both dest > src and dest < src
end

struct Band{D,M,A<:AbstractVector{M},MD<:Mesh{D},S<:AbstractArray{<:NTuple}}
    mesh::MD        # Mesh with missing vertices removed
    simplices::S    # Tuples of indices of mesh vertices that define mesh simplices
    states::A       # Must be resizeable container to build & refine band
    dimstates::Int  # Needed to extract the state at a given vertex from vector `states`
end

struct Bandstructure{D,M,B<:Band{D,M}}   # D is dimension of ε + parameter space
    bands::Vector{B}
end

This has several problems. In particular, simplices do not keep track of their cartesian indices on the base k-space mesh, so finding a subset of simplices around a given momentum becomes costly (we don't have higher-dimensional interval trees in Julia yet).

In this issue I want to consider a different approach to store the bandstructure that (1) solves the above access problem, while (2) still allowing efficient iteration over all simplices.

A third consideration to keep in mind is how to solve the problem of Dirac points. A Dirac point is a 2D degenerate subspace that has no natural basis, since velocity eigenstates depend on direction. Hence, the per-vertex codiagonalization we now do cannot yield simplices all around the Dirac point (half of them are always missing). A solution considered in the past is point-splitting: duplicate the Dirac point in the mesh, and choose a different basis in each copy. An alternative solution, incorporated in the present proposal, is to associate eigenstates to simplices, instead of vertices. To avoid the overhead of keeping multiple copies of eigenstates (since simplices share vertices), we resort here to storing state views inside simplices. Since we can store any view we want, we just need to store two different state bases for a Dirac point, and keep views to them in the relevant simplices.

So, a sketch of the proposed alternative implementation would be

struct CuboidMesh{D,TE}  # marching tetrahedra mesh of a D-dimensional cuboid in (ϕs...) parameter space
    axesticks::NTuple{D,Vector{TE}} # axes tick points of the form ([ϕ1...], [ϕ2...], ... [ϕD...])
end

struct Simplex{D,TE,TS,S<:SubArray{TS,1}}  # a single simplex with D+1 vertices (energy eigenvalues vals, and eigenstates vecs)
    vals::Tuple{TE, Vararg{TE,D}}   # D+1 vertices for simplices with a D-dimensional base space
    vecs::Tuple{S, Vararg{S,D}}     # vecs is a tuple of views into `states` below, or transformed copies thereof for degenerate vals
end

struct Bandstructure{D,TE,TS,SI<:Simplex{D,TE,TS}}
    base::CuboidMesh{D,TE}
    energies::Array{Vector{TE},D}   # This stores the eigenvalue output of diagonalize itself, no copies
    states::Array{Matrix{TS},D}     # This stores the eigenvector output of diagonalize itself, no copies
    simplices::Array{Matrix{SI}, D} # a column of simplices on top of each base simplex
    connectivity::Matrix{Int}
end

The reason for the Matrix eltype of simplices above is that for each "minicuboid" in the D-dimensional base mesh there is a fixed number N = D! of marching tetrahedra simplices inside, and for each of these we store a column of N´ simplices that form the lifted (bandstructure) mesh. This N x N´ would form the Matrix. It would be worthwhile to think if this could be improved further. For example not all N base simplices need have the same number N´ of simplices in the column above them (since the number of eigenvalues on each vertex need not be the same in general).

Perhaps a better way would be to have simplices::Array{Vector{SI}, D+1}, with the extra dimension spanning the N=D! simplices in each minicubiod. But then D+1 should be an extra parameter of the type, since we don't have computed type parameters in Julia yet.

pablosanjose commented 4 years ago

Some more thought about the simplices field.

There are many situations in which only the momenta and/or energies of a bandstructure need to be processed, e.g. for global DOS or Fermi surface computations. To optimize access in such cases, it is unfortunate to have simplices not be a contiguous array in memory (array of bit types). We could modify simplices to be simplices::S where S<:SimplexSet{D+1,TE,TS}. Then

struct SimplexSet{D´,TE,TS,S<:SubArray{TS,1}} 
    vals::Vector{NTuple{D´,TE}}
    vecs::Vector{NTuple{D´,S}}
    ptrs::Array{UnitRange{Int},D´}
end

Here D´ = D+1 is the number of vertices per simplex, and also the dimension of the simplex set (base dimension+1, because there is an extra index N=D! to get the simplex in a minicuboid).

To get to a certain simplex from (ϕs...), we first work out the D+1-dimensional simplex_index. Then simplices.ptrs[simplex_indices...] would give you the range r = imin:imax such that simplices.vals[r] would yield the vertex energies of the column simplices, and simplices.vecs[r] the corresponding views into the vertex states.

The reason to do things this way, and not have, say, a vals::Array{Vector{NTuple{D´,TE}},D´}, is that in the latter we cannot exploit simd to quickly scan through vals in situations where we don't select a particular simplex column. In other words, vals::Array{Vector{NTuple{D´,TE}},D´} is not contiguous in memory (it requires indirections to access), while Vector{NTuple{D´,TE}} is. We pay the price of storing ptrs, which will not be so frequencly used, but the size is not so huge (note that vals is typically much larger, by a factor approximately equal to the number of eigenvalues per base vertex -- assuming all eigenvalues form simplices). Note also that vecs is never a vector of bitstypes in this implementation, but of subarrays. This produces slow access due to indirections. We could consider doing copies of the views, but that does incur in a sizeable memory overhead, since states are large and would be repeated by a factor D+1.

pablosanjose commented 4 years ago

One more simplification: there is no need to wrap simplices in a SimplexSet struct, since its fields and those of the parent Bandstructure are tightly coupled

struct Bandstructure{D,T,S<:SubArray{1,Complex{T}},D´}  # Here D´=D+1
    base::CuboidMesh{D,T}
    energies::Array{Vector{T},D}        # This stores the eigenvalue output of diagonalize itself, no copies
    states::Array{Matrix{Complex{T}},D} # This stores the eigenvector output of diagonalize itself, no copies
    simplices::Vector{NTuple{D´,Int}}   # indices of flattened energies/states that form each simplex
    senergies::Vector{NTuple{D´,T}}     # vertex energies for each simplex
    sstates::Vector{NTuple{D´,S}}       # views of vertex states for each simplex
    spointers::Array{UnitRange{Int},D´} # for each base simplex (D´-dimensional tensor), pointers into simplices/senergies/sstates forming columns
end

Usage:

The connectivity matrix is simply reshape(reinterpret(T,bs.simplices),D´,:) -- no copy needed
The global density of states at a given energy can be obtained efficiently with an IntervalTree on extrema.(senergies)
The local density of states requires also sstates, but indexing into it is fast once the relevant simplices are identified through the IntervalTree

The flat array of vertices can be constructed with

vertices(bs) = [vcat(bs.base[n], ϵ) for n in eachindex(base) for ϵ in bs.energies[n]]

One possible alternative would be to store energies and states as flat Vectors. However, we would then have to store a vpointers of UnitRanges, analogous to spointers but for vertices, to be able to connect (ϕs...) and energies/states. Since fast iteration over vertices is not likely to be needed (unlike for simplices), this seems unnecessary.

Note: sstates stores views into eigenstate matrices. However, these need not be views of the matrices in states. In fact, for degenerate eigenstates one would rotate a copy of the subspace states as required by the simplex connection in question, and link that rotated copy into sstates with a view. The copy itself will not be garbage-collected, since a view into it remains accessible.

pablosanjose commented 4 years ago

To finish filling in implementation details: how do we go from a (ϕs...) to the pointer range of a column of simplices?

The first step is to find the n = (n1, ..., nD), which is easy by searching each ϕi within base.axesticks[i].
The second step is to normalize (ϕs...) within the minicuboid where it falls. We obtain x = SVector(x1, ...., xD) with 0<=xi<=1.
Different simplices in the minicuboid are defined by a certain 0<=x_p1<=x_p2<=...<=x_pD<=1. The indices pi can be computed with p = sortperm(x). There are factorial(D) possible ps.
Given p, the vertices of the simplex within the minibox corners are constructed as v_0 = (0,0,0,...), v_n = v_(n-1) + unitvector(p_n), where unitvector(n) is the unit vector along axis n. [This is not required to obtain the pointer range]
Given p, the index of a simplex amongst the D! simplices in the minicuboid is fastest to comptute using a Base.ImmutableDict,
```
d = Base.ImmutableDict((SVector{D}.(permutations(1:D)) .=> 1:factorial(D))...)
```
This allows fast generation of simplex index with idx = d[p]. We should perhaps construct and bundle d inside Bandstructure. Unfortunately ImmutableDict is unexported in Base.
With n and idx we can access the needed pointer range to the column of simplices as bs.spointers[n..., idx]

EDIT: it turns out to be faster to do naive linear lookup on the tuple of permutations p than indexing into an ImmutableDict, at least while constructing the tuple remains possible (i.e. until D=5, a 120-element tuple).

pablosanjose commented 3 years ago

A potential problem: when given a specific simplex pointer (e.g. when computing the DOS at a given energy) there is no easy way to obtain its base momenta. With the above design we can only go from base momenta to columns of simplices, not the other way round.

Possible solutions: 1- Store momenta in senergies::Vector{NTuple{D´,SVector{D´,T}}, which would then be called svertices 2- Or perhaps have a separate vector sbase::Vector{NTuple{D´,SVector{D,T}} with the momenta for each simplex. 3- Alternatively, store a basepointers::Vector{NTuple{D,Int}} of the same length as senergies that gives the linear indices of the base vertices for each simplex vertices

There doesn't seem to be any advantage to option 3, memory wise, and it adds an indirection, so it might be better to go with 1 or 2.

pablosanjose commented 3 years ago

Ah, just realized that we also need an adjacency matrix of Bandstructure vertices for some operations, like plotting a wireframe. I also realized, thinking about the issue of extracting the simplices, that it naturally leads to a classification of simplices into bands (more on that in another issue), which is a very nice thing to keep, and which the present exercise has also dropped. All in all, I'm beginning to realize that the current Bandstructure design is not that bad, and just needs to be extended to allow O(1) simplex access.

The minimal changes to enable fast access would be

Reintroduce a base mesh in Bandstructure that was removed in #119. But we make it of type BaseMesh (renamed from CuboidMesh), that allows O(1) indexing
Keep bands::Vector{Band}, where Band contains (1) a mesh::BandMesh (vertices + adjacencies), and (2) a simplices::BandSimplices
Now, Simplices should contain a verts::Vector{NTuple{D´,SVector{D´,T}}} of vertex groups, a vecs::Vector{NTuple{D´,SubArray}} of eigenstate groups, and a ptrs::Array{UnitRange{Int},D´} that maps base simplices to simplices in the set. For this, the Vectors need to be ordered by shared base simplex. In otherwords issorted(verts)==true
BandMesh currently has verts and adjmat. We include simpinds::Vector{NTuple{D´,Int}} of verts indices that form simplices. simpinds shouldn't go in Simplices because it refers to mesh.verts

This would then read:

struct BaseMesh{D,V} <: AbstractMesh{D}  # Was CuboidMesh, implements AbstractMesh API (edges, edgedest, vertices, simplices...)
    axesticks::NTuple{D,V}  # Here V could be any container, although tuples are fastest for indexing
end

struct BandMesh{D´,T<:Number} <: AbstractMesh{D´}  # Was Mesh in master, D´ = D+1 only
    verts::Vector{SVector{D´,T}}
    adjmat::SparseMatrixCSC{Bool,Int}
    simpinds::Vector{NTuple{D´,Int}}    # Needed for Makie plotting. Could be computed each time
end

struct BandSimplices{D´,T,S<:SubArray}
    sverts::Vector{NTuple{D´,SVector{D´,T}}}
    svecs::Vector{NTuple{D´,S}}
    sptrs::Array{UnitRange{Int},D´}  # range of indices of verts and vecs for each base simplex
end

struct Band{D´,T,M<:BandMesh{D´},S<:BandSimplices{D´,T}}
    mesh::M
    simplices::S
end

struct Bandstructure{D,T,M<:BaseMesh{D},D´,B<:Band{D´,T}}   # D is dimension of base space, D´ = D+1
    base::M           # We don't allow a generica AbstractMesh here, only BaseMesh to enable fast access
    bands::Vector{B}  # This remains the same
end

With this organization:

simplex reductions (e.g. DOS, LDOS, Green's functions) will use band.simplices for each band
indexing will use bs.base and band.simplices
plotting will use the band.mesh.

Note that all operations remain band-resolved, with bands summed over as required

pablosanjose commented 3 years ago

The above can be refined by fusing the simplices of all bands into a single struct. This potentially allows for much faster searches and DOS computations, since a single IntervalTree needs to be constructed and searched. It also saves space, since we only keep one ptrs::Array{..., D}. The price is that you cannot select bands when computing DOS. However you can always do selective DOS like LDOS, using the states (which is what makes physical sense, since the partition into bands is somewhat arbitrary in general).

The following is also a bit more elegant because it has less overlap between simpinds in BandMesh and Simplices. The former is not sorted, so it is better for plotting. The latter needs to be sorted in columns following the base mesh, so it is better for accessing and computing stuff.

struct BaseMesh{D,V} <: AbstractMesh{D}  # Was CuboidMesh, implements AbstractMesh API (edges, edgedest, vertices, simplices...)
    axesticks::NTuple{D,V}  # Here V could be any container, although tuples are fastest for indexing
end

struct BandMesh{D´,T<:Number} <: AbstractMesh{D´}  # Was Mesh in master, D´ = D+1 only
    verts::Vector{SVector{D´,T}}
    adjmat::SparseMatrixCSC{Bool,Int}
    simpinds::Vector{NTuple{D´,Int}}    # Needed for Makie plotting. Could be computed each time
end

struct Simplices{D´,T,S<:SubArray}
    sverts::Vector{NTuple{D´,SVector{D´,T}}}
    svecs::Vector{NTuple{D´,S}}
    sptrs::Array{UnitRange{Int},D´}  # range of indices of verts and vecs for each base simplex
end

struct Bandstructure{D,T,M<:BaseMesh{D},D´,B<:BandMesh{D´,T},S<:Simplices{D´,T}}   # D is dimension of base space, D´ = D+1
    base::M           # We don't allow a generica AbstractMesh here, only BaseMesh to enable fast access
    bands::Vector{B}
    simplices::S
end

With this alternative organization:

simplex reductions (e.g. DOS, LDOS, Green's functions) would use simplices
indexing will use base and simplices, and would thus be only energy-resolved
plotting will use the bands, and would thus be band-resolved
Fermi surface computation will use bands, and would also be band-resolved