Mathematical operations to extract all the performance juice from your hardware!
git clone https://github.com/lab-cosmo/mops
cd mops
# Builds the code and run all tests
tox
# Installs the Python package
pip install .
Some common motifs of vector, matrix and tensor operations that appear in science and engineering are planned to be implemented here for CPUs and GPUs. These include:
$$ Oi = \sum{j=1}^J Cj \prod{k=1}^K A{iP{jk}} $$
$A$ is a 2D array of floats, of size $I \times N_{A,2}$. It contains the individual factors in the monomials that make up the polynomial.
$C$ is a vector of float multipliers of size $J$. They represent the coefficients of each monomial in the polynomial, so that $J$ is the number of monomials in the polynomial.
$P$ is a 2D array of integers which represents the positions of the individual factors for each monomial in the second dimension of the $A$ array. In particular, the $k$-th factor of monomial $j$ will be found in the $P_{jk}$-th position of the second dimension of $A$.
$O$ is a dense 1D array of floats, which only contains a batch dimension of size $I$.
The calculation consists in a batched evaluation of homogeneous polynomials of degree $K$, where the monomials are given by $C[j] A[:, P_1[j, 1]] A[:, P_2[j, 2]] * \dots$, as follows:
for j in range(J):
O[:] += C[j] * A[:, P_1[j, 1]] * A[:, P_2[j, 2]] * ...
$$ O_{iPk^O} = \sum{k \in {k'|P^O_{k'}=P^O_k}} Ck A{iPk^A} B{iP_k^B} $$
$A$ and $B$ are 2D arrays of floats whose first dimension is a batch dimension that has the same size for both.
$C$ is a 1D array of floats which contains the weights of the products of elements of $A$ and $B$ to be accumulated.
$P^A$, $P^B$ are 1D arrays fo integers of the same size which contain the positions along the second dimensions of $A$ and $B$, respectively, of the factors that constitute the products.
$P^O$ is a 1D array of integers of the same length as $P^A$ and $P^B$ which contains the positions in the second dimension of the output tensor where the different products of $A$ and $B$ are accumulated.
$O$ is a 2D array of floats where the first dimension is a batch dimension (the same as in $A$ and $B$) and the second dimension contains the scattered products of $A$ and $B$.
The weighted products of $A$ and $B$ are accumulated into $O$ as follows:
for k in range(K):
O[:, P_O[k]] += C[k] * A[:, P_A[k]] * B[:, P_B[k]]
$$ O{ikl} = \sum{j=1}^J A{jk} B{jl} \delta_{iPj} \hspace{1cm} \mathrm{or} \hspace{1cm} O{ikl} = \sum{j \in {j'|P{j'}=i}} A{jk} B{jl} $$
$A$ is a dense matrix of floats, expected to be large in one dimension (size $J$), and smaller in the the other (size $K$).
$B$ is a dense matrix of floats, expected to be large in one dimension (size $J$), and smaller in the the other (size $L$).
$P$ is a large vector of integers (of size $J$) which maps the dimension $j$ of $A$ and $B$ into the dimension $i$ of $O$. In other words, it contains the position within $O$ where each $AB$ product needs to be summed.
$n_O$ is the size of the output array along its first dimension. It must be grater or equal than the larger element in $P$ plus one.
$O$ is a 3D array of floats of dimensions $I \times K \times L$, which contains the accumulated products of the elements of $A$ and $B$.
For each $j$, an outer product of $A[j, :]$ and $B[j, :]$ is calculated, and it is summed to $O[P[j], :, :]$:
for j in range(J):
O[P[j], :, :] += A[j, :, None] * B[j, None, :]
$$ O{ikl} = \sum{j \in {j'|P{j'}=i}} A{jk} B{jl} W{{PW_j}l} $$
for j in range(J):
O[PO[j], :, :] += A[j, :, None] * B[j, None, :] * W[PW[j], None, :]
$$ O_{i{m3}k} = \sum{e \in {e'|I{e'}=i}} R{ek} \sum{n \in {n'|M^3{n'}=m_3}} Cn A{e{Mn^1}} X{{J_e}{M_n^2}k} $$
for j in range(J):
for n in range(N):
O[PO1[e], PO2[n], :] += A[e, PA[n]] * B[e, :] * C[n] * W[PW1[e], PW2[n], :]