Open Psirus opened 6 years ago
Good! If I understand it correctly, based on the current implementation of the assembler: the loops over cells and dof types are then done via generator.next(). And the entry the generator returns is the result of cell.integrate(f)[someDof] and cell.DofNumbering(someDof), yes?
I totally agree, the assembler should only assemble a matrix/vector without performing the loop over the cells. We should however discuss, where exactly this class is stored. IMO, this belongs to the problem (see for example TimeDependentProblem).
I think there can be nothing bad be found about this idea. Splitting tasks and dependencies is great! Some thoughts:
In your code example - which I honestly find quite complicated (will .isValid()
and .next()
be easier?) - the fact of the lazy evaluation is IMO not present (or I really misunderstand the code, sorry then). It seems like the entry CellEntryGenerator
stores all its entries in a vector. This is not feasible for e.g. all dense element matrices. For a better understanding, could you maybe adapt your example to take a bunch of objects (-> our cell group) and a function (-> our MatrixFunction) and evaluates them, when they are needed?
In OldNuTo, this was the only loop that is OpenMP parallel. Do we want to think about that as well? It may require the iterator to be a random access iterator. No idea how that will work. But maybe: YAGNI when we focus on MPI parallelization.
Would it be nice to have this whole loop internal, like std::transform
? We may end up with
GlobalSparseMatrix m;
auto ToGlobalMatrix = [&](Entry& e)
{
NuTo::Assemble(m, e);
// or
NuTo::Assemble(m, MakeLumpled(e));
};
NuTo::LazyEvaluateLoop(cells.begin(), cells.end(), HessianFunction, ToGlobalMatrix);
(We could then add something like an execution policy for OpenMP)
Where would we put this code? In each main file? Only internally - e.g. wrapped in an Assembler
class? This may also influence the design.
// in main / EquationSystem / TimeDependentProblem / WhateverWrapperIsHipRightNow
CellEntryGenerator gen(cells, gradientFunction);
Assembler assembler(gen);
// in Assembler GlobalVector globalVec; for (Entry entry : mGenerator) { globalVec[entry.second] = entry.first; }
Refactoring it out can happen in a second step if necessary.
Another IMO huge advantage could be a decoupling from CellInterface
. On first glance, we could have a TimeDependentCell
with methods like Gradient(t, dt)
and Hessian(t, dt)
. This cell does not necessarily implement CellInterface
. With corresponding entry generators like GraidentGenerator(cells, t, dt)
and HessianGenerator(cells, t, dt)
we could still assemble those.
I added some template stuff to the prototype of @Psirus. The main advantage: The methods begin()
and end()
that are the same for all classes, are now defined in the common interface. Someone, who wants to create his own generator, only needs to implement Next(), IsValid(), Get()
.
Code can be found here.
I implemented a version of that in this branch. There is, e.g. a DofMatrixGenerator
that is the base class of CellMatrixEntries
(Naming...?). Things I noticed:
DofMatrixGenerator
is an abstract class, we cannot pass by value. DofMatrixGenerator::Get()
or DofMatrixGenerator::begin()
are not const. Get()
, for example, modifies the state. (We could make everything mutable...)IsValid()
is false, it is useless. Especially, in combination of having to pass it via reference, this is ugly. Introducing a Reset()
may solve the problem. But when is that called? At the beginning of the assembly? Manually? Or have a Clone()
method instead that returns a new valid instance?CellMatrixEntries, CellVectorEntries, CellLumpedMatrixEntries
that all work on the CellInterface
and share a lot of code. But I cannot think of an easy refactoring.MatrixEntry
and VectorEntry
. So I feel like moving it from base
to mechanics
and specialize it for those types in a source file. Why mechanics
? Because both entry types need mechanics/dofs/DofVector
.
[easy fix...]The recent approach was to have some iterator class that performs the loop. So the assembler assembles only matrices and vectors but still performs the loop over all cells. I think, splitting that into features
may simplify this a lot.
Point 1) may result in a very short (templates may be viable) class/function like
//! TFunction would return an equivalent of std::pair(Cell.Integrate(...), Cell.DofNumbering(...)
template <typename TBegin, typename TEnd, typename TFunction>
GlobalMatrix Assemble(TBegin begin, TEnd end, TFunction f, /*dof info, dof types...*/)
{
MatrixAssembler m;
for (;begin != end; ++begin)
m.Add(f(*begin));
return m.Get();
}
Or, for simplicity, be specialized for certain iterator types and TFunction
.
For point 2) we can even think of the old
NuTo way of moving the assembly to the global types like
GlobalMatrix::Add(DofMatrix<double>& elemOutput, DofMatrix<int>& dofNumbers);
GlobalVector::Add(DofVector<double>& elemOutput, DofMatrix<int>& dofNumbers);
GlobalVector::AddLumped(DofMatrix<double>& elemOutput, DofMatrix<int>& dofNumbers);
Or we put that into classes like MatrixAssembler/VectorAssembler/LumpedAssembler
(as in the Assemble(..)
code above) . This may make reusing of exising sparsity patterns easier.
We can split that into two, but I'm not sure how much we gain by that. Specifically, to the points in your post above:
DofGenerator
can not be a temporarystill hold. So while I think splitting is fine, that wasn't "the problem", right?
As for 2): Reusing a sparsity pattern in the assembler seems a good idea. Making the vectors/matrices assemble themselves - I personally don't like that; should not be their concern imho.
I was thinking about the assembler that really assembles vectors/matrices. And I would like to implement that as soon as the RemoveJ/K team #164 finished. I am especially interested in reusing the sparsity patterns.
MatrixAssembler::MatrixAssembler(DofInfo = {});
void MatrixAssembler::Resize(DofInfo);
void MatrixAssembler::Add(DofMatrix<double> contribution, DofMatrix<int> numbering);
const DofMatrixSparse& MatrixAssembler::Get() const;
Internally, I would probably use vectors of Eigen::Triplet
to efficiently build the matrix. In the Get()
method, they would be transferred via setFromTriplets
to the real sparse matrices (const
attribute questionable). This would cover the current features and is decoupled from CellInterface
. We can also think of an LumpedMatrixAssembler
or have
void MatrixAssembler::AddLumped(DofVector<double>, DofVector<int>);
As a new feature, we can try to reuse the sparsity. So once the nonzeros are known - like after the first assembly run - we can assemble directly with coeffRef()
, instead of the triplet lists. This is supposed to be slightly faster, more memory efficient and avoids the memory allocations. But I am not sure about the interface.
1) My idea is to have a bool mFinished
flag together with a void Finalize/Finish/Complete/MakeCompressed()
method. Naming help needed. So we expect the following flow:
MatrixAssembler assembler(); // mFinished = false;
assembler.Resize(dofInfo);
// first evaluation
for (auto cell : cells)
assembler.Add(cell.Integrate(f), cell.DofNumbering()); // builds into triplets since mFinished == false;
// DoStuff(assembler.Get()); throws, since mFinished == false
// Or Get() internally calls Finish() automatically...
assembler.Finish(); // mFinished = true, converts triplets to matrix, deallocates triplets.
DoStuff(assembler.Get()); // fine now
// second/third/... evaluation
assembler.SetZero();
for (auto cell : cells)
assembler.Add(cell.Integrate(f), cell.DofNumbering()); // builds directly into matrix since mFinished == true;
// no need to call assembler.Finish();
DoStuff(assembler.Get());
2) Another idea is to allocate nonzeros before using
for (auto cell : cells)
assembler.AllocateNonzeros(cell.DofNumbering()); // into triplets;
3) Or we introduce a method that copies the nonzeros, e.g. an overload
MatrixAssembler::Resize(DofMatrixSparse other);
4) Or we even introduce two separate assemblers
DirectAssember assember(tripletListAssember.Get());
5) Or, since the coeffRef
approach does not require members, we can provide free functions for that.
auto hessian = tripletAssembler.Get();
//...
hessian.SetZero();
for (auto cell : cells)
Assemble(hessian, cell.Integrate(f), cell.DofNumbering());
But I would prefer the first (automatic using mFinished
) or the last (no mFinished
, assembler class == triplet lists, free function == coeffRef) solution.
The challenge is that the loop over the cells is now done outside of the assembler. Once the nonzeros are known, there is no reallocation involved. A simple mesh coloring (= MaximumIndependentSets
) eliminates race conditions on parallel writes into the matrix.
The TripletList assembly is not that trivial. We could allocate into numThreads
separate assemblers and add up the matrices. Or we concatenate the separate triplet lists. Or we have a separate ParallelAssembler
that deals with that.
Or we postpone that.
What's the problem?
Currently, the assemble has functions like
and in there does things like calling
cell.Integrate(f)
. These things (cells, functions-to-be-integrated) are not really part of assembly. Ideally it would just get a vector (matrix) and the numbering and insert them it the appropriate place into the global matrix.Problems:
cell.Integrate()
Group
orCellInterface
BuildDiagonallyLumpedMatrix
)Proposal
Instead of writing the cells and functions directly at the assembler, the assembler would get a generator that will return vector (matrix) + numbering when asked. The assembler does not, and should not, care how these are generated. Working with functions and cells could be a derived class / implementation of a generator base / concept. If additional transformations are to be applied to a local output (mass lumping, condensation), these can be performed in the generator.
Code
The return type of such a generator will be assumed to be
using Entry = std::pair<Eigen::VectorXd, Eigen::VectorXi>;
. A simple implementation could bewhich would allow the assembler to do something like this
This is nice because it is dead simple. Another approach, for fans of syntactic sugar and range-based for loops:
could result in nice syntax like this
However, this results in a bit of code for the iterators. A complete example can be seen here.
Questions
Entry
look like?<insert-your-concern-here>