nutofem / nuto

NuTo - yet another finite element library
https://nuto.readthedocs.io
Boost Software License 1.0
17 stars 5 forks source link

Decoupling of assembler and cells #194

Open Psirus opened 6 years ago

Psirus commented 6 years ago

What's the problem?

Currently, the assemble has functions like

GlobalDofVector BuildVector(const Group<CellInterface>& cells, CellInterface::VectorFunction f) const;

and in there does things like calling cell.Integrate(f). These things (cells, functions-to-be-integrated) are not really part of assembly. Ideally it would just get a vector (matrix) and the numbering and insert them it the appropriate place into the global matrix.

Problems:

Proposal

Instead of writing the cells and functions directly at the assembler, the assembler would get a generator that will return vector (matrix) + numbering when asked. The assembler does not, and should not, care how these are generated. Working with functions and cells could be a derived class / implementation of a generator base / concept. If additional transformations are to be applied to a local output (mass lumping, condensation), these can be performed in the generator.

Code

The return type of such a generator will be assumed to be using Entry = std::pair<Eigen::VectorXd, Eigen::VectorXi>;. A simple implementation could be

class EntryGenerator
{
public:
    virtual Entry next() = 0;
    virtual bool isValid() = 0;
};

which would allow the assembler to do something like this

while(generator.isValid())
{
    Entry entry = generator.next();
    globalVec[entry.second] = entry.first;
}

This is nice because it is dead simple. Another approach, for fans of syntactic sugar and range-based for loops:

class EntryGenerator
{
public:
    virtual EntryIterator begin() = 0;
    virtual EntryIterator end() = 0;
    // ...
};

could result in nice syntax like this

for (Entry entry : generator)
{
    globalVec[entry.second] = entry.first;
}

However, this results in a bit of code for the iterators. A complete example can be seen here.

Questions

pmueller2 commented 6 years ago

Good! If I understand it correctly, based on the current implementation of the assembler: the loops over cells and dof types are then done via generator.next(). And the entry the generator returns is the result of cell.integrate(f)[someDof] and cell.DofNumbering(someDof), yes?

joergfunger commented 6 years ago

I totally agree, the assembler should only assemble a matrix/vector without performing the loop over the cells. We should however discuss, where exactly this class is stored. IMO, this belongs to the problem (see for example TimeDependentProblem).

TTitscher commented 6 years ago

I think there can be nothing bad be found about this idea. Splitting tasks and dependencies is great! Some thoughts:

Psirus commented 6 years ago
  1. The generator stores these just for the example. I didn't want to write groups, cells and functions. Seems this was confusing, sorry for that. Of course I'm not proposing to store all the local matrices.
  2. Personally: YAGNI. If anyone is very interested in parallelization and sees a problem with this idea: I'm not set on these generators, I just want the decoupling.
  3. Looks good as well. A more complete example would be interesting.
  4. At first, I would just put it in the assembler:
    
    // in main / EquationSystem / TimeDependentProblem / WhateverWrapperIsHipRightNow
    CellEntryGenerator gen(cells, gradientFunction);
    Assembler assembler(gen);

// in Assembler GlobalVector globalVec; for (Entry entry : mGenerator) { globalVec[entry.second] = entry.first; }


Refactoring it out can happen in a second step if necessary.
TTitscher commented 6 years ago

Another IMO huge advantage could be a decoupling from CellInterface. On first glance, we could have a TimeDependentCell with methods like Gradient(t, dt) and Hessian(t, dt). This cell does not necessarily implement CellInterface. With corresponding entry generators like GraidentGenerator(cells, t, dt) and HessianGenerator(cells, t, dt) we could still assemble those.

TTitscher commented 6 years ago

I added some template stuff to the prototype of @Psirus. The main advantage: The methods begin() and end() that are the same for all classes, are now defined in the common interface. Someone, who wants to create his own generator, only needs to implement Next(), IsValid(), Get(). Code can be found here.

TTitscher commented 6 years ago

I implemented a version of that in this branch. There is, e.g. a DofMatrixGenerator that is the base class of CellMatrixEntries (Naming...?). Things I noticed:

TTitscher commented 6 years ago

The recent approach was to have some iterator class that performs the loop. So the assembler assembles only matrices and vectors but still performs the loop over all cells. I think, splitting that into features

  1. loop over cells (or something else that generates element contributions)
  2. adding an element contribution to a global matrix/vector

may simplify this a lot.

Point 1) may result in a very short (templates may be viable) class/function like

//! TFunction would return an equivalent of std::pair(Cell.Integrate(...), Cell.DofNumbering(...)
template <typename TBegin, typename TEnd, typename TFunction>
GlobalMatrix Assemble(TBegin begin, TEnd end, TFunction f, /*dof info, dof types...*/)
{
   MatrixAssembler m;
   for (;begin != end; ++begin)
      m.Add(f(*begin));
   return m.Get();  
}

Or, for simplicity, be specialized for certain iterator types and TFunction.

For point 2) we can even think of the old NuTo way of moving the assembly to the global types like

GlobalMatrix::Add(DofMatrix<double>& elemOutput, DofMatrix<int>& dofNumbers);
GlobalVector::Add(DofVector<double>& elemOutput, DofMatrix<int>& dofNumbers);
GlobalVector::AddLumped(DofMatrix<double>& elemOutput, DofMatrix<int>& dofNumbers);

Or we put that into classes like MatrixAssembler/VectorAssembler/LumpedAssembler (as in the Assemble(..) code above) . This may make reusing of exising sparsity patterns easier.

Psirus commented 6 years ago

We can split that into two, but I'm not sure how much we gain by that. Specifically, to the points in your post above:

still hold. So while I think splitting is fine, that wasn't "the problem", right?

As for 2): Reusing a sparsity pattern in the assembler seems a good idea. Making the vectors/matrices assemble themselves - I personally don't like that; should not be their concern imho.

TTitscher commented 6 years ago

I was thinking about the assembler that really assembles vectors/matrices. And I would like to implement that as soon as the RemoveJ/K team #164 finished. I am especially interested in reusing the sparsity patterns.

Basic interface

MatrixAssembler::MatrixAssembler(DofInfo = {});
void MatrixAssembler::Resize(DofInfo);
void MatrixAssembler::Add(DofMatrix<double> contribution, DofMatrix<int> numbering);
const DofMatrixSparse& MatrixAssembler::Get() const;

Internally, I would probably use vectors of Eigen::Triplet to efficiently build the matrix. In the Get() method, they would be transferred via setFromTriplets to the real sparse matrices (const attribute questionable). This would cover the current features and is decoupled from CellInterface. We can also think of an LumpedMatrixAssembler or have

void MatrixAssembler::AddLumped(DofVector<double>, DofVector<int>);

Reuse nonzeros

As a new feature, we can try to reuse the sparsity. So once the nonzeros are known - like after the first assembly run - we can assemble directly with coeffRef(), instead of the triplet lists. This is supposed to be slightly faster, more memory efficient and avoids the memory allocations. But I am not sure about the interface.

1) My idea is to have a bool mFinished flag together with a void Finalize/Finish/Complete/MakeCompressed() method. Naming help needed. So we expect the following flow:

MatrixAssembler assembler(); // mFinished = false;
assembler.Resize(dofInfo);

// first evaluation
for (auto cell : cells)
    assembler.Add(cell.Integrate(f), cell.DofNumbering()); // builds into triplets since mFinished == false;
// DoStuff(assembler.Get()); throws, since mFinished == false
// Or Get() internally calls Finish() automatically...
assembler.Finish(); // mFinished = true, converts triplets to matrix, deallocates triplets.
DoStuff(assembler.Get()); // fine now

// second/third/... evaluation
assembler.SetZero();
for (auto cell : cells)
    assembler.Add(cell.Integrate(f), cell.DofNumbering()); // builds directly into matrix since mFinished == true;
// no need to call assembler.Finish();
DoStuff(assembler.Get());

2) Another idea is to allocate nonzeros before using

for (auto cell : cells)
    assembler.AllocateNonzeros(cell.DofNumbering()); // into triplets;

3) Or we introduce a method that copies the nonzeros, e.g. an overload

MatrixAssembler::Resize(DofMatrixSparse other);

4) Or we even introduce two separate assemblers

DirectAssember assember(tripletListAssember.Get());

5) Or, since the coeffRef approach does not require members, we can provide free functions for that.

auto hessian = tripletAssembler.Get();
//...
hessian.SetZero();
for (auto cell : cells)
    Assemble(hessian, cell.Integrate(f), cell.DofNumbering());

But I would prefer the first (automatic using mFinished) or the last (no mFinished, assembler class == triplet lists, free function == coeffRef) solution.

Parallel assembly

The challenge is that the loop over the cells is now done outside of the assembler. Once the nonzeros are known, there is no reallocation involved. A simple mesh coloring (= MaximumIndependentSets) eliminates race conditions on parallel writes into the matrix. The TripletList assembly is not that trivial. We could allocate into numThreads separate assemblers and add up the matrices. Or we concatenate the separate triplet lists. Or we have a separate ParallelAssembler that deals with that. Or we postpone that.

Discussion