optimad / bitpit

Open source library for scientific HPC
http://optimad.github.io/bitpit/
GNU Lesser General Public License v3.0

Memory usage #191

Open tbjoss opened 3 years ago

tbjoss commented 3 years ago

Hello Everyone,

I observe a peak in memory demand between the "Cells removed: xxx" and "importing new octants" messages. On my current hardware (16 GB available) this limits the octree to fewer than 10 million cells. Is this demand normal? For this test I used a release version of bitpit to create the octree.

andrea-iob commented 3 years ago

Would it be possible for you to generate a memory usage report with heaptrack (https://github.com/KDE/heaptrack)? Heaptrack is a KDE application and is available as a package in many distributions. Just run "heaptrack application_name"; I'm interested in the gz archive that heaptrack generates. To get meaningful results, bitpit should be compiled with the "RelWithDebInfo" build type.

tbjoss commented 3 years ago

Here is the heaptrack file for this example: https://www.dropbox.com/s/8ne5wpm5cdt60pg/heaptrack.grid_generator_d3q27.29121.gz?dl=0

Additional information:

This test requires 10 GB of memory for 6'773'056 cells. The procedure to reach the final octree is:

andrea-iob commented 3 years ago

Thanks for the heaptrack profile.

I briefly looked at the profile and saw that you are building both adjacencies and interfaces. Just to double check: does your code need the interfaces? Disabling the interfaces should save some memory.

tbjoss commented 3 years ago

I currently use the interfaces to mark hanging vertices and twin vertices (vertices that belong to two grid levels). If there is an easier way to do this without interfaces, we could get rid of them.

andrea-iob commented 3 years ago

[Image: nodes]

If you need to identify the highlighted nodes, something along the lines of the following pseudocode may work (if needed, I can send you some real code next week):

for cell in cells
    for face in cell_faces
        // Skip border faces and faces with more than one neighbour
        nFaceAdjacencies = cell.getAdjacencyCount(face)
        if nFaceAdjacencies != 1
            continue
        end if

        // Skip faces whose only neighbour is on the same level
        cell_level = octree.getLevel(cell) // Function getLevel is only available for VolOctree patches
        neigh_level = octree.getLevel(cell.getAdjacency(face))
        if cell_level == neigh_level
            continue
        end if

        // The face separates two grid levels: visit its vertices
        ConstProxyVector<long> faceVertexIds = cell.getFaceVertexIds(face);
        int nFaceVertices = faceVertexIds.size();
        for (int k = 0; k < nFaceVertices; ++k) {
            long faceVertexId = faceVertexIds[k];
            ... Do something with vertex id ...
        }
    end for
end for
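
For readers who want to try the idea outside bitpit, the pseudocode above can be sketched as a small self-contained C++ function. This is only an illustration under stated assumptions: `MockCell` and `collectLevelJumpVertices` are hypothetical names, cell ids are assumed to be plain indices into a vector, and none of this is bitpit API. The function returns the vertices of faces whose single neighbour sits on a different level; those are the candidates for hanging or twin vertices.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a mesh cell; this is NOT a bitpit type.
struct MockCell {
    int level;                                       // refinement level of the cell
    std::vector<std::vector<long>> faceAdjacencies;  // neighbour cell ids, one list per face
    std::vector<std::vector<long>> faceVertexIds;    // vertex ids, one list per face
};

// Collect the vertices of every face that has exactly one neighbour and whose
// neighbour lives on a different refinement level. Only adjacencies are used.
std::set<long> collectLevelJumpVertices(const std::vector<MockCell> &cells)
{
    std::set<long> vertices;
    for (const MockCell &cell : cells) {
        for (std::size_t face = 0; face < cell.faceAdjacencies.size(); ++face) {
            const std::vector<long> &neighs = cell.faceAdjacencies[face];
            if (neighs.size() != 1) {
                continue;  // border face, or a face with more than one neighbour
            }

            // Assumption of this sketch: cell ids index directly into `cells`.
            assert(neighs[0] >= 0 && static_cast<std::size_t>(neighs[0]) < cells.size());
            if (cells[neighs[0]].level == cell.level) {
                continue;  // same level on both sides, nothing to mark
            }

            const std::vector<long> &faceVertices = cell.faceVertexIds[face];
            vertices.insert(faceVertices.begin(), faceVertices.end());
        }
    }
    return vertices;
}
```

Using a `std::set` removes duplicates when the same vertex is reached from two neighbouring faces. Telling hanging vertices apart from twin vertices would need an extra check against the coarse cell's own vertices, which this sketch leaves out.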

andrea-iob commented 3 years ago

Pull request #197 removes some unneeded allocations in the VolOctree class. It will not reduce memory usage, but it may speed up the update of the adjacencies a bit.

tbjoss commented 3 years ago

Thank you for your hint. Without the interfaces we can observe some performance improvements:

andrea-iob commented 3 years ago

I'm looking at the memory profile. There are two things worth checking that might improve performance a bit:

In pull request #197 I added some more changes to the generation of VolOctree adjacencies. I think the new code should be a bit faster and should also slightly reduce memory usage (beware: we are still testing the code, so it may have some bugs).

There is also a pull request (#193) that tries to improve how ProxyVector handles its internal storage, and another one (#187) that should improve the creation of the interfaces. These pull requests should help remove some of the allocations I see in the memory profile during the update of the interfaces.

andrea-iob commented 3 years ago

What's the number of cells you would like to handle on your current hardware?

tbjoss commented 3 years ago

Thank you very much for this information. The flush_data functions are only executed once, at the end of the run. While their performance can definitely be improved, they account for only a small fraction of the overall run time. I will, however, have a look at the levelset allocations (they are currently our main time sinks). The final number of cells we will have to handle will be on the order of 50-100 million, but that will obviously be done on a machine with significantly more memory. The reduction we got from removing the interfaces has already helped a lot.
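
As a rough sanity check on those numbers, the earlier measurement (10 GB for 6'773'056 cells) can be extrapolated to the target sizes. The sketch below assumes that memory grows linearly with the cell count and that 1 GB is 10^9 bytes; both are assumptions, and the function names are hypothetical. The 10 GB figure was taken with interfaces enabled, so the real requirement without interfaces is likely lower.

```cpp
#include <cassert>
#include <cstdint>

// Average memory per cell, derived from a single measured run.
double bytesPerCell(double measuredBytes, std::int64_t cellCount)
{
    assert(cellCount > 0);
    return measuredBytes / static_cast<double>(cellCount);
}

// Linear extrapolation to another cell count. Linearity is an assumption of
// this sketch, not a measured property of the octree.
double estimateBytes(double measuredBytes, std::int64_t measuredCells, std::int64_t targetCells)
{
    return bytesPerCell(measuredBytes, measuredCells) * static_cast<double>(targetCells);
}
```

Calling `bytesPerCell(10e9, 6773056)` gives about 1.5 kB per cell, and `estimateBytes` for 50 and 100 million cells gives roughly 74 GB and 148 GB respectively under these assumptions.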

andrea-iob commented 3 years ago

We just discovered that the update of a VolOctree mesh relies on cell interfaces to identify the vertices that should be deleted. This was unintended and should be fixed in #201. The branch is still being tested.