Memory usage - Githubissues

tbjoss commented 3 years ago

Hello Everyone,

I observe a peak in memory demand between the "Cells removed: xxx" and "importing new octants" messages. On the current hardware (16GB available) this limits the octree to fewer than 10 million cells. Is this demand normal? I used a release version of bitpit to create the octree for this test.

andrea-iob commented 3 years ago

Would it be possible for you to generate a memory usage report with heaptrack (https://github.com/KDE/heaptrack)? Heaptrack is a KDE application and is usually available among the packages of many distributions. Just run "heaptrack application_name"; I'm interested in the gz archive that heaptrack generates. To get meaningful results, bitpit should be compiled with the "RelWithDebInfo" build type.

tbjoss commented 3 years ago

Here is the heaptrack file for this example: https://www.dropbox.com/s/8ne5wpm5cdt60pg/heaptrack.grid_generator_d3q27.29121.gz?dl=0

Additional information:

total runtime: 660.11s.
bytes allocated in total (ignoring deallocations): 24.23GB (36.71MB/s)
calls to allocation functions: 375032176 (568135/s)
temporary memory allocations: 11299481 (17117/s)
peak heap memory consumption: 10.51GB
peak RSS (including heaptrack overhead): 165.46GB
total memory leaked: 95.13KB

This test requires 10GB of memory for 6'773'056 cells. The procedure to reach the final octree is:

update levelsets (in this example case only one levelset is used and the narrowband size is 0)
select cells
refine selected cells
repeat until all target levels are reached
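
To make the loop above concrete, here is a minimal, self-contained sketch of the "evaluate levelset, select, refine, repeat" cycle. It does not use bitpit: the Cell struct, the circle levelset, the selection criterion and the function names are all assumptions made up for illustration, the mesh is a flat 2D list of squares rather than an octree, and no 2:1 balancing is applied.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Toy 2D stand-in for an octree cell (hypothetical, not a bitpit type).
struct Cell {
    double x, y;   // lower-left corner
    double size;   // edge length
    int level;     // refinement level (0 = root)
};

// Signed distance from a point to a circle: the "levelset" of this toy example.
double levelset(double px, double py) {
    const double cx = 0.5, cy = 0.5, radius = 0.3;
    return std::hypot(px - cx, py - cy) - radius;
}

// A cell is selected when the surface may cross it, i.e. when the distance
// at its centre is not larger than half of the cell diagonal.
bool isSelected(const Cell &cell) {
    const double half = 0.5 * cell.size;
    const double halfDiagonal = half * std::sqrt(2.0);
    return std::abs(levelset(cell.x + half, cell.y + half)) <= halfDiagonal;
}

// Repeat "evaluate levelset, select, refine" until no selected cell is
// coarser than targetLevel.
std::vector<Cell> refineToLevel(int targetLevel) {
    std::vector<Cell> cells{{0.0, 0.0, 1.0, 0}};
    bool refined = true;
    while (refined) {
        refined = false;
        std::vector<Cell> next;
        next.reserve(cells.size());
        for (const Cell &cell : cells) {
            if (cell.level < targetLevel && isSelected(cell)) {
                // Replace the selected cell with its four children.
                const double h = 0.5 * cell.size;
                for (int j = 0; j < 2; ++j)
                    for (int i = 0; i < 2; ++i)
                        next.push_back({cell.x + i * h, cell.y + j * h, h, cell.level + 1});
                refined = true;
            } else {
                next.push_back(cell);
            }
        }
        cells.swap(next);
    }
    return cells;
}
```

Calling refineToLevel(4) returns the final cell list. Note that the sketch briefly holds both the old and the new cell list during each pass, one simple way in which a refinement step can push the peak memory above the steady-state footprint.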

andrea-iob commented 3 years ago

Thanks for the heaptrack profile.

I briefly looked at the profile and saw that you are building both adjacencies and interfaces. Just to double check: does your code need the interfaces? Disabling the interfaces should save some memory.

tbjoss commented 3 years ago

I currently use the interfaces to mark hanging vertices and twin vertices (vertices that belong to two grid levels). If there's a way to do this more easily without interfaces, we could get rid of them.

andrea-iob commented 3 years ago

[Image: nodes]

If you need to identify the highlighted nodes, maybe something along the lines of the following pseudo code may work (if needed, next week I can send you some real code):

for cell in cells
    for face in cell_faces
        // Adjacencies are counted per face: keep only faces with exactly one neighbour
        nFaceAdjacencies = cell.getAdjacencyCount(face)
        if nFaceAdjacencies != 1
            continue
        end if

        // Keep only faces whose neighbour is on a different level
        cell_level  = octree.getLevel(cell) // Function getLevel is only available for VolOctree patches
        neigh_level = octree.getLevel(cell.getAdjacency(face))
        if cell_level == neigh_level
            continue
        end if

        faceVertexIds = cell.getFaceVertexIds(face) // A ConstProxyVector<long> in C++
        for faceVertexId in faceVertexIds
            ... Do something with vertex id ...
        end for
    end for
end for

andrea-iob commented 3 years ago

Pull request #197 removes some unneeded allocations in the VolOctree class. It will not reduce the memory usage, but it may speed up the update of the adjacencies a bit.

tbjoss commented 3 years ago

Thank you for your hint. Without the interfaces we can observe some performance improvements:

total runtime: 498.46s.
bytes allocated in total (ignoring deallocations): 18.06GB (36.23MB/s)
calls to allocation functions: 270646868 (542962/s)
temporary memory allocations: 23376738 (46897/s)
peak heap memory consumption: 6.70GB
peak RSS (including heaptrack overhead): 108.31GB
total memory leaked: 12.79KB

andrea-iob commented 3 years ago

I'm looking at the memory profile. There are two things worth checking that might improve performance a bit:

there are lots of temporary allocations done by the levelset function getObjectIds. This function generates the list of ids of the objects added to the levelset. The list is generated on the fly, so a new vector is created at each function call. If possible, it would be better to call the function once (outside the refine loop), store the list of object ids in a vector and then use that vector to get the object ids. If this is not possible in your case, maybe we can introduce a more efficient way to get the object ids;
there are some string/stringstream allocations in the functions flush_boundary_data/flush_grid_data. I don't think these allocations come from bitpit, but it may be worth checking if they can be removed (the peak memory consumption will not decrease, but the code may be faster).
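
As an illustration of the first point, the sketch below contrasts querying the id list for every cell with building it once and reusing it. ToyLevelSet is a hypothetical stand-in written for this example, not the bitpit levelset class; only the behaviour described above (a new vector is built at each getObjectIds call) is mimicked, and the build counter exists purely to make the difference visible.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a levelset container; it is NOT the bitpit class.
// Like the function described above, getObjectIds() builds a new vector on
// every call.
class ToyLevelSet {
public:
    explicit ToyLevelSet(int nObjects) : m_nObjects(nObjects) {}

    std::vector<int> getObjectIds() const {
        ++m_nListBuilds;
        std::vector<int> ids(m_nObjects);
        for (int i = 0; i < m_nObjects; ++i) ids[i] = i;
        return ids;
    }

    // Number of times the id list has been generated so far.
    std::size_t getListBuildCount() const { return m_nListBuilds; }

private:
    int m_nObjects;
    mutable std::size_t m_nListBuilds = 0;
};

// Calls getObjectIds() for every cell: one temporary vector per cell.
long sumIdsPerCellQuery(const ToyLevelSet &levelset, int nCells) {
    long sum = 0;
    for (int cell = 0; cell < nCells; ++cell)
        for (int id : levelset.getObjectIds()) sum += id;
    return sum;
}

// Builds the list once, outside the loop, and reuses it.
long sumIdsCached(const ToyLevelSet &levelset, int nCells) {
    const std::vector<int> objectIds = levelset.getObjectIds();
    long sum = 0;
    for (int cell = 0; cell < nCells; ++cell)
        for (int id : objectIds) sum += id;
    return sum;
}
```

Both functions return the same result, but the cached variant generates the list once instead of once per cell. Caching is only valid as long as no objects are added to or removed from the levelset while the loop runs.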

In pull request #197 I added some more changes to the generation of VolOctree adjacencies. I think the new code should be a bit faster and should also slightly reduce memory usage (beware that we are still testing the code, so it may still have some bugs).

There is also a pull request (#193) that tries to improve how the ProxyVector handles internal storage, and another one (#187) that should improve the creation of the interfaces. These pull requests should help remove some of the allocations I see in the memory profile during the update of the interfaces.

andrea-iob commented 3 years ago

What's the number of cells you would like to handle on your current hardware?

tbjoss commented 3 years ago

Thank you very much for this information. The flush_data functions are only performed once at the end of the run. While their performance can definitely be improved, they only make up a small amount of the overall run time. I will, however, have a look at the levelset allocations (they are our main time sinks currently). The final number of cells we will have to handle will be in the order of 50-100 million, but that will obviously be done on a machine with significantly more memory. The reduction we got from the removal of the interfaces already helped a lot.
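
For a rough feel of what the target size means in memory, the peak heap figures reported above can be converted to bytes per cell and extrapolated. This is only a back-of-the-envelope sketch: it assumes the run without interfaces used the same 6'773'056 cells (the thread does not state this), that heaptrack's GB means 1e9 bytes, and that memory scales linearly with the number of cells. The function names are made up for this example.

```cpp
#include <cstdint>

// Average heap bytes per cell from a heaptrack peak value.
// Assumes decimal gigabytes (1 GB = 1e9 bytes).
double bytesPerCell(double peakGB, std::int64_t nCells) {
    return peakGB * 1e9 / static_cast<double>(nCells);
}

// Linear extrapolation of the peak heap usage, in GB, to another cell count.
double estimatePeakGB(double bytesPerCellValue, std::int64_t nCells) {
    return bytesPerCellValue * static_cast<double>(nCells) / 1e9;
}
```

Under these assumptions, bytesPerCell(10.51, 6773056) gives roughly 1.55 KB per cell with interfaces and bytesPerCell(6.70, 6773056) roughly 0.99 KB per cell without them, so 50-100 million cells would need on the order of 50-100 GB of heap without interfaces.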

andrea-iob commented 3 years ago

We just discovered that the update of VolOctree mesh relies on cell interfaces to identify the vertices that should be deleted. This was unintended and should be fixed in #201. The branch is still under testing.

optimad / bitpit

Memory usage #191