hypre-space / hypre

Parallel solvers for sparse linear systems featuring multigrid methods.
https://www.llnl.gov/casc/hypre/

Memory leak in Euclid #764

Closed prj- closed 1 year ago

prj- commented 2 years ago
==1207== 32 bytes in 1 blocks are still reachable in loss record 3 of 84
==1207==    at 0x4889F94: calloc (vg_replace_malloc.c:1328)
==1207==    by 0xD2AC57F: hypre_HostMalloc (memory.c:206)
==1207==    by 0xD2AC73B: hypre_MAlloc_core (memory.c:372)
==1207==    by 0xD2ACDDF: hypre_CAlloc (memory.c:911)
==1207==    by 0xD2ABCC7: hypre_HandleCreate (general.c:47)
==1207==    by 0xD2ABC8B: hypre_handle (general.c:38)
==1207==    by 0xD0688DF: hypre_CSRMatrixCreate (csr_matrix.c:42)
==1207==    by 0xD0335DB: hypre_ParCSRMatrixCreate (par_csr_matrix.c:90)
==1207==    by 0xCE82303: hypre_IJMatrixCreateParCSR (IJMatrix_parcsr.c:67)
==1207==    by 0xCE82747: hypre_IJMatrixInitializeParCSR_v2 (IJMatrix_parcsr.c:266)
==1207==    by 0xCE7DFDB: HYPRE_IJMatrixInitialize_v2 (HYPRE_IJMatrix.c:196)
==1207==    by 0x57E2E93: MatHYPRESetPreallocation_HYPRE (mhypre.c:1500)

Here is how to reproduce the leak.

$ git clone https://gitlab.com/petsc/petsc
$ cd petsc
$ ./configure --download-hypre --download-mpich
$ make all
$ cd src/ts/tests
$ make ex15
$ valgrind --leak-check=full --show-leak-kinds=all ./ex15 -da_grid_x 20 -da_grid_y 20 -boundary 0 -ts_max_steps 20 -Jtype 1 -ts_monitor -pc_type hypre -pc_hypre_boomeramg_smooth_type Euclid -ts_max_steps 1
victorapm commented 2 years ago

@prj, to solve this leak, PETSc needs a call to hypre_HandleDestroy:

https://github.com/hypre-space/hypre/blob/843e8c71efa1b4ffc6c72720e51e87f35c803cc5/src/utilities/general.c#L62

and this function is usually called via HYPRE_Finalize. But I see that PETSc does not call the pair HYPRE_Init and HYPRE_Finalize in PetscInitialize and PetscFinalize, respectively. Is there a particular reason for that?
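For reference, here is a minimal standalone sketch (not the PETSc code path; the IJ matrix calls are just placeholders to force the handle to be created) showing the pairing that releases the handle and silences the valgrind record above:

#include <mpi.h>
#include "HYPRE.h"
#include "HYPRE_IJ_mv.h"

int main(int argc, char *argv[])
{
   MPI_Init(&argc, &argv);
   HYPRE_Init();                /* first hypre call: creates the global hypre handle */

   HYPRE_IJMatrix A;            /* placeholder usage that would otherwise trigger hypre_HandleCreate */
   HYPRE_IJMatrixCreate(MPI_COMM_WORLD, 0, 9, 0, 9, &A);
   HYPRE_IJMatrixSetObjectType(A, HYPRE_PARCSR);
   HYPRE_IJMatrixInitialize(A);
   HYPRE_IJMatrixAssemble(A);
   HYPRE_IJMatrixDestroy(A);

   HYPRE_Finalize();            /* calls hypre_HandleDestroy, so the 32-byte block is freed */
   MPI_Finalize();
   return 0;
}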

prj- commented 2 years ago

That's an excellent question... Maybe @stefanozampini knows?

stefanozampini commented 2 years ago

> That's an excellent question... Maybe @stefanozampini knows?

No particular reason. Calling those functions was not actually needed for quite a long time. What do they actually do? Someone has to code it properly when registering the classes, and we need to be sure we can safely call hypre's finalize. Are there global variables that tell us the status of the initialization?

liruipeng commented 2 years ago

>> That's an excellent question... Maybe @stefanozampini knows?

> No particular reason. Calling those functions was not actually needed for quite a long time. What do they actually do? Someone has to code it properly when registering the classes, and we need to be sure we can safely call hypre's finalize. Are there global variables that tell us the status of the initialization?

We introduced the HYPRE_Init and HYPRE_Finalize functions in v2.20.0. The initializations are mainly for GPUs: creating GPU library handles, global variables, etc. We recommend calling these two functions in all cases; otherwise, memory leaks occur, as we can see here.

victorapm commented 2 years ago

Just adding to Rui Peng's reply:

> Are there global variables that tell us the status of the initialization?

There isn't a global variable for that in hypre.

BarrySmith commented 2 years ago

Note that this kind of memory leak, which happens because we do not call HYPRE_Finalize(), should be fixed, but it is not a big deal. The problematic memory leaks occur because we are not properly freeing some resources along the way in the computations, so we waste more and more memory that is no longer accessible, and eventually the process runs out of memory. Are there any of those sorts of memory leaks that we can fix in PETSc?

Below are questions that will help us better understand where and when we should call HYPRE_Init().

It looks like you want the GPU device initialized before calling HYPRE_Init()?

And that, within a run, either all hypre solves use the GPU or none do? Is there a way for the user to have some hypre solves on the CPU and some on the GPU?

cudaSetDevice(device_id); /* GPU binding */
...
HYPRE_Init(); /* must be the first HYPRE function call */
HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE);
/* setup AMG on GPUs */
HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE);
tangqi commented 1 year ago

Hi, everyone. Is there any update on this? I reported this memory bug to the PETSc developers a while ago. Now, beyond the original Euclid smoother, I have a good motivation to add hypre's AIR preconditioner option to PETSc, and I would guess it will have the same memory issue if I just do it naively.

More specifically, to start with, I would like to add the following option to PETSc's hypre interface (mfem::HypreBoomerAMG::SetAdvectiveOptions): https://docs.mfem.org/html/classmfem_1_1HypreBoomerAMG.html#a7ce86f2bda044e0cedb9ef73fef8c5bc However, I am worried that I will hit the same issue in a time-dependent simulation, where the memory just grows linearly. Note that the AIR preconditioner needs an additional allocation of grid_relax_points in hypre, which is currently missing in PETSc.

Is there any suggestion on how to do both (fix the memory bug and add the AIR option into petsc)? Thanks.

stefanozampini commented 1 year ago

We already have some AIR options: https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/pc/impls/hypre/hypre.c#L923. I'm no expert on this, so if you have improvements, please open a PR.

bensworth commented 1 year ago

^ @stefanozampini currently PETSc is missing a very important part of AIR, which is the FC relaxation. This requires a small array to be passed to hypre, as @tangqi mentioned, and can be seen in the MFEM wrapper: https://github.com/mfem/mfem/blob/master/linalg/hypre.cpp#L5131.
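For concreteness, here is a hedged sketch of that array, modeled loosely on the MFEM wrapper linked above; the helper name and the default distance-2 restriction are illustrative, the allocator signature assumes a recent hypre, and the convention that BoomerAMG keeps and frees the arrays follows the MFEM code:

/* Hypothetical helper sketched after the MFEM wrapper (not existing PETSc or hypre API). */
#include <string.h>
#include "HYPRE_parcsr_ls.h"
#include "_hypre_utilities.h"   /* hypre_CTAlloc */

static void set_air_fc_relaxation(HYPRE_Solver amg,
                                  const char *prerelax,   /* e.g. "A": one all-point sweep going down */
                                  const char *postrelax)  /* e.g. "FFC": F-F-C sweeps going up        */
{
   HYPRE_Int ns_down = (HYPRE_Int) strlen(prerelax);
   HYPRE_Int ns_up   = (HYPRE_Int) strlen(postrelax);

   /* Per-sweep relaxation points: F points = -1, C points = 1, 0 = all points.
      Rows: 0 unused, 1 down cycle, 2 up cycle, 3 coarsest level. */
   HYPRE_Int **grid_relax_points = hypre_CTAlloc(HYPRE_Int *, 4, HYPRE_MEMORY_HOST);
   grid_relax_points[0] = NULL;
   grid_relax_points[1] = hypre_CTAlloc(HYPRE_Int, ns_down, HYPRE_MEMORY_HOST);
   grid_relax_points[2] = hypre_CTAlloc(HYPRE_Int, ns_up,   HYPRE_MEMORY_HOST);
   grid_relax_points[3] = hypre_CTAlloc(HYPRE_Int, 1,       HYPRE_MEMORY_HOST);
   grid_relax_points[3][0] = 0;
   for (HYPRE_Int i = 0; i < ns_down; i++)
   {
      grid_relax_points[1][i] = (prerelax[i] == 'F') ? -1 : (prerelax[i] == 'C') ? 1 : 0;
   }
   for (HYPRE_Int i = 0; i < ns_up; i++)
   {
      grid_relax_points[2][i] = (postrelax[i] == 'F') ? -1 : (postrelax[i] == 'C') ? 1 : 0;
   }

   HYPRE_BoomerAMGSetRestriction(amg, 2);                     /* distance-2 AIR restriction (illustrative) */
   HYPRE_BoomerAMGSetGridRelaxPoints(amg, grid_relax_points); /* BoomerAMG stores and later frees the arrays */
   HYPRE_BoomerAMGSetCycleNumSweeps(amg, ns_down, 1);         /* number of sweeps in the down cycle */
   HYPRE_BoomerAMGSetCycleNumSweeps(amg, ns_up,   2);         /* number of sweeps in the up cycle   */
}

A PETSc interface would presumably expose the two sweep strings as user options, much as MFEM does through SetAdvectiveOptions.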

stefanozampini commented 1 year ago

@bensworth I'll try to find some time to implement it, thanks for pointing it out. Can you suggest a small numerical test to check everything is working as expected?

prj- commented 1 year ago

I’d be happy to try this out, because the AIR performance in PETSc is terrible, worse than plain BoomerAMG for advection-dominated problems, which should not be the case.

stefanozampini commented 1 year ago

@prj- Thanks!

bensworth commented 1 year ago

Great! I will write up a simple example today where two-level AIR is (theoretically) exact in one iteration; it should be a good unit test. One other comment: we don't have a block version of AIR in hypre (only in PyAMG), so the workaround for DG advective problems is to scale by the (element) block-diagonal inverse (this makes the matrix amenable to a scalar, as opposed to block, reduction method). This is also in hypre, and wrapped by MFEM here: https://github.com/mfem/mfem/blob/master/linalg/hypre.cpp#L2762. It would be nice to have access to this function for DG advective discretizations.

@prj- once we get a complete PETSc interface, if you have interesting advection-dominated problems where you still aren't seeing good performance, feel free to reach out. Sometimes there are simple modifications to the discretization that make solver performance a lot better, and we also have a number of development variations of AIR in PyAMG that we could test if you print your matrix to a file or wrap it in petsc4py.

prj- commented 1 year ago

We got sidetracked discussing ℓAIR at the end of this issue, but the earlier discussion about HYPRE_[Init|Finalize]() to solve the leak is actually more relevant than ever; see the discussion in https://github.com/hypre-space/hypre/pull/871#issuecomment-1503492038.

BarrySmith commented 1 year ago

> We introduced the HYPRE_Init and HYPRE_Finalize functions in v2.20.0. The initializations are mainly for GPUs: creating GPU library handles, global variables, etc. We recommend calling these two functions in all cases; otherwise, memory leaks occur, as we can see here.

@jedbrown is always concerned about initializing resources, even though they may never be used. For example, if I run a PETSc program whose PetscInitialize() calls HYPRE_Init(), the program grabs GPU resources every time it is run, even if it never ends up using hypre (very common, since PETSc programs are options-database driven). This is painful on shared systems (where other people may be using the GPU) and during CI, where the program runs thousands of times, each time trying to grab the GPU. Thus PETSc has "lazy" initialization of the GPU, which delays touching the GPU until the code knows it will actually use it.

Is there any chance you would consider adding lazy GPU initialization to hypre so the GPU is not always grabbed when the program starts up?
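For illustration only, here is a rough sketch (hypothetical guard and function names, not hypre's or PETSc's API) of the kind of deferred initialization meant by "lazy" here, done on the host-library side:

/* Hypothetical host-side guard (names invented for illustration): defer HYPRE_Init()
 * until hypre is actually about to be used, and only finalize if it was initialized. */
#include "HYPRE_utilities.h"

static int host_hypre_initialized = 0;

static void host_hypre_init_if_needed(void)
{
   if (!host_hypre_initialized)
   {
      HYPRE_Init();              /* GPU handles, global handle, etc. are created only now */
      host_hypre_initialized = 1;
   }
}

static void host_hypre_finalize_if_needed(void)
{
   if (host_hypre_initialized)
   {
      HYPRE_Finalize();          /* releases the handle, avoiding the leak reported above */
      host_hypre_initialized = 0;
   }
}

The host library would call host_hypre_init_if_needed() right before its first hypre call (e.g., when the hypre preconditioner is actually selected) and host_hypre_finalize_if_needed() from its own finalize routine, so runs that never select hypre never touch the GPU.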

liruipeng commented 1 year ago

>> We introduced the HYPRE_Init and HYPRE_Finalize functions in v2.20.0. The initializations are mainly for GPUs: creating GPU library handles, global variables, etc. We recommend calling these two functions in all cases; otherwise, memory leaks occur, as we can see here.

> @jedbrown is always concerned about initializing resources, even though they may never be used. For example, if I run a PETSc program whose PetscInitialize() calls HYPRE_Init(), the program grabs GPU resources every time it is run, even if it never ends up using hypre (very common, since PETSc programs are options-database driven). This is painful on shared systems (where other people may be using the GPU) and during CI, where the program runs thousands of times, each time trying to grab the GPU. Thus PETSc has "lazy" initialization of the GPU, which delays touching the GPU until the code knows it will actually use it.

> Is there any chance you would consider adding lazy GPU initialization to hypre so the GPU is not always grabbed when the program starts up?

Hi @BarrySmith, there are users at Livermore as well who have the same concern about GPU initialization. We are cooking up a lightweight HYPRE_Init, which is more or less the same as the "lazy" initialization idea you mentioned.

Thanks!