Closed: JordiManyer closed this PR 1 year ago
Thanks very much for the PR, @JordiManyer!
I think we could write a test in the spirit of https://github.com/gridap/GridapPETSc.jl/blob/master/test/mpi/GCTests.jl
so that we 100% confirm that all this new machinery is actually working and we do not leak any memory along the way.
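For reference, a minimal sketch of the kind of test this suggests, using hypothetical helpers (setup_octree_model and num_alive_p4est_objects are placeholders for whatever the package actually provides to build a model and to count live p4est objects):

```julia
using Test

# Build and drop models repeatedly; if the finalizer machinery works,
# no p4est structures should remain allocated afterwards.
function gc_leak_test(ranks)
  for i in 1:10
    model = setup_octree_model(ranks)  # hypothetical constructor
    model = nothing                    # drop the last reference
    GC.gc()                            # let finalizers free the p4est memory
  end
  @test num_alive_p4est_objects() == 0 # hypothetical live-object counter
end
```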
@amartinhuertas I've added some more features and tests. This is now closer to something we can merge. However, I do have a couple of notes:
- I have not managed to make it work with sc_init()/sc_finalize(), but the tests run anyways. Also, it does not seem to make a difference whether we init or not the p4est library. So this is my question: Does any of these things matter? What are sc_init() and p4est_init() doing?
- Since initializing p4est is not necessary, one could skip the with() do ... end statement. Then the flag _INITIALIZED would not be set to true and the models would not be de-allocated unless the user manually de-allocates them. Should this be left like this? How can we enhance this?
- Also, it occurred to me that adding finalizers does not exactly cover our usecase: In the context of GMG, we would like to build the MeshHierarchy then de-allocate all P4est objects. However, we would like to do so without destroying the OctreeDistributedDiscreteModels (or at least returning the DistributedDiscreteModels first, which are still useful). So we probably want to add some more functionality/flexibility to what we already have, right?
I have not managed to make it work with sc_init()/sc_finalize(), but the tests run anyways.
Can you elaborate a little bit more on this? Which is the error that you get?
Does any of these things matter? What are sc_init() and p4est_init() doing?
As far as I remember, they let you: (1) control the log verbosity level (useful for debugging purposes), and (2) make sure that only the root process prints log messages on screen. By default, all processes print on screen; by passing the communicator, p4est can restrict logging to the root processor.
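For illustration, a hedged sketch of how these settings might be set from Julia, assuming P4est_wrapper.jl exposes the C entry points sc_init/p4est_init and the SC_LP_* log-level constants under their C names (the exact communicator type expected by the generated bindings may differ):

```julia
using MPI, P4est_wrapper

MPI.Init()
comm = MPI.COMM_WORLD
# Passing the communicator lets the sc library restrict logging to the root
# rank; SC_LP_ERROR raises the threshold so that only errors are printed.
sc_init(comm, Cint(0), Cint(0), C_NULL, SC_LP_ERROR)
# Same log threshold for p4est's own messages.
p4est_init(C_NULL, SC_LP_ERROR)
```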
Also, it occurred to me that adding finalizers does not exactly cover our usecase: In the context of GMG, we would like to build the MeshHierarchy then de-allocate all P4est objects. However, we would like to do so without destroying the OctreeDistributedDiscreteModels (or at least returning the DistributedDiscreteModels first, which are still useful). So we probably want to add some more functionality/flexibility to what we already have, right?
That is definitely more complex to achieve (automatically). I would not mind too much about this. The memory footprint of p4est data structures is VERY small compared to the heavy data structures that we built atop p4est ones. Thus, by eagerly freeing p4est memory you are not going to gain much. The bulk of the memory footprint is in GridapDistributed data structures.
Since initializing p4est is not necessary, one could skip the with() do ... end statement. Then the flag _INITIALIZED would not be set to true and the models would not be de-allocated unless the user manually de-allocates them. Should this be left like this? How can we enhance this?
What do we do in GridapPETSc.with(...)?
I think initialization must be enforced, otherwise you lose automatic deallocation of p4est data structures. If it is not enforced, we should issue an error.
BTW, on another note, are we sure that the free subroutines in p4est do not have collective semantics? Otherwise we may end up with deadlocks, as some processes may trigger the free and others may not, as a result of the normal operation of the garbage collector, which is fully asynchronous.
Since initializing p4est is not necessary, one could skip the with() do ... end statement. Then the flag _INITIALIZED would not be set to true and the models would not be de-allocated unless the user manually de-allocates them. Should this be left like this? How can we enhance this?
What do we do in GridapPETSc.with(...)?
I think initialization must be enforced, otherwise you lose automatic deallocation of p4est data structures. If it is not enforced, we should issue an error.
In GridapPETSc, PETSc needs to be initialised in order to work, so initialisation is a must. Also, destroying PETSc structures has indeed collective semantics, which is why objects are marked and then destroyed at the end of the with().
This is not the case for P4est. In fact, we could just ignore the whole _INITIALISED idea. We can keep the automatic GC through the Init(model)/Finalize(model) semantics without enforcing initialisation. We could keep the with() as a way of initialising P4est with specific settings, and also keep the allocation/deallocation counter for convenience.
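A minimal sketch of that proposal, with illustrative names only (Init, Finalize, with and _NUM_ALIVE are not the package's actual API, and the model type is assumed to be a mutable struct, as required by finalizer):

```julia
# Counter of p4est-backed objects that are still alive, kept for debugging
# and for the leak tests.
const _NUM_ALIVE = Ref(0)

function Init(model)
  _NUM_ALIVE[] += 1
  # Always register a GC finalizer: the p4est structures owned by the model
  # are freed when it goes out of scope, even outside a with() block.
  finalizer(Finalize, model)
  return model
end

function Finalize(model)
  # Free the underlying p4est structures here (a local, idempotent operation)
  # and update the counter.
  _NUM_ALIVE[] -= 1
  return nothing
end

function with(f; kwargs...)
  # Optional library-level settings (e.g. log verbosity) could be applied
  # here; initialisation is not required for the finalizers above to work.
  return f()
end
```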
BTW, on another note, are we sure that the free subroutines in p4est do not have collective semantics? Otherwise we may end up with deadlocks, as some processes may trigger the free and others may not, as a result of the normal operation of the garbage collector, which is fully asynchronous.
From what I've tested, I believe it is local. If one processor frees p4est memory, it does not affect the other processors as long as the destroyed model is not used again (which it shouldn't be, since the GC only destroys out-of-scope objects).
If we want to be 100% safe, we could introduce a global array of pointers which stores the pointers to the p4est structures that can be destroyed. Finalize(model) could push its p4est structures onto the array, and all of the structures would then be destroyed at the end of the with()... block.
In this case, we would indeed need to enforce initialisation, to make sure all p4est structures are inside the destroy queue before leaving the with() statement. This is what PETSc does, but PETSc has this destroy queue implemented within the library. The fact that p4est does not have this is probably a good indicator that the deallocation is local.
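A hedged sketch of that destroy-queue idea, with placeholder names (the ptr field and the destroy call are illustrative, not the package's API):

```julia
# Pointers to p4est structures that are ready to be destroyed.
const _DESTROY_QUEUE = Vector{Ptr{Cvoid}}()

function Finalize(model)
  # Do not free here: just enqueue the p4est pointer owned by the model.
  push!(_DESTROY_QUEUE, model.ptr)   # model.ptr is a hypothetical field
  return nothing
end

function with(f)
  try
    return f()
  finally
    # Every process reaches this point, so draining the queue here would be
    # safe even if the underlying free routines had collective semantics.
    for p in _DESTROY_QUEUE
      # the appropriate p4est destroy routine would be called on p here
    end
    empty!(_DESTROY_QUEUE)
  end
end
```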
Whatever you prefer.
In GridapPETSc, PETSc needs to be initialised in order to work, so initialisation is a must. Also, destroying PETSc structures has indeed collective semantics, which is why objects are marked and then destroyed at the end of the with().
Ok, agreed.
This is not the case for P4est. In fact, we could just ignore the whole _INITIALISED idea. We can keep the automatic GC through the Init(model)/Finalize(model) semantics without enforcing initialisation. We could keep the with() as a way of initialising P4est with specific settings, and also keep the allocation/deallocation counter for convenience.
Ok, fair enough. We can proceed this way.
Whatever you prefer.
Let us move forward assuming that p4est free subroutines can be called asynchronously without any issue.
All issues solved, see also PR #28. Closing and merging PR.
Automating GC for ease of use of OctreeDistributedDiscreteModels.