gridap / GridapPETSc.jl

Provides PETSc solvers to the Gridap project

Discussion on GC of PETSc objects on PETSc-Users #41

Open amartinhuertas opened 3 years ago

amartinhuertas commented 3 years ago

Dear PETSc users, What is the main reason underlying PetscDestroy subroutines having global collective semantics? Is this actually true for all PETSc objects? Can this be relaxed/deactivated by, e.g., compilation macros/configuration options? We are leveraging PETSc from Julia in a parallel distributed memory context (several MPI tasks running the Julia REPL each). Julia uses Garbage Collection (GC), and we would like to destroy the PETSc objects automatically when the GC decides so along the simulation. In this context, we cannot guarantee deterministic destruction on all MPI tasks as the GC decisions are local to each task, no global semantics guaranteed. Thanks in advance! Best regards, Alberto.
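
A minimal sketch of the unsafe pattern being described, assuming a hypothetical PETScVec wrapper type and that the PETSc shared library is loadable as "libpetsc" (both are illustrative, not GridapPETSc's actual API):

    # Hypothetical wrapper; names and the "libpetsc" library string are illustrative.
    mutable struct PETScVec
        ptr::Ref{Ptr{Cvoid}}   # low-level PETSc Vec handle
    end

    function PETScVec(handle::Ptr{Cvoid})
        v = PETScVec(Ref(handle))
        # Unsafe pattern: VecDestroy is collective over the Vec's communicator,
        # but each rank's GC decides independently when (and whether) to run this
        # finalizer, so the collective call is not matched across ranks.
        finalizer(v) do obj
            ccall((:VecDestroy, "libpetsc"), Cint, (Ptr{Ptr{Cvoid}},), obj.ptr)
        end
        return v
    end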

Ahh, this makes perfect sense. The code for PetscObjectRegisterDestroy() and the actual destruction (called in PetscFinalize()) is very simple and can be found in src/sys/objects/destroy.c: PetscObjectRegisterDestroy(), PetscObjectRegisterDestroyAll(). You could easily maintain a new array like PetscObjectRegisterGCDestroy_Objects[] and add objects with PetscObjectRegisterGCDestroy() and then destroy them with PetscObjectRegisterDestroyGCAll(). The only tricky part is that you have to make sure, in the context of your Julia MPI, that PetscObjectRegisterDestroyGCAll() is called collectively over all the MPI ranks that have registered objects to destroy (that is, it has to be called at a point where all the ranks have made the same progress on MPI communication), generally PETSC_COMM_ALL. We would be happy to incorporate such a system into the PETSc source with a merge request. Barry
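
PetscObjectRegisterGCDestroy() and PetscObjectRegisterDestroyGCAll() do not exist in PETSc yet; Barry is proposing them as new additions modeled on the existing PetscObjectRegisterDestroy()/PetscObjectRegisterDestroyAll(). Assuming they were added with analogous signatures, a Julia wrapper could defer destruction to them roughly as below; every name here is hypothetical:

    # Local and non-collective: safe to call from a Julia finalizer. It would only
    # record the object in the proposed PetscObjectRegisterGCDestroy_Objects[] array.
    register_for_gc(obj::Ptr{Cvoid}) =
        ccall((:PetscObjectRegisterGCDestroy, "libpetsc"), Cint, (Ptr{Cvoid},), obj)

    # Collective: every rank that registered objects must reach this call at the
    # same point of the program (e.g. once per time step or solver iteration).
    flush_gc_registry() =
        ccall((:PetscObjectRegisterDestroyGCAll, "libpetsc"), Cint, ())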

I think that it is not just MPI_Comm_free that is potentially problematic. Here are some additional areas off the top of my head:

  1. PetscSF with -sf_type window. Destroy (when the refcount drops to zero) calls MPI_Win_free (which is collective over comm).
  2. Deallocation of MUMPS objects is tremendously collective.

In general the solution of just punting MPI_Comm_free to PetscFinalize (or some user-defined time) is, I think, insufficient since it requires us to audit the collectiveness of all XXX_Destroy functions (including in third-party packages). Barry's suggestion of resurrecting objects in finalisation using PetscObjectRegisterDestroy and then collectively clearing that array periodically is pretty close to the proposal that we cooked up, I think. Jack can correct any missteps I make in explanation, but perhaps this is helpful for Alberto:

  1. Each PETSc communicator gets two new attributes: "creation_index" [an int64] and "resurrected_objects" [a set-like thing].
  2. PetscHeaderCreate grabs the next creation_index out of the input communicator and stashes it on the object. Since object creation is collective, this is guaranteed to agree on any given communicator across processes.
  3. When the Python garbage collector tries to destroy PETSc objects, we resurrect the C object in finalisation and stash it in "resurrected_objects" on the communicator.
  4. Periodically (as a result of user intervention in the first instance), we do garbage collection collectively on these resurrected objects by performing a set intersection of the creation_indices across the communicator's processes, and then calling XXXDestroy in order on the sorted-by-creation_index set intersection.

I think that most of this infrastructure is agnostic of the managed language, so Jack was doing the implementation in PETSc (rather than petsc4py). This wasn't a perfect solution (I recall that we could still cook up situations in which objects would not be collected), but it did seem to (in theory) solve any potential deadlock issues. Lawrence
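
A rough illustration of step 4 only, assuming each rank keeps a Dict from creation_index to the resurrected object and a destroy! callback that performs the appropriate collective XXXDestroy; the MPI.jl calls are standard, everything else is hypothetical:

    using MPI

    # `resurrected` maps creation_index => resurrected object on this rank;
    # `destroy!` performs the matching collective XXXDestroy call.
    function collective_gc!(comm::MPI.Comm, resurrected::Dict{Int64,Any}, destroy!)
        # Share every rank's set of creation indices with all ranks.
        local_idx = collect(keys(resurrected))
        counts    = MPI.Allgather(Int32[length(local_idx)], comm)
        all_idx   = Vector{Int64}(undef, sum(counts))
        MPI.Allgatherv!(local_idx, MPI.VBuffer(all_idx, counts), comm)

        # Keep only the indices registered on every rank (the set intersection).
        nranks = MPI.Comm_size(comm)
        seen   = Dict{Int64,Int}()
        for i in all_idx
            seen[i] = get(seen, i, 0) + 1
        end
        common = sort!([i for (i, c) in seen if c == nranks])

        # Destroy in the same sorted order on every rank, so each collective
        # destroy call is matched across the communicator.
        for i in common
            destroy!(pop!(resurrected, i))
        end
        return nothing
    end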

    Hi Everyone, I cannot fault Lawrence's explanation; that is precisely what I'm implementing. The only difference is I was adding most of the logic for the "resurrected objects map" to petsc4py rather than PETSc. Given that this solution is truly Python agnostic, I will move what I have written to C and merely add the interface to the functionality to petsc4py. Indeed, this works out better for me, as I was not enjoying writing all the code in Cython! I'll post an update once there is a working prototype in my PETSc fork and the code is ready for testing. Cheers, Jack

fverdugo commented 3 years ago

Hi @amartinhuertas

I have implemented a draft of another approach to handle petsc gc. See branch: https://github.com/gridap/GridapPETSc.jl/tree/petsc_gc

My main goal is to solve the petsc gc issue using the standard petsc destroy functions (e.g., VecDestroy, MatDestroy, etc.).

I am not 100% comfortable using the more sophisticated petsc functions we explored in https://github.com/gridap/GridapPETSc.jl/pull/42 In particular, with this error:

    [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
    [1]PETSC ERROR: Petsc has generated inconsistent data
    [1]PETSC ERROR: No more room in array, limit 256
     recompile src/sys/objects/destroy.c with larger value for MAXREGDESOBJS

So, I think it is perhaps a good idea to find a solution with the standard petsc destroy functions alone.

The main idea is to maintain a package global vector _REFS containing the low level handles of type Ref{Vec}, Ref{Mat}, etc. that are created in the computation. Then, we also keep another package global vector _STATES indicating the state of these low level references. Three states are possible: _INITIALIZED, _FINALIZED, and _ORPHAN. The idea is that we install a finalizer so that the Julia gc marks the corresponding low level refs as _ORPHAN. Then, with a call to the function petsc_gc(), all the orphan references will be destroyed with VecDestroy etc. collectively.
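
A condensed sketch of that bookkeeping, assuming a mutable Julia wrapper object per handle and a destroy_ref function standing in for the appropriate VecDestroy/MatDestroy dispatch; the names follow the description above, but the actual code in the petsc_gc branch may differ. As described, petsc_gc() is collective, so all ranks must call it at an equivalent point of the program:

    # Package-global bookkeeping, following the description above.
    const _REFS   = Vector{Any}()       # low-level Ref{Vec}, Ref{Mat}, ... handles
    const _STATES = Vector{Symbol}()    # one state per entry of _REFS

    const _INITIALIZED = :initialized
    const _ORPHAN      = :orphan
    const _FINALIZED   = :finalized

    # Track a freshly created low-level handle together with its Julia wrapper.
    function track!(wrapper, ref)
        push!(_REFS, ref)
        push!(_STATES, _INITIALIZED)
        i = length(_REFS)
        # The Julia GC only flips the state to _ORPHAN; no PETSc call happens here,
        # so nothing collective runs at an unpredictable, rank-local time.
        finalizer(obj -> (_STATES[i] = _ORPHAN), wrapper)
        return ref
    end

    # Collective: to be called on all ranks at the same point of the program.
    # `destroy_ref` stands in for the appropriate VecDestroy/MatDestroy/... call.
    function petsc_gc(destroy_ref)
        for i in eachindex(_REFS)
            if _STATES[i] == _ORPHAN
                destroy_ref(_REFS[i])
                _STATES[i] = _FINALIZED
            end
        end
        return nothing
    end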

The draft is incomplete:

amartinhuertas commented 3 years ago

In particular, with this error

@fverdugo ... this error is easily bypassable. Barry Smith told us the solution, and that they would accept a PR in PETSc along the lines that he proposed. Anyway, I understand your point; it is much better if we are able to solve the problem using more standard functionality in PETSc. I still have to understand what you are proposing, whether it works, etc. The current solution works (up to that nasty error, which is not acceptable).