jorgensd / adios4dolfinx

Extending DOLFINx with checkpointing functionality
http://jsdokken.com/adios4dolfinx/
MIT License

Is v.0.4.0 not compatible with dolfinx v.0.6.1? #28

Closed: robert-30 closed this issue 1 year ago

robert-30 commented 1 year ago

In the README it says that v.0.1.0 of adios4dolfinx is compatible with dolfinx 0.6.1. Are the newer versions not compatible?

jorgensd commented 1 year ago

The newer versions rely on some logic for permuting dofmaps that was added to the dolfinx Python layer post 0.6.1 release.

robert-30 commented 1 year ago

I see. Somewhat related to this: when I use v.0.1.0 with dolfinx 0.6, adios4dolfinx crashes if I read in many functions in a row, giving some MPI error. There were some opened file streams (which I see you have patched with the later versions of adios4dolfinx), but even after closing these the issue persists. Do you know what could be causing this?

jorgensd commented 1 year ago


It seems like ADIOS doesn't free its communicators properly (when ADIOS is initialized, the MPI communicator is duplicated). I haven't found a clean way to work around this, and I usually hit the issue after around 1000-2000 calls to the checkpoint functionality.
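To see why a leak like this only shows up after a fixed number of calls, here is a toy model in plain Python. It is not MPI itself. An MPI library can only keep a limited number of communicators alive at once, so duplicating one per checkpoint call without ever freeing it eventually fails. `CommunicatorPool`, `checkpoint_call` and `calls_until_failure` are hypothetical names, and the capacity of 2048 is an arbitrary assumption rather than a real MPI limit.

```python
class CommunicatorPool:
    """Toy stand-in for an MPI library with a finite number of communicator handles."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.live = 0

    def dup(self) -> None:
        # Models a communicator duplication: consumes one handle until it is freed.
        if self.live >= self.capacity:
            raise RuntimeError("communicator pool exhausted")
        self.live += 1

    def free(self) -> None:
        # Models freeing a communicator: returns the handle to the pool.
        self.live -= 1


def checkpoint_call(pool: CommunicatorPool, free_comm: bool) -> None:
    """Hypothetical checkpoint routine that duplicates a communicator on every call."""
    pool.dup()
    # ... read or write data here ...
    if free_comm:
        pool.free()


def calls_until_failure(free_comm: bool, capacity: int = 2048, max_calls: int = 10_000) -> int:
    """Return how many calls succeed before the pool is exhausted (capped at max_calls)."""
    pool = CommunicatorPool(capacity)
    for i in range(max_calls):
        try:
            checkpoint_call(pool, free_comm)
        except RuntimeError:
            return i
    return max_calls
```

When the calls leak, the loop fails after exactly `capacity` successful calls. When each duplicate is freed, the number of live handles returns to zero after every call, so the loop runs to completion.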

I would have to redesign the code to only initialize adios once to avoid this.
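One possible shape for "initialize ADIOS once" is to cache a single ADIOS object per process and reuse it on every checkpoint call. The sketch below wraps a hypothetical accessor `get_adios()` in `functools.lru_cache`. `FakeADIOS` stands in for an ADIOS2 object so that the example runs without ADIOS2 installed. This illustrates the pattern only and is not how adios4dolfinx is implemented.

```python
import functools


class FakeADIOS:
    """Stand-in for an ADIOS2 object; counts how often it is constructed."""

    instances = 0

    def __init__(self) -> None:
        # In the real library, initialization is where the MPI communicator is duplicated.
        FakeADIOS.instances += 1


@functools.lru_cache(maxsize=None)
def get_adios() -> FakeADIOS:
    """Hypothetical accessor: build the ADIOS object on first use, then return the cached one."""
    return FakeADIOS()


def write_checkpoint(step: int) -> FakeADIOS:
    """Hypothetical checkpoint call that reuses the cached ADIOS object."""
    adios = get_adios()
    # ... here one would open an IO/engine on `adios`, write data for `step`, and close the engine ...
    return adios
```

No matter how many checkpoints are written, only one ADIOS object is created, so the communicator is duplicated only once. If an application uses several communicators, the cache would have to be keyed on the communicator rather than being a single global object.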

robert-30 commented 1 year ago

Okay, thank you very much!

jorgensd commented 1 year ago

I might have found one issue: I do not explicitly call MPI.Comm.Free(). I'll try adding that and see if it helps.
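A minimal sketch of making the free explicit and exception-safe: a context manager that duplicates a communicator and always frees the duplicate on exit. `Dup()` and `Free()` are the mpi4py `MPI.Comm` method names, so `duplicated(MPI.COMM_WORLD)` should work with a real communicator. The `StubComm` class exists only so the example runs without MPI. Whether this matches where adios4dolfinx actually frees its communicators is an assumption.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class Comm(Protocol):
    """The subset of the mpi4py MPI.Comm interface used here."""

    def Dup(self) -> "Comm": ...
    def Free(self) -> None: ...


@contextmanager
def duplicated(comm: Comm) -> Iterator[Comm]:
    """Yield a duplicate of `comm` and free it on exit, even if an exception is raised."""
    dup = comm.Dup()
    try:
        yield dup
    finally:
        dup.Free()


class StubComm:
    """Minimal stand-in for MPI.Comm that tracks how many duplicates are alive."""

    live = 0

    def Dup(self) -> "StubComm":
        StubComm.live += 1
        return StubComm()

    def Free(self) -> None:
        StubComm.live -= 1
```

With mpi4py this would be used as `with duplicated(MPI.COMM_WORLD) as comm: ...`. The `finally` clause ensures that the duplicate is released even when a read or write inside the block raises an exception.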

jorgensd commented 1 year ago

I've been able to improve the following:

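import dolfinx
import adios4dolfinx
from mpi4py import MPI

# Repeatedly write and read back a mesh and a function to expose
# communicator leaks that only appear after many checkpoint calls.
for i in range(10000):
    print(i)  # iteration counter, shows how far the loop gets before crashing
    mesh = dolfinx.mesh.create_unit_square(MPI.COMM_WORLD, 10, 10)
    V = dolfinx.fem.functionspace(mesh, ("Lagrange", 1))
    u = dolfinx.fem.Function(V)

    # Write the mesh and the (zero-initialized) function to the same BP4 file
    adios4dolfinx.write_mesh(mesh, "u.bp", engine="BP4")
    adios4dolfinx.write_function(u, "u.bp", engine="BP4")

    # Read the mesh back and load the function onto a new function space
    new_mesh = adios4dolfinx.read_mesh(MPI.COMM_WORLD, "u.bp", engine="BP4", ghost_mode=dolfinx.mesh.GhostMode.shared_facet)
    V_new = dolfinx.fem.functionspace(new_mesh, ("Lagrange", 1))
    u_new = dolfinx.fem.Function(V_new)
    adios4dolfinx.read_function(u_new, "u.bp", engine="BP4")
    del u, u_new, mesh, new_mesh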

to run 1012 iterations instead of 337 with https://github.com/jorgensd/adios4dolfinx/pull/33.

jorgensd commented 1 year ago

Got up to 2024 iterations with the latest commit.

jorgensd commented 1 year ago

With the latest fixes, I've been able to run the code for 10 000 iterations, so I believe the communicator duplication issue will be resolved once #33 is merged.

jorgensd commented 1 year ago

I've now fixed all the MPI communicator duplication issues; the fixes are included in v0.6.0: https://github.com/jorgensd/adios4dolfinx/releases/tag/v0.6.0