pc2 / MPITape.jl

Record MPI operations on tape
MIT License

`plot_*_merged` functions don't work in a fresh session #17

Closed carstenbauer closed 1 year ago

carstenbauer commented 1 year ago

If I run the basic example (in example/), the plotting functions work, i.e. they print to the REPL and create a file gantt.png. However, if I close the REPL and then load and merge the tape files in a fresh REPL, they don't. Specifically, if I run

using MPITape
tape_merged = MPITape.merge()
MPITape.plot_merged(tape_merged)

I get the following error:

julia> MPITape.plot_merged(tape_merged)
ERROR: Not all destinations found for MPITape.MPIEvent(1, MPI_Send, ..., 0.00014693100000329196, 0.00015528099999784217): [0]
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] get_edges(tape::Vector{MPITape.MPIEvent}; check::Bool)
   @ MPITape /scratch/pc2-mitarbeiter/bauerc/devel/MPITape.jl/src/communication_graph.jl:92
 [3] get_edges
   @ /scratch/pc2-mitarbeiter/bauerc/devel/MPITape.jl/src/communication_graph.jl:55 [inlined]
 [4] plot_merged(tape::Vector{MPITape.MPIEvent}; palette::PlotUtils.ColorPalette, fname::String)
   @ MPITape /scratch/pc2-mitarbeiter/bauerc/devel/MPITape.jl/src/plotting.jl:34
 [5] plot_merged(tape::Vector{MPITape.MPIEvent})
   @ MPITape /scratch/pc2-mitarbeiter/bauerc/devel/MPITape.jl/src/plotting.jl:23
 [6] top-level scope
   @ REPL[6]:1

Note that plot_sequence_merged fails with the same error (since they share get_edges).

(cc @Mellich)

Mellich commented 1 year ago

This seems to be a bug in the merge() function. merge() calls save(), so calling it from a fresh session overwrites the current rank's tape file with an empty tape. When all tapes are then loaded and merged, get_edges() cannot match every MPI call because one tape has been overwritten.

So maybe we have to re-think our merge() behavior. Directly calling readall_and_merge() works for me.
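
To make the overwrite described above concrete, here is a minimal, self-contained toy model in plain Julia. It is only a sketch of the failure mode and not MPITape's actual code: ToyEvent, DISK, toy_save, toy_readall_and_merge, toy_merge_buggy and all_sends_matched are hypothetical names, the "disk" is just a Dict, and the fresh session is assumed to act as rank 0 with an empty in-memory tape.

```julia
# Toy model of the failure mode; NOT MPITape's implementation.
# Every name in this block is hypothetical and exists only for illustration.

struct ToyEvent
    rank::Int       # rank that recorded the event
    name::String    # "MPI_Send" or "MPI_Recv"
    peer::Int       # the other rank involved in the call
end

# Stand-in for the tape files on disk: rank => recorded events.
const DISK = Dict{Int,Vector{ToyEvent}}()

# Write one rank's tape to the fake disk, replacing whatever was there.
toy_save(rank, tape) = (DISK[rank] = copy(tape); nothing)

# Read all stored tapes and concatenate them, without saving anything first.
function toy_readall_and_merge()
    merged = ToyEvent[]
    for rank in sort(collect(keys(DISK)))
        append!(merged, DISK[rank])
    end
    return merged
end

# Buggy variant: save the in-memory tape first, then read everything back.
function toy_merge_buggy(rank, in_memory_tape)
    toy_save(rank, in_memory_tape)
    return toy_readall_and_merge()
end

# Simplified edge check: every send needs a matching receive on its destination.
function all_sends_matched(tape)
    all(tape) do ev
        ev.name != "MPI_Send" ||
            any(o -> o.name == "MPI_Recv" && o.rank == ev.peer && o.peer == ev.rank, tape)
    end
end

# Original run: rank 1 sends to rank 0 and both tapes end up on disk.
toy_save(0, [ToyEvent(0, "MPI_Recv", 1)])
toy_save(1, [ToyEvent(1, "MPI_Send", 0)])

# Reading the stored tapes directly keeps both sides of the communication.
ok_before = all_sends_matched(toy_readall_and_merge())

# Fresh session (assumed rank 0, empty tape): the buggy merge saves first,
# wiping rank 0's stored tape, so rank 1's send loses its destination.
ok_after = all_sends_matched(toy_merge_buggy(0, ToyEvent[]))
```

In this toy, ok_before is true and ok_after is false, which mirrors the "Not all destinations found" error for the MPI_Send on rank 1. It also illustrates why reading the stored tapes directly, as with the readall_and_merge() call reported to work above, avoids the problem: nothing is saved before the merge.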