guardicore / IPCDump

BSD 3-Clause "New" or "Revised" License

Debugging MPI programs #4

Open renatobellotti opened 3 years ago

renatobellotti commented 3 years ago

Are there plans to process the raw communication data and provide a high-level event history for MPI communication? I think the scientific open source community is in desperate need of a tool to debug distributed codes.

liad-guardicore commented 3 years ago

Hey @renatobellotti, can you please share some examples where you think IPCDump could help? How do MPI applications usually communicate?

renatobellotti commented 3 years ago

Thanks for your answer.

Well, the problem is that I have no clue how MPI works internally. It is more of a standard for a communication API, in the style of "send-array-to-process-a", "wait-for-array-from-process-b", and similar operations. I guess an implementation can even use different mechanisms depending on which cores/nodes the processes run on, but I'm just a user, so I don't know what happens behind the scenes. I was hoping that you knew about MPI and could add something like a filter to group messages semantically. :)

liad-guardicore commented 3 years ago

As far as I know about MPI (which is not a lot), the mechanism it uses to pass information differs between implementations. Still, I think it would be a nice feature to have a wrapper that knows how to identify MPI communication and presents it in a nicer way than just random IPC events. (I'm not sure this is possible, but it would be worth checking, and implementing it if it is.)
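
To make the wrapper idea a bit more concrete, here is a minimal Python sketch of the simplest grouping I can think of: collecting raw events into per-process-pair "conversations". Everything in it is an assumption for illustration. The `IpcEvent` record and its field names are hypothetical and are not IPCDump's actual output format, and grouping by PID pair is only a first approximation that knows nothing about MPI ranks, tags, or communicators.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IpcEvent:
    """Hypothetical event record; the field names are assumptions,
    not IPCDump's real schema."""
    timestamp: float
    src_pid: int
    dst_pid: int
    ipc_type: str  # e.g. "unix", "tcp", "pipe"
    nbytes: int


def group_by_peer_pair(events):
    """Group events into conversations keyed by the unordered PID pair,
    keeping events within each conversation in timestamp order."""
    conversations = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (min(ev.src_pid, ev.dst_pid), max(ev.src_pid, ev.dst_pid))
        conversations[key].append(ev)
    return dict(conversations)


def summarize(conversations):
    """Return one summary line per conversation: peers, event count, total bytes."""
    lines = []
    for (a, b), evs in sorted(conversations.items()):
        total = sum(ev.nbytes for ev in evs)
        lines.append(f"{a} <-> {b}: {len(evs)} events, {total} bytes")
    return lines


# Made-up sample events, purely for demonstration.
events = [
    IpcEvent(0.10, 100, 101, "unix", 4096),
    IpcEvent(0.25, 101, 100, "unix", 8),
    IpcEvent(0.30, 100, 102, "tcp", 1024),
]

for line in summarize(group_by_peer_pair(events)):
    print(line)
```

A real MPI-aware view would probably need more than this, for example mapping PIDs to ranks and recognizing which transport a given MPI implementation uses, so treat the sketch as a starting point for discussion only.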

renatobellotti commented 3 years ago

I'm very happy to hear that. I think the scientific community would love this feature!