pmodels / mpich

Official MPICH Repository
http://www.mpich.org

hydra can no longer launch the Flux resource manager using PMI-1 wire protocol #7050

Open garlick opened 4 days ago

garlick commented 4 days ago

Problem: mpiexec.hydra v4.2.0 (and reportedly v4.1.1) can no longer launch Flux.

It looks like a PMI-1 wire protocol change crept into the hydra implementation around v4.1: values are now returned with found=TRUE appended to them. Our client checks the protocol version, and it is still being returned as pmi_version=1 pmi_subversion=1.

Was that intentional?

TL;DR flux-framework/flux-core#6072

garlick commented 4 days ago

Our PMI-1 client assumes that KVS values may contain any character except LF (including space and =), so the value=whatever protocol element must be the last element on the protocol line.
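To illustrate the parsing assumption described above, here is a minimal Python sketch. It is not the Flux client code: the function name `parse_pmi1_line` and the sample response lines are hypothetical, chosen only to show why a trailing element after value= ends up inside the value when the parser treats value= as consuming the rest of the line.

```python
def parse_pmi1_line(line):
    """Split a PMI-1 style line into key=value elements.

    Assumption (from the comment above): the 'value' element may contain
    spaces and '=', so it consumes everything up to the terminating LF.
    """
    fields = {}
    rest = line.rstrip("\n")
    while True:
        rest = rest.lstrip(" ")
        if not rest:
            break
        key, sep, after = rest.partition("=")
        if not sep:
            raise ValueError(f"malformed element: {rest!r}")
        if key == "value":
            # Everything after 'value=' belongs to the value.
            fields[key] = after
            break
        # Any other element ends at the next space.
        fields[key], _, rest = after.partition(" ")
    return fields


# Hypothetical response with value= last: spaces and '=' in the value survive.
ok = parse_pmi1_line("cmd=get_result rc=0 msg=success value=a b=c\n")

# Hypothetical response with an element appended after the value:
# the extra text is absorbed into the value.
broken = parse_pmi1_line("cmd=get_result rc=0 value=hello found=TRUE\n")
```

With this parsing rule, `ok["value"]` is `a b=c`, while `broken["value"]` is `hello found=TRUE` rather than `hello`, which is the kind of corruption a client sees if a server appends elements after value=.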

I tried to document the PMI-1 wire protocol here before implementing it, and at one time got a little feedback from Pavan on it. But I guess when you're documenting somebody else's work, you are on thin ice.

Any guidance that would help us tighten up our implementation or correct that spec would be helpful.

raffenet commented 4 days ago

I suspect the addition is unintentional, and the result of the refactoring of the PMI1/PMI2 protocol implementations into the libpmi convenience library. We will take a look.