distributed nodes that don't see each other

Licenser commented 12 years ago

I noticed that I keep running into problems with distributed gproc from the moment where I fire up two nodes that don't see each other when gproc starts but then are supposed to join together with discovery set to all. I kind of have the feeling it's a known issue but I figured it won't hurt to document.

Example timeline: n1 - boot n1 - start gproc n2 - boot n2 - start gproc n1 - net_adm:ping(n2) -> not joining together propperly.

That kind of is a netsplit issue and it propably can't be resolved in a entirely way since it's not guaranteed that conflicts in the two registreis can be joined automatically but what really would be cool if there would be some kind of callback saying: Hey we've a (re)join from a split with side 1 and 2 so it'd be possible to work out the stuff if possible.

uwiger commented 12 years ago

This is a problem with gen_leader and, consequently, with gproc.

I believe some of the gen_leaders out there, not least vagabond and garrett-smith and can handle netsplits fairly well, but you'd need to check with them directly for details. Gproc would need some callback in order to resynch, and some changes to the internal data structures to be able to know what to do in case of conflicts.

uwiger commented 12 years ago

I have added gproc:bcast() and :wide_await() to make it a bit easier to have multiple instances of local gproc services cooperating in a loose way. There may be other similar functions that could be added. Beyond that, I have no immediate plans to work on the global gproc part right now (not from lack of interest - I simply don't have the time).

Licenser commented 12 years ago

Thanks mate, that are great improvements, I'll see if I can work with them to make my service more stable :) and no worries I know the time problem, sadly I don't think I am yet up to helping out here :(

uwiger / gproc

distributed nodes that don't see each other #19