openlvc / portico

Portico is an open source, cross-platform, fully supported HLA RTI implementation. Designed with modularity and flexibility in mind, Portico is a production-grade RTI for the Simulation and Training Community, so come say hi!
http://www.porticoproject.org

Configure Timeout #313

Open lzqlzqwhr opened 3 years ago

lzqlzqwhr commented 3 years ago

Hello. If we run two federates using Portico and kill one of them, in my testing the other one notices the death of the first after around 10 seconds. Is there a way for the killed federate to rejoin the federation? Or is it possible to configure a longer timeout?

timpokorny commented 3 years ago

Hi @lzqlzqwhr

This is the automatic failure detection kicking in. The RTI in the current versions is fully decentralized, so in many circumstances it needs to detect disconnections automatically in order to advance the simulation (usually when a federate crashes and therefore can't resign gracefully).
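The idea behind this kind of failure detection can be illustrated with a heartbeat-and-timeout scheme: each member periodically announces that it is alive, and a peer that has not been heard from for longer than the timeout is suspected dead. The sketch below is a toy model of that idea with a simulated clock, assuming a 10-second timeout to mirror the behaviour reported in the question. It is not Portico's code, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy heartbeat-based failure detector (hypothetical; not Portico's or JGroups' API).
// Time is passed in explicitly (milliseconds) so the behaviour is deterministic.
public class HeartbeatDetector {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeard = new HashMap<>();

    public HeartbeatDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record a heartbeat from a federate at the given time.
    public void heartbeat(String federate, long nowMillis) {
        lastHeard.put(federate, nowMillis);
    }

    // Return every federate whose last heartbeat is older than the timeout.
    public Set<String> suspected(long nowMillis) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        HeartbeatDetector detector = new HeartbeatDetector(10_000); // assumed ~10 s timeout
        detector.heartbeat("fedA", 0);
        detector.heartbeat("fedB", 0);

        // fedA is killed right after t = 0; fedB keeps sending a heartbeat every second.
        long detectedAt = -1;
        for (long t = 1_000; t <= 20_000; t += 1_000) {
            detector.heartbeat("fedB", t);
            if (detector.suspected(t).contains("fedA")) {
                detectedAt = t;
                break;
            }
        }
        System.out.println("fedA suspected at t=" + detectedAt + " ms"); // fedA suspected at t=11000 ms
    }
}
```

Because the check is only as fine-grained as the heartbeat period, a dead member is noticed somewhat after the timeout itself expires, which is consistent with the "around 10 seconds" observed above.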

The first federate can rejoin the federation at any time. To the other participants it will look like a "new" federate, but it can still rejoin and continue to play; late-joining should work fine. Lengthening that timeout wouldn't solve the problem of a killed federate, because the revived version, even if it had the same name, wouldn't be recognized by Portico as the old federate. That is why it is effectively a new late joiner.

Does that make sense?

lzqlzqwhr commented 3 years ago

Thank you so much for your reply. We are actually trying to use CRIU to checkpoint and restore the federation, so that when a federate crashes we can restore it from its last checkpoint instead of having it join the federation again. In my tests, this works as long as the restore happens before the other federates notice the death of the failed one. So I guess we have to lengthen the timeout anyway. Can we change the timeout by editing this line? https://github.com/openlvc/portico/blob/master/codebase/resources/jars/portico.jar/etc/jgroups-udp.xml#L37
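The constraint described here can be stated as a simple inequality: a CRIU restore stays invisible to the other federates only if the kill-to-restore downtime, plus some allowance for jitter, fits inside the failure-detection timeout. A minimal sketch, assuming the downtime has been measured in milliseconds; the safety margin, the example figures, and all names are hypothetical and not part of Portico or CRIU.

```java
// Hypothetical helper: checks whether a checkpoint/restore cycle fits inside
// the failure-detection window, leaving a safety margin.
public class RestoreWindow {

    // True if the restored federate would resume before its peers suspect it.
    static boolean restoreFitsWindow(long downtimeMillis, long timeoutMillis, long marginMillis) {
        return downtimeMillis + marginMillis <= timeoutMillis;
    }

    // Smallest timeout that still covers the measured downtime plus the margin.
    static long minimumTimeout(long downtimeMillis, long marginMillis) {
        return downtimeMillis + marginMillis;
    }

    public static void main(String[] args) {
        long timeout = 10_000;  // roughly the detection time observed in this thread
        long margin = 2_000;    // assumed allowance for jitter and missed heartbeats
        long downtime = 15_000; // hypothetical measured kill-to-restore time

        System.out.println(restoreFitsWindow(downtime, timeout, margin)); // false
        System.out.println(minimumTimeout(downtime, margin));             // 17000
    }
}
```

The trade-off is that a longer timeout also delays detection of federates that have genuinely crashed, so the remaining federates may be held up for that much longer in the situations where the RTI relies on detection to advance the simulation.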