lab-cosmo / i-pi-dev_archive

Development version of i-PI
21 stars 12 forks source link

Fix occasional hang in socket interface #98

Closed OndrejMarsalek closed 8 years ago

OndrejMarsalek commented 8 years ago

I have marked the select that can hang. We need a suitable timeout, though.

ceriottm commented 8 years ago

I tried to add a timeout and incidentally to remove some of the blank except clauses.

See if this really fixes the random hangs problem, and whether it creates new problems.

OndrejMarsalek commented 8 years ago

I was testing this in production for a while now with I think 10 s for the timeout and it seemed to work fine, but it is not clear to me if there is way to determine the timeout somehow robustly at runtime or whether we just guess a hopefully safe fixed number.

ceriottm commented 8 years ago

If I understand, this is more a problem that happens on the i-PI side - the server-side socket getting somewhat stuck. I think that this is unrelated to say NW outages or the like. Perhaps one could test that by running on a LAN and disconnecting and reconnecting the network cable, just to be sure we cover all scenarios. In the documentation of the socket module they suggest one minute timeout, as this hanging is a really rare event. Michele

On 14 December 2015 at 10:10, Ondrej Marsalek notifications@github.com wrote:

I was testing this in production for a while now with I think 10 s for the timeout and it seemed to work fine, but it is not clear to me if there is way to determine the timeout somehow robustly at runtime or whether we just guess a hopefully safe fixed number.

— Reply to this email directly or view it on GitHub https://github.com/ceriottm/i-pi-dev/pull/98#issuecomment-164384238.

OndrejMarsalek commented 8 years ago

I had no issues with a 10 s timeout, but considering that should just protect from a total block when clients disconnect (i.e. not very often), I am sure 60 s is OK. Shall we do that and merge, then?

ceriottm commented 8 years ago

Yes, please go ahead. This can probably be merged to master, and then master be merged into clean-up. M

On 9 January 2016 at 02:31, Ondrej Marsalek notifications@github.com wrote:

I had no issues with a 10 s timeout, but considering that should just protect from a total block when clients disconnect (i.e. not very often), I am sure 60 s is OK. Shall we do that and merge, then?

— Reply to this email directly or view it on GitHub https://github.com/ceriottm/i-pi-dev/pull/98#issuecomment-170176318.

OndrejMarsalek commented 8 years ago

Yeah, merging to clean-up will not be a problem now, as the changes will always be smaller. I will merge this soon.