sonyxperiadev / gerrit-events

MIT License

Using the default value of timeout=0 when creating SSH connections might lead to threads stuck forever #97

Open romanek-adam opened 4 years ago

romanek-adam commented 4 years ago

Hi, this is a follow-up to my investigation of an incident that hit my team and me yesterday after an outage of our Gerrit instance.

I described the whole story in Jenkins' bug tracker as it was Jenkins' Gerrit Trigger plugin which was affected by the issue. See JENKINS-33959 and my comment specifically.

I believe the root cause of the above issue comes from this library.

Long story short: when the timeout parameter is not specified, the library uses the default value of 0 when creating SSH connections, which effectively means "no timeout". This blocked the calling thread, potentially forever (it's still blocked 1 day after we noticed it).
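To illustrate the general effect outside of this library, here is a minimal sketch using plain `java.net` sockets rather than the library's actual SSH code. The class name `ReadTimeoutDemo` and the helper `readTimesOut` are made up for this example. A local server accepts the connection but never sends anything, which roughly mimics a remote end that has gone silent. With a non-zero read timeout the blocked read gives up; with 0 it would wait indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of gerrit-events.
public class ReadTimeoutDemo {

    /**
     * Connects to a local server that never sends data and tries to read one byte.
     * Returns true if the read gave up with a SocketTimeoutException.
     * Do not call this with 0: a read timeout of 0 means "wait forever",
     * so the call would never return.
     */
    static boolean readTimesOut(int soTimeoutMillis) throws IOException {
        try (ServerSocket silentServer =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                     new Socket(silentServer.getInetAddress(), silentServer.getLocalPort());
             Socket accepted = silentServer.accept()) {
            // Limit how long a blocking read may wait for data.
            client.setSoTimeout(soTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // the server never writes, so this blocks
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the read was abandoned after the timeout
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

The same reasoning should apply to any blocking connect or read on the SSH connection: without an upper bound, a peer that stops responding can hold the calling thread indefinitely.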

As I mentioned in the Jenkins ticket:

As a general thought, setting timeouts to 0 is almost always bad practice, unless you really want to wait forever. In practical applications it's usually better to let the operation time out and simply retry.
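The "time out and retry" idea could look roughly like the sketch below. It is only an illustration: `RetryOnTimeout` and `callWithRetries` are hypothetical names, and the operation passed in stands for whatever connection attempt the caller makes with a non-zero timeout. Nothing here is taken from the library or from the Gerrit Trigger plugin.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of gerrit-events.
public class RetryOnTimeout {

    /**
     * Runs the operation up to maxAttempts times, pausing between attempts.
     * Only IOException (which includes SocketTimeoutException) triggers a retry;
     * any other exception is propagated immediately.
     */
    static <T> T callWithRetries(Callable<T> operation, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (IOException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // brief pause before the next attempt
                }
            }
        }
        throw lastFailure; // every attempt failed
    }
}
```

The number of attempts is bounded on purpose, so a prolonged outage ends in a visible error instead of a thread that is stuck or retrying forever.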

ModeSevenIndustrialSolutions commented 1 year ago

Okay, I'm a bit surprised to see no additional comments on this report. We may be seeing problems with queues in our production environment due to this!

rsandell commented 1 year ago

Yes, the default could be changed for a quick fix. But the trigger should probably be updated with a configurable connection timeout.