robotframework / robotframework

Generic automation framework for acceptance testing and RPA
http://robotframework.org
Apache License 2.0

Remote: Support for timeouts to avoid hanging connections #1799

Closed JuyoungAn closed 10 years ago

JuyoungAn commented 10 years ago

Version: RF 2.8.4 OS: All Windows

The Remote library does not detect a BSOD on the PC running the remote server. It keeps waiting on the connection and the test stays pending forever. This makes the remote connection fail, but not always. I wish the Remote library could detect the server's status.

pekkaklarck commented 10 years ago

It's true that Remote currently has no timeout. I briefly looked and it seems xmlrpclib.ServerProxy that Remote uses doesn't directly support giving a timeout, but configuring it via xmlrpclib.Transport is not too complicated: http://stackoverflow.com/questions/2425799/timeout-for-xmlrpclib-client-requests

The above resource explains how to add support for timeouts in Python 2.7 and also 2.6. We should also test it with Python 2.5 (as long as we support it) as well as with Jython and IronPython.
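A minimal sketch of the transport-based approach described above, written against Python 3's `xmlrpc.client` (the successor of the `xmlrpclib` module named in this comment). The class name `TimeoutTransport` and the placeholder address are illustrative assumptions, not part of Robot Framework, and the Python 2.6/2.7 variants in the linked answer differ in detail.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """Hypothetical transport that applies a socket timeout to its connection."""

    def __init__(self, timeout=None):
        super().__init__()
        self.timeout = timeout  # seconds; None leaves the connection's default untouched

    def make_connection(self, host):
        conn = super().make_connection(host)
        if self.timeout is not None:
            # HTTPConnection reads this attribute when it opens its socket.
            conn.timeout = self.timeout
        return conn


# Creating the proxy does not open a connection; the address is a placeholder.
proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8270",
                                  transport=TimeoutTransport(timeout=5))
```

With `timeout=None` the transport behaves like the stock one, which matches the idea of keeping existing behavior unless a timeout is requested.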

Automatically adding a timeout to Remote would be a backwards-incompatible change and cannot be done. We could, however, add a new timeout=None argument to Remote that can be used to configure it.

I don't have time to look at this more closely in the foreseeable future. If someone wants to see this implemented, providing a pull request or sponsoring the development somehow are good options.

pekkaklarck commented 10 years ago

I noticed we already have a test for Remote handling a lost connection. It passes without problems and I couldn't get it to hang even when I tried terminating the server in different ways.

To be able to do anything about this issue, we need a simple example we can use to reproduce the problem. An example involving a BSOD doesn't sound nice and could be hard to create, so it would be great if you could create an example that terminates your remote server some other way.

If you cannot create a simple example, we would at least need to know more about your setup to be able to even guess why communication hangs in your case. We would need to know what remote server you use, whether you use Python or Jython, which versions, etc.

JuyoungAn commented 10 years ago

Hi pekkaklarck, I'll upload the reproduction steps and the remote server version soon.

JuyoungAn commented 10 years ago

[Version]
RF 2.8.5
robotremoteserver.py 1.0.1
Python 2.7.6(x86)

[Repro]
* Settings *
Library    Remote    http://xxx.xxx.xxx.xxx:8270

* Test Cases *
test1
    ${a}    ${b}=    Remote.Run And Return Rc And Output    C:\NotMyFault\x86\NotMyfault.exe /crash

You can download the NotMyFault tool, which triggers a BSOD, here: http://blogs.technet.com/b/markrussinovich/archive/2011/01/11/3379158.aspx

pekkaklarck commented 10 years ago

Thanks for the example! NotMyFault looks like a very useful utility. =)

I don't have time to test this right now, but I'm targeting this for RF 2.8.6. If we can reproduce the problem, we can then test whether setting a timeout fixes it.

pekkaklarck commented 10 years ago

I was able to reproduce the hang. The reason for it is that when the other end of the connection dies, the connection goes to TIME_WAIT state and it takes several minutes (depending on the OS default values and possible configuration) for it to actually time out.

I also tested the solution of giving a custom timeout using the approach explained on StackOverflow, and it worked as promised. A very nice side effect is that when the timeout is used, the Remote connection will time out faster if the remote server isn't available at all. This behavior also makes it easier to create a test for the timeout functionality; I don't want our acceptance tests to require BSOD:ing Windows machines.
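The effect can be sketched without crashing a machine. In the example below, a local socket that accepts connections but never replies stands in for the dead server; this is a substitute scenario, not the test that was actually written. It assumes Python 3's `xmlrpc.client`; `TimeoutTransport` and `call_silent_server` are hypothetical names, and `get_keyword_names` is used only as a representative remote call.

```python
import socket
import time
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """Hypothetical transport that sets a socket timeout on its connection."""

    def __init__(self, timeout):
        super().__init__()
        self.timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        conn.timeout = self.timeout
        return conn


def call_silent_server(timeout):
    """Call a server that never answers; return seconds waited before the timeout."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)               # connections are queued but never served
    port = listener.getsockname()[1]
    proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d" % port,
                                      transport=TimeoutTransport(timeout))
    start = time.monotonic()
    try:
        proxy.get_keyword_names()
    except socket.timeout:
        # The read gave up after roughly `timeout` seconds instead of hanging.
        return time.monotonic() - start
    finally:
        listener.close()
    raise AssertionError("the call unexpectedly returned")


elapsed = call_silent_server(0.5)
```

Without the timeout, the same call would block until the operating system gives up on the connection, which is the hang reported in this issue.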

I will implement this so that Remote gets a new timeout argument that can be used to set a custom timeout. I'll leave the default value as None (i.e. no custom timeout) to avoid backwards-incompatible changes.

JuyoungAn commented 10 years ago

A timeout will be very helpful. Thanks, pekkaklarck.