robotframework / OldSeleniumLibrary

Deprecated Selenium library for Robot Framework
Apache License 2.0
13 stars 3 forks source link

Selenium Python API leaks sockets on Jython because connections to SeleniumServer are not closed explicitly #79

Closed spooning closed 9 years ago

spooning commented 9 years ago

Originally submitted to Google Code by kai.hackemesser on 26 Jan 2010

I'm not sure if this is an issue of the selenium library or selenium itself. Could even be Jython.

We have a robot test suite which has grown a lot. somewhere in the middle of the run I get a javs.net.SocketException because no connections are available any more. I used Process Explorer to trace the problem and I found that the Jython process opens thousands of connections to port 4444 (Selenium Server). Many of them are in CLOSE_WAIT state, maybe 50 are ESTABLISHED. Our suite is set up to open the the Browser only once in the overall suite setup, and it will close it in the suite teardown.

So why is the robot creating so many connections to port 4444?

We are using the latest release of robot framework (2.1.2), robot selenium library (2.2.2), jython (2.5.1)

(this was posted previously in https://code.google.com/p/robotframework/issues/detail?id=462 )

spooning commented 9 years ago

Originally submitted to Google Code by kai.hackemesser on 31 Jan 2010

Here's a screenshot of how it starts to look after just 10 test cases. The left window shows the Jython process running Jybot, the right one is the Selenium-Server. From my point of view the connections to the selenium server are not reused nor properly closed for garbage collection.

spooning commented 9 years ago

Originally submitted to Google Code by @pekkaklarck on 1 Feb 2010

Have you tested running same tests with pybot?

spooning commented 9 years ago

Originally submitted to Google Code by kai.hackemesser on 1 Feb 2010

That's technically difficult, as we wrote java extension libraries for some steps in the middle. We rely on jybot.

spooning commented 9 years ago

Originally submitted to Google Code by @pekkaklarck on 1 Feb 2010

Could you run at least some of the tests, or parts of them, with pybot? Unfortunately we don't have much time to investigate this at the moment, so the more information you could provide the better.

spooning commented 9 years ago

Originally submitted to Google Code by kai.hackemesser on 1 Feb 2010

I did a test run on pybot now (limited as I had to remove some keywords), and it looks ok there, so the problem is somewhere between jybot and selenium server.

spooning commented 9 years ago

Originally submitted to Google Code by kai.hackemesser on 1 Feb 2010

I could fix that issue for jybot by modifying the file selenium.py. in _docommand there was a new HTTP connection opened for each Selenium command. I made it to a class member instead and it would only created conditionally on the first Selenium server call.

before:

    conn = httplib.HTTPConnection(self.host, self.port)
    body = u'cmd=' + urllib.quote_plus(unicode(verb).encode('utf-8'))
    for i in range(len(args)):
        body += '&' + unicode(i+1) + '=' +

urllib.quote_plus(unicode(args[i]).encode('utf-8')) if (None != self.sessionId): body += "&sessionId=" + unicode(self.sessionId) headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"} conn.request("POST", "/selenium-server/driver/", body, headers)

    response = conn.getresponse()

after:

    if(None == self.conn):
      self.conn = httplib.HTTPConnection(self.host, self.port)
    body = u'cmd=' + urllib.quote_plus(unicode(verb).encode('utf-8'))
    for i in range(len(args)):
        body += '&' + unicode(i+1) + '=' +

urllib.quote_plus(unicode(args[i]).encode('utf-8')) if (None != self.sessionId): body += "&sessionId=" + unicode(self.sessionId) headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"} self.conn.request("POST", "/selenium-server/driver/", body, headers)

    response = self.conn.getresponse()

and in the init method I added:

    self.conn = None

Note: there is still no code that would close the connection, but as the library gets usually loaded only once between start and termination this would not mean a leak any more.

As I don't have source code access maybe someone else patches it for the public.

spooning commented 9 years ago

Originally submitted to Google Code by kai.hackemesser on 1 Feb 2010

My colleague mentioned a patch file would be more helpful. Here it is.

spooning commented 9 years ago

Originally submitted to Google Code by @pekkaklarck on 2 Feb 2010

Awesome that you were able to find the root cause and even patch it! Notice that selenium.py is the Selenium Python API provided by the Selenium project. Since we have a version of it bundled to the SeleniumLibrary we can easily include fixes to it into our releases, but we should also send the patch upstream. Do you have time and energy for that?

Janne, could you also check is our version of the selenium.py the latest one?

spooning commented 9 years ago

Originally submitted to Google Code by kai.hackemesser on 2 Feb 2010

I would do so but I'm still testing my patched version. I found one case where my patch renders the SeleniumLibrary useless when the http request failed for any weird reason. A bad response state renders the connection useless. So I need to catch that exception and set the connection to null. I will send a second version of that patch soon. After that, if you provide me a link to the right place I can forward that patch to the Selenium project as well. After all I am a Java developer and my work needs to be reviewed carefully.

Cheers, Kai

spooning commented 9 years ago

Originally submitted to Google Code by @pekkaklarck on 2 Feb 2010

Selenium issue tracker seems to be located at http://jira.openqa.org/browse/SEL

spooning commented 9 years ago

Originally submitted to Google Code by kai.hackemesser on 2 Feb 2010

patch that catches http exception and closes/disposes the connection then

spooning commented 9 years ago

Originally submitted to Google Code by @yanne on 3 Feb 2010

Can you try whether just closing the connection in the end of do_command would be enough. This would simplify the patch considerably. Applying the current patch directly requires changing also the library to get Start Selenium Server working properly, since we don't currently handle HttpException.

spooning commented 9 years ago

Originally submitted to Google Code by kai.hackemesser on 3 Feb 2010

I've tested that before, and that would fix it, too. Anyway, the thrown exception is just exposed here, it would have been thrown from the httplib under different conditions, too.

spooning commented 9 years ago

Originally submitted to Google Code by @pekkaklarck on 3 Feb 2010

The underlying problem here is most likely caused by do_command relying on Python garbage collector collecting conn as soon as do_command is exited. I assume that Python socket's are implemented so that they are closed automatically when they are collected. Because Python uses reference counting garbage collector, it immediately detects when conn is not used anymore and thus the code works even though the connection is never explicitly closed.

The problem appears on Jython because there garbage is collected by the JVM which uses different algorithm than reference counting. That obviously leaves connections open, and when enough of them are opened they run out.

The easiest and safest solution is just adding conn.close() to do_command. Reusing the socket is at least theoretically a better (=faster) solution because there's no need to open sockets all the time. That is, however, also more risky. As Janne wrote, Start Selenium Server stopped working correctly with this patch. It currently only catches socket.error and after the patch was applied there was also HttpException. We aren't entirely sure why that happens, which makes accepting the patch in this form a bit risky. We definitely don't want this to break existing tests.

(Notice also that to expose the original error, you should use just raise and not raise Exception, self.conn, especially when you've just executed self.conn = None.)

spooning commented 9 years ago

Originally submitted to Google Code by kai.hackemesser on 3 Feb 2010

What exactly goes wrong with 'Start Selenium Server'? In our test suite it behaves well with the patch. (Or maybe this is again a Python/Jython issue?)

Of course you are right with the raise, that was silly from me :D I had difficulties to test failings. and I'm not absolutely firm with the syntax yet (I have about 4h experience with python yet)

spooning commented 9 years ago

Originally submitted to Google Code by @pekkaklarck on 3 Feb 2010

Ooops, the problem isn't in start_selenium_server but in open_browser and, more precisely, in _connect_to_selenium_server. There's a loop that tries to connect to the server for 10 seconds to handle the situation that the server is slow to start. The actual connection is done inside except socker.error and with the patch the executed code also raised HttpException.

We noticed the problem when running the acceptance tests of the SeleniumLibrary and cannot comment on how often that happens in "real life". For example in your case the server might just start so fast that you don't see the problem at all. Anyway, the change feels a big risky without further investigation.

Do test run faster when you run them with the version where sockets are reused? If there's no considerable difference, I'd be very happy with simply adding conn.close(). That kind of a change ought to be easy to push upstream to the Selenium project also.

Btw, if you are interested in learning more Python I highly recommend Dive Into Python book which is freely available at http://diveintopython.org. It's targeted for people with previous programming experience -- the first example isn't even close to print "Hello, world!". There are some other good references at https://code.google.com/p/robotframework/wiki/PythonTutorial

spooning commented 9 years ago

Originally submitted to Google Code by kai.hackemesser on 3 Feb 2010

Thanks for the book hint, I will for sure dive into it :D From my subjective point the selenium communication is indeed a bit faster. Not much. I can live with a local conn object and local closing it.

So I'll send a new patch in for this for review.

spooning commented 9 years ago

Originally submitted to Google Code by Andreas.EbbertKarroum on 4 Feb 2010

Thanks :) I was just at the same point in the code and wondered if a conn.close() would be enough. Thanks for providing the fix. Can we have an official new SeleniumLibrary? IMHO it's worth it for this fix alone, as any serious usage is impossible without it.

spooning commented 9 years ago

Originally submitted to Google Code by @yanne on 4 Feb 2010

SeleniumLibrary 2.2.3 will be released next week and will include the modified patch to the remote control (selenium.py).

spooning commented 9 years ago

Originally submitted to Google Code by @pekkaklarck on 4 Feb 2010

Kai, would you be interested in submitting a bug report about this into Selenium tracker too? Feel free to reference this issue.

spooning commented 9 years ago

Originally submitted to Google Code by kai.hackemesser on 4 Feb 2010

Already done, http://jira.openqa.org/browse/SEL-733

spooning commented 9 years ago

Originally submitted to Google Code by @yanne on 8 Feb 2010

The patch, slightly modified, is now applied in the trunk.

I also made a crude test (by grepping the selenium.py), which should catch if we update the remote control without applying the patch.

Thanks Kai for your work on resolving the issue.

spooning commented 9 years ago

Originally submitted to Google Code by @yanne on 8 Feb 2010

r295 contains the patched selenium.py (And due to my misuse of svn, some other changes too)

spooning commented 9 years ago

Originally submitted to Google Code by Andreas.EbbertKarroum on 10 Feb 2010

FYI, I created an issue for the selenium library itself: http://jira.openqa.org/browse/SRC-817

spooning commented 9 years ago

Originally submitted to Google Code by @pekkaklarck on 10 Feb 2010

The underlying bug was already reported as http://jira.openqa.org/browse/SEL-733 but http://jira.openqa.org/browse/SRC-817 is somewhat more describing and also contains fix. Could someone add a note that these issues are duplicates to the Selenium tracker?

spooning commented 9 years ago

Originally submitted to Google Code by @pekkaklarck on 5 Aug 2010

This leak has now been fixed in the official selenium.py version as discussed here: http://groups.google.com/group/robotframework-devel/browse_thread/thread/91df60b5c26b1189

I submitted issue 131 about taking the latest official version into use.