teamonhand / robotframework-seleniumlibrary

Automatically exported from code.google.com/p/robotframework-seleniumlibrary
Apache License 2.0
0 stars 0 forks source link

Selenium Python API leaks sockets on Jython because connections to SeleniumServer are not closed explicitly #79

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I'm not sure if this is an issue of the selenium
library or selenium itself. Could even be Jython.

We have a robot test suite which has grown a lot. somewhere in the middle
of the run I get a javs.net.SocketException because no connections are
available any more. I used Process Explorer to trace the problem and I
found that the Jython process opens thousands of connections to port 4444
(Selenium Server). Many of them are in CLOSE_WAIT state, maybe 50 are
ESTABLISHED. Our suite is set up to open the the Browser only once in the
overall suite setup, and it will close it in the  suite teardown.

So why is the robot creating so many connections to port 4444?

We are using the latest release of robot framework (2.1.2), robot selenium
library (2.2.2), jython (2.5.1)

(this was posted previously in
http://code.google.com/p/robotframework/issues/detail?id=462 )

Original issue reported on code.google.com by kai.hack...@gmail.com on 27 Jan 2010 at 12:58

GoogleCodeExporter commented 8 years ago
Here's a screenshot of how it starts to look after just 10 test cases. The left
window shows the Jython process running Jybot, the right one is the 
Selenium-Server.
From my point of view the connections to the selenium server are not reused nor
properly closed for garbage collection.

Original comment by kai.hack...@gmail.com on 1 Feb 2010 at 3:18

Attachments:

GoogleCodeExporter commented 8 years ago
Have you tested running same tests with pybot?

Original comment by pekka.klarck on 1 Feb 2010 at 8:01

GoogleCodeExporter commented 8 years ago
That's technically difficult, as we wrote java extension libraries for some 
steps in
the middle. We rely on jybot.

Original comment by kai.hack...@gmail.com on 1 Feb 2010 at 11:19

GoogleCodeExporter commented 8 years ago
Could you run at least some of the tests, or parts of them, with pybot? 
Unfortunately
we don't have much time to investigate this at the moment, so the more 
information
you could provide the better. 

Original comment by pekka.klarck on 1 Feb 2010 at 6:45

GoogleCodeExporter commented 8 years ago
I did a test run on pybot now (limited as I had to remove some keywords), and it
looks ok there, so the problem is somewhere between jybot and selenium server.

Original comment by kai.hack...@gmail.com on 2 Feb 2010 at 12:59

GoogleCodeExporter commented 8 years ago
I could fix that issue for jybot by modifying the file *selenium.py*. in 
*do_command*
there was  a new HTTP connection opened for each Selenium command. I made it to 
a
class member instead and it would only created conditionally on the first 
Selenium
server call.

before:

        conn = httplib.HTTPConnection(self.host, self.port)
        body = u'cmd=' + urllib.quote_plus(unicode(verb).encode('utf-8'))
        for i in range(len(args)):
            body += '&' + unicode(i+1) + '=' +
urllib.quote_plus(unicode(args[i]).encode('utf-8'))
        if (None != self.sessionId):
            body += "&sessionId=" + unicode(self.sessionId)
        headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
        conn.request("POST", "/selenium-server/driver/", body, headers)

        response = conn.getresponse()

after:

        if(None == self.conn):
          self.conn = httplib.HTTPConnection(self.host, self.port)
        body = u'cmd=' + urllib.quote_plus(unicode(verb).encode('utf-8'))
        for i in range(len(args)):
            body += '&' + unicode(i+1) + '=' +
urllib.quote_plus(unicode(args[i]).encode('utf-8'))
        if (None != self.sessionId):
            body += "&sessionId=" + unicode(self.sessionId)
        headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
        self.conn.request("POST", "/selenium-server/driver/", body, headers)

        response = self.conn.getresponse()

and in the __init__ method I added:

        self.conn = None

Note: there is still no code that would close the connection, but as the 
library gets
usually loaded only once between start and termination this would not mean a 
leak any
more.

As I don't have source code access maybe someone else patches it for the public.

Original comment by kai.hack...@gmail.com on 2 Feb 2010 at 2:08

GoogleCodeExporter commented 8 years ago
My colleague mentioned a patch file would be more helpful. Here it is.

Original comment by kai.hack...@gmail.com on 2 Feb 2010 at 3:21

Attachments:

GoogleCodeExporter commented 8 years ago
Awesome that you were able to find the root cause and even patch it! Notice that
selenium.py is the Selenium Python API provided by the Selenium project. Since 
we
have a version of it bundled to the SeleniumLibrary we can easily include fixes 
to it
into our releases, but we should also send the patch upstream. Do you have time 
and
energy for that?

Janne, could you also check is our version of the selenium.py the latest one?

Original comment by pekka.klarck on 2 Feb 2010 at 9:05

GoogleCodeExporter commented 8 years ago
I would do so but I'm still testing my patched version. I found one case where 
my
patch renders the SeleniumLibrary useless when the http request failed for any 
weird
reason. A bad response state renders the connection useless. So I need to catch 
that
exception and set the connection to null. I will send a second version of that 
patch
soon. After that, if you provide me a link to the right place I can forward that
patch to the Selenium project as well. 
After all I am a Java developer and my work needs to be reviewed carefully.

Cheers,
Kai

Original comment by kai.hack...@gmail.com on 2 Feb 2010 at 9:54

GoogleCodeExporter commented 8 years ago
Selenium issue tracker seems to be located at http://jira.openqa.org/browse/SEL

Original comment by pekka.klarck on 2 Feb 2010 at 10:53

GoogleCodeExporter commented 8 years ago
patch that catches http exception and closes/disposes the connection then

Original comment by kai.hack...@gmail.com on 3 Feb 2010 at 2:24

Attachments:

GoogleCodeExporter commented 8 years ago
Can you try whether just closing the connection in the end of do_command would 
be
enough. This would simplify the patch considerably. Applying the current patch
directly requires changing also the library to get Start Selenium Server working
properly, since we don't currently handle HttpException. 

Original comment by janne.t....@gmail.com on 3 Feb 2010 at 9:33

GoogleCodeExporter commented 8 years ago
I've tested that before, and that would fix it, too. Anyway, the thrown 
exception is
just exposed here, it would have been thrown from the httplib under different
conditions, too. 

Original comment by kai.hack...@gmail.com on 3 Feb 2010 at 10:08

GoogleCodeExporter commented 8 years ago
The underlying problem here is most likely caused by `do_command` relying on 
Python
garbage collector collecting `conn` as soon as `do_command` is exited. I assume 
that
Python socket's are implemented so that they are closed automatically when they 
are
collected. Because Python uses reference counting garbage collector, it 
immediately
detects when `conn` is not used anymore and thus the code works even though the
connection is never explicitly closed. 

The problem appears on Jython because there garbage is collected by the JVM 
which
uses different algorithm than reference counting. That obviously leaves 
connections
open, and when enough of them are opened they run out.

The easiest and safest solution is just adding `conn.close()` to `do_command`.
Reusing the socket is at least theoretically a better (=faster) solution because
there's no need to open sockets all the time. That is, however, also more 
risky. As
Janne wrote, `Start Selenium Server` stopped working correctly with this patch. 
It
currently only catches `socket.error` and after the patch was applied there was 
also
`HttpException`. We aren't entirely sure why that happens, which makes 
accepting the
patch in this form a bit risky. We definitely don't want this to break existing 
tests. 

(Notice also that to expose the original error, you should use just `raise` and 
not
`raise Exception, self.conn`, especially when you've just executed `self.conn = 
None`.)

Original comment by pekka.klarck on 3 Feb 2010 at 12:05

GoogleCodeExporter commented 8 years ago
What exactly goes wrong with 'Start Selenium Server'? In our test suite it 
behaves
well with the patch. (Or maybe this is again a Python/Jython issue?) 

Of course you are right with the raise, that was silly from me :D I had 
difficulties
to test failings. and I'm not absolutely firm with the syntax yet (I have about 
4h
experience with python yet)

Original comment by kai.hack...@gmail.com on 3 Feb 2010 at 8:42

GoogleCodeExporter commented 8 years ago
Ooops, the problem isn't in start_selenium_server but in open_browser and, more
precisely, in _connect_to_selenium_server. There's a loop that tries to connect 
to
the server for 10 seconds to handle the situation that the server is slow to 
start.
The actual connection is done inside `except socker.error` and with the patch 
the
executed code also raised HttpException.

We noticed the problem when running the acceptance tests of the SeleniumLibrary 
and
cannot comment on how often that happens in "real life". For example in your 
case the
server might just start so fast that you don't see the problem at all. Anyway, 
the
change feels a big risky without further investigation. 

Do test run faster when you run them with the version where sockets are reused? 
If
there's no considerable difference, I'd be very happy with simply adding
`conn.close()`. That kind of a change ought to be easy to push upstream to the
Selenium project also.

Btw, if you are interested in learning more Python I highly recommend Dive Into
Python book which is freely available at http://diveintopython.org. It's 
targeted for
people with previous programming experience -- the first example isn't even 
close to
`print "Hello, world!"`. There are some other good references at
http://code.google.com/p/robotframework/wiki/PythonTutorial

Original comment by pekka.klarck on 3 Feb 2010 at 10:53

GoogleCodeExporter commented 8 years ago
Thanks for the book hint, I will for sure dive into it :D
From my subjective point the selenium communication is indeed a bit faster. Not 
much.
I can live with a local conn object and local closing it.

So I'll send a new patch in for this for review.

Original comment by kai.hack...@gmail.com on 4 Feb 2010 at 1:31

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks :) I was just at the same point in the code and wondered if a 
conn.close()
would be enough. Thanks for providing the fix. Can we have an official new
SeleniumLibrary? IMHO it's worth it for this fix alone, as any serious usage is
impossible without it.

Original comment by Andreas.EbbertKarroum@gmail.com on 4 Feb 2010 at 6:35

GoogleCodeExporter commented 8 years ago
SeleniumLibrary 2.2.3 will be released next week and will include the modified 
patch to 
the remote control (selenium.py).

Original comment by janne.t....@gmail.com on 4 Feb 2010 at 6:46

GoogleCodeExporter commented 8 years ago
Kai, would you be interested in submitting a bug report about this into Selenium
tracker too? Feel free to reference this issue.

Original comment by pekka.klarck on 4 Feb 2010 at 9:31

GoogleCodeExporter commented 8 years ago
Already done, http://jira.openqa.org/browse/SEL-733 

Original comment by kai.hack...@gmail.com on 4 Feb 2010 at 10:22

GoogleCodeExporter commented 8 years ago

Original comment by janne.t....@gmail.com on 8 Feb 2010 at 5:11

GoogleCodeExporter commented 8 years ago

Original comment by janne.t....@gmail.com on 8 Feb 2010 at 5:12

GoogleCodeExporter commented 8 years ago
The patch, slightly modified, is now applied in the trunk.

I also made a crude test (by grepping the selenium.py), which should catch if we
update the remote control without applying the patch.

Thanks Kai for your work on resolving the issue.

Original comment by janne.t....@gmail.com on 8 Feb 2010 at 12:42

GoogleCodeExporter commented 8 years ago
r295 contains the patched selenium.py (And due to my misuse of svn, some other
changes too)

Original comment by janne.t....@gmail.com on 9 Feb 2010 at 5:54

GoogleCodeExporter commented 8 years ago
FYI, I created an issue for the selenium library itself:
http://jira.openqa.org/browse/SRC-817

Original comment by Andreas.EbbertKarroum@gmail.com on 10 Feb 2010 at 8:40

GoogleCodeExporter commented 8 years ago
The underlying bug was already reported as 
http://jira.openqa.org/browse/SEL-733 
 but http://jira.openqa.org/browse/SRC-817 is somewhat more describing and also
contains fix. Could someone add a note that these issues are duplicates to the
Selenium tracker?

Original comment by pekka.klarck on 10 Feb 2010 at 1:45

GoogleCodeExporter commented 8 years ago
This leak has now been fixed in the official selenium.py version as discussed 
here: 
http://groups.google.com/group/robotframework-devel/browse_thread/thread/91df60b
5c26b1189

I submitted issue 131 about taking the latest official version into use.

Original comment by pekka.klarck on 5 Aug 2010 at 10:16