Use of socket.setdefaulttimeout in import-level code

arisudesu commented 7 years ago

It seems to me that changing the global timeout is not a good idea. This can affect other code that uses networking. My question is, is it possible to replace the use of socket.setdefaulttimeout with the use of timeout param on urlopen? The parameter exists since python 2.6 / 3.0, and the JPype requires 2.6 anyway, so it is possible to use this parameter.

tuxdna commented 7 years ago

The timeout was added for urllib2

https://github.com/misja/python-boilerpipe/commit/85cbfbcf6366299da88035a3ee13d82749f9a827

There are two imports Request and urlopen which are used from that module. So, I think it should be alright to make this change.

tuxdna commented 7 years ago

Something like this should work

diff --git a/src/boilerpipe/extract/__init__.py b/src/boilerpipe/extract/__init__.py
index 2225b58..c8cfd54 100644
--- a/src/boilerpipe/extract/__init__.py
+++ b/src/boilerpipe/extract/__init__.py
@@ -7,13 +7,13 @@ import socket
 import chardet
 import threading

-socket.setdefaulttimeout(15)
 lock = threading.Lock()

 InputSource        = jpype.JClass('org.xml.sax.InputSource')
 StringReader       = jpype.JClass('java.io.StringReader')
 HTMLHighlighter    = jpype.JClass('de.l3s.boilerpipe.sax.HTMLHighlighter')
 BoilerpipeSAXInput = jpype.JClass('de.l3s.boilerpipe.sax.BoilerpipeSAXInput')
+DEFAULT_TIMEOUT = 15

 class Extractor(object):
     """
@@ -33,10 +33,10 @@ class Extractor(object):
     data      = None
     headers   = {'User-Agent': 'Mozilla/5.0'}

-    def __init__(self, extractor='DefaultExtractor', **kwargs):
+    def __init__(self, extractor='DefaultExtractor', timeout=DEFAULT_TIMEOUT, **kwargs):
         if 'url' in kwargs:
             request     = Request(kwargs['url'], headers=self.headers)
-            connection  = urlopen(request)
+            connection  = urlopen(request, timeout=timeout)
             self.data   = connection.read()
             encoding    = connection.headers['content-type'].lower().split('charset=')[-1]
             if encoding.lower() == 'text/html':

tuxdna commented 7 years ago

@arisudesu Please feel free to create a PR, and assign this issue to yourself.

arisudesu commented 7 years ago

That's what I was thinking about, exactly. Made the PR: #53

tuxdna commented 7 years ago

Closed via: https://github.com/misja/python-boilerpipe/pull/53

misja / python-boilerpipe

Use of socket.setdefaulttimeout in import-level code #52