linwoodc3 / gdeltPyR

Python-based framework to retrieve Global Database of Events, Language, and Tone (GDELT) version 1.0 and version 2.0 data.
https://linwoodc3.github.io/gdeltPyR/
GNU General Public License v3.0
203 stars, 53 forks

BUG: Proxy issue when importing #51

Closed: f0lie closed this issue 6 years ago

f0lie commented 6 years ago

I get a proxy error when trying to import the module. This is problematic because, as far as I recall, you can't pass parameters to a module at import time. This seems to be the problematic bit:

~/gdelt/venv/lib/python3.7/site-packages/gdelt/base.py in <module>()
     80         '/utils/' \
     81         'schema_csvs/cameoCodes.json'
---> 82     codes = json.loads((requests.get(a).content.decode('utf-8')))
     83 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/gdelt/venv/lib/python3.7/site-packages/gdelt/base.py in <module>()
     74     codes = pd.read_json(os.path.join(BASE_DIR, 'data', 'cameoCodes.json'),
---> 75                          dtype=dict(cameoCode='str', GoldsteinScale=np.float64))
     76     codes.set_index('cameoCode', drop=False, inplace=True)

~/gdelt/venv/lib/python3.7/site-packages/pandas/io/json/json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
    421 
--> 422     result = json_reader.read()
    423     if should_close:

~/gdelt/venv/lib/python3.7/site-packages/pandas/io/json/json.py in read(self)
    528         else:
--> 529             obj = self._get_object_parser(self.data)
    530         self.close()

~/gdelt/venv/lib/python3.7/site-packages/pandas/io/json/json.py in _get_object_parser(self, json)
    545         if typ == 'frame':
--> 546             obj = FrameParser(json, **kwargs).parse()
    547 

~/gdelt/venv/lib/python3.7/site-packages/pandas/io/json/json.py in parse(self)
    637         else:
--> 638             self._parse_no_numpy()
    639 

~/gdelt/venv/lib/python3.7/site-packages/pandas/io/json/json.py in _parse_no_numpy(self)
    852             self.obj = DataFrame(
--> 853                 loads(json, precise_float=self.precise_float), dtype=None)
    854         elif orient == "split":

ValueError: Expected object or value

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
~/gdelt/venv/lib/python3.7/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    593             if is_new_proxy_conn:
--> 594                 self._prepare_proxy(conn)
    595 

~/gdelt/venv/lib/python3.7/site-packages/urllib3/connectionpool.py in _prepare_proxy(self, conn)
    814 
--> 815         conn.connect()
    816 

~/gdelt/venv/lib/python3.7/site-packages/urllib3/connection.py in connect(self)
    323             # self._tunnel_host below.
--> 324             self._tunnel()
    325             # Mark this connection as not reusable

/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py in _tunnel(self)
    910             raise OSError("Tunnel connection failed: %d %s" % (code,
--> 911                                                                message.strip()))
    912         while True:

OSError: Tunnel connection failed: 407 AuthorizedOnly

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
~/gdelt/venv/lib/python3.7/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    444                     retries=self.max_retries,
--> 445                     timeout=timeout
    446                 )

~/gdelt/venv/lib/python3.7/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    637             retries = retries.increment(method, url, error=e, _pool=self,
--> 638                                         _stacktrace=sys.exc_info()[2])
    639             retries.sleep()

~/gdelt/venv/lib/python3.7/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    397         if new_retry.is_exhausted():
--> 398             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    399 

MaxRetryError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /linwoodc3/gdeltPyR/master/utils/schema_csvs/cameoCodes.json (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 407 AuthorizedOnly')))

During handling of the above exception, another exception occurred:

ProxyError                                Traceback (most recent call last)
<ipython-input-1-b6a720b4b38d> in <module>()
----> 1 import gdelt

~/gdelt/venv/lib/python3.7/site-packages/gdelt/__init__.py in <module>()
      4 from __future__ import absolute_import
      5 
----> 6 from gdelt.base import gdelt
      7 
      8 __name__ = 'gdelt'

~/gdelt/venv/lib/python3.7/site-packages/gdelt/base.py in <module>()
     80         '/utils/' \
     81         'schema_csvs/cameoCodes.json'
---> 82     codes = json.loads((requests.get(a).content.decode('utf-8')))
     83 
     84 ##############################

~/gdelt/venv/lib/python3.7/site-packages/requests/api.py in get(url, params, **kwargs)
     70 
     71     kwargs.setdefault('allow_redirects', True)
---> 72     return request('get', url, params=params, **kwargs)
     73 
     74 

~/gdelt/venv/lib/python3.7/site-packages/requests/api.py in request(method, url, **kwargs)
     56     # cases, and look like a memory leak in others.
     57     with sessions.Session() as session:
---> 58         return session.request(method=method, url=url, **kwargs)
     59 
     60 

~/gdelt/venv/lib/python3.7/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    510         }
    511         send_kwargs.update(settings)
--> 512         resp = self.send(prep, **send_kwargs)
    513 
    514         return resp

~/gdelt/venv/lib/python3.7/site-packages/requests/sessions.py in send(self, request, **kwargs)
    620 
    621         # Send the request
--> 622         r = adapter.send(request, **kwargs)
    623 
    624         # Total elapsed time of the request (approximately)

~/gdelt/venv/lib/python3.7/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    505 
    506             if isinstance(e.reason, _ProxyError):
--> 507                 raise ProxyError(e, request=request)
    508 
    509             if isinstance(e.reason, _SSLError):

ProxyError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /linwoodc3/gdeltPyR/master/utils/schema_csvs/cameoCodes.json (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 407 AuthorizedOnly')))

f0lie commented 6 years ago

I managed to avoid the exceptions by adding proxies to the requests.get() calls in parallel.py and base.py.
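
A minimal sketch of that kind of workaround, not the actual patch. `build_proxies` and `fetch_cameo_codes` are hypothetical helpers, the proxy host and credentials are placeholders, and the URL is the one shown in the traceback above. `proxies` and `timeout` are standard keyword arguments of `requests.get`.

```python
from urllib.parse import quote

# URL taken from the traceback above.
CAMEO_URL = (
    "https://raw.githubusercontent.com/linwoodc3/gdeltPyR/master"
    "/utils/schema_csvs/cameoCodes.json"
)


def build_proxies(host, port, user=None, password=None):
    """Build a proxies mapping in the format requests expects.

    Hypothetical helper: credentials are percent-encoded so that
    characters such as '@' or ':' do not break the proxy URL.
    """
    credentials = ""
    if user is not None:
        credentials = quote(user, safe="")
        if password is not None:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    proxy_url = f"http://{credentials}{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


def fetch_cameo_codes(proxies=None, timeout=30):
    """Download the CAMEO code table, optionally through a proxy."""
    # Imported here so that building the mapping does not require requests.
    import requests

    response = requests.get(CAMEO_URL, proxies=proxies, timeout=timeout)
    response.raise_for_status()
    return response.json()
```

Usage would be along the lines of `fetch_cameo_codes(proxies=build_proxies("proxy.example.com", 8080, "user", "password"))`, with the real proxy details substituted.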

linwoodc3 commented 6 years ago

@adiep2501, I've never experienced this. However, I understand that some people need a proxy, whether to browse anonymously or for some other reason. As you figured out, requests lets you set the proxy (http://docs.python-requests.org/en/master/user/advanced/#proxies).

Based on your error (HTTP 407, https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/407), there is a proxy server between your machine and the internet, right?

You labeled this as a BUG, so were you considering a pull request? Proxies are necessary when someone wants to browse anonymously or for other reasons, such as controlling access to the outside internet or scanning outgoing traffic (https://www.quora.com/Why-would-you-need-a-proxy-server). Were you thinking of a pull request that adds an optional proxy parameter?

I see how it can be done, but didn't you make a fork? I'm just curious whether you were planning to open a PR.
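
For illustration, an optional proxy parameter could look roughly like the sketch below. `CameoCodes` is a hypothetical class, not part of gdeltPyR. It assumes the code table is read from a bundled JSON file first, as the first frame of the traceback suggests, and only goes to the network (through the caller's proxies, if any) when that file is missing. Deferring the download to `load()` also means nothing is fetched at import time.

```python
import json
from pathlib import Path

# URL taken from the traceback above.
CAMEO_URL = (
    "https://raw.githubusercontent.com/linwoodc3/gdeltPyR/master"
    "/utils/schema_csvs/cameoCodes.json"
)


class CameoCodes:
    """Hypothetical lazy loader with an optional ``proxies`` parameter."""

    def __init__(self, local_path=None, proxies=None, timeout=30):
        self.local_path = Path(local_path) if local_path else None
        self.proxies = proxies  # same mapping format as requests.get(proxies=...)
        self.timeout = timeout
        self._codes = None  # cache, filled on the first load() call

    def load(self):
        """Return the code table, reading the local copy when it exists."""
        if self._codes is None:
            if self.local_path is not None and self.local_path.is_file():
                text = self.local_path.read_text(encoding="utf-8")
                self._codes = json.loads(text)
            else:
                self._codes = self._download()
        return self._codes

    def _download(self):
        # Imported lazily so the local-file path works without requests.
        import requests

        response = requests.get(
            CAMEO_URL, proxies=self.proxies, timeout=self.timeout
        )
        response.raise_for_status()
        return response.json()
```

With this shape, a user behind a proxy would pass `proxies={...}` when constructing the object, and everyone else would leave it at `None`.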

f0lie commented 6 years ago

I was trying to use it from a corporate environment, which blocks connections that don't go through the network proxy.
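
Because the download runs at import time, one option that needs no change to the package is to set the proxy through environment variables before importing it. By default requests reads `HTTP_PROXY` / `HTTPS_PROXY` (in either case), and credentials embedded in the proxy URL are used for proxy authentication. A sketch, with a placeholder proxy URL and a hypothetical `set_proxy_env` helper:

```python
import os
import urllib.request


def set_proxy_env(proxy_url):
    """Export the proxy for this process in both lower and upper case.

    Hypothetical helper; both spellings are set because a lowercase
    variable, when present, takes priority over the uppercase one.
    """
    for name in ("http_proxy", "https_proxy"):
        os.environ[name] = proxy_url
        os.environ[name.upper()] = proxy_url


# Placeholder URL: substitute the real proxy host, port, and credentials.
set_proxy_env("http://user:password@proxy.example.com:8080")

# The standard library reads the same variables, so this is a quick
# way to check what will be picked up.
print(urllib.request.getproxies())

# import gdelt  # assumption: its import-time requests.get call would now use the proxy
```

The same variables can be exported in the shell instead, which avoids putting credentials in source code.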
