Open jeff1evesque opened 6 years ago
Before we send our data payload to our endpoint, we need to restructure the data into an acceptable form. Each dataset instance will need to be converted into a structure similar to the following:
{
"properties": {
"session_name": "sample_svm_title",
"collection": "svm-2",
"dataset_type": "file_upload",
"session_type": "data_new",
"model_type": "svm",
"stream": "True"
},
"dataset": [{
"dependent-variable": "dep-variable-1",
"independent-variables": [{
"indep-variable-1": 23.45,
"indep-variable-2": 98.01,
"indep-variable-4": 325,
"indep-variable-5": 54.64,
"indep-variable-6": 0.002,
"indep-variable-7": 23,
"indep-variable-3": 0.432
}]
},
{
"dependent-variable": "dep-variable-4",
"independent-variables": [{
"indep-variable-1": 22.1,
"indep-variable-2": 95.96,
"indep-variable-4": 342,
"indep-variable-5": 66.67,
"indep-variable-6": 0.001,
"indep-variable-7": 32,
"indep-variable-3": 0.743
},
{
"indep-variable-1": 20.71,
"indep-variable-2": 99.33,
"indep-variable-4": 342,
"indep-variable-5": 75.67,
"indep-variable-6": 0.001,
"indep-variable-7": 30,
"indep-variable-3": 0.648
}
]
},
{
"dependent-variable": "dep-variable-5",
"independent-variables": [{
"indep-variable-1": 23.27,
"indep-variable-2": 95.03,
"indep-variable-4": 295,
"indep-variable-5": 55.83,
"indep-variable-6": 0.001,
"indep-variable-7": 27,
"indep-variable-3": 0.488
},
{
"indep-variable-1": 23.27,
"indep-variable-2": 95.03,
"indep-variable-4": 295,
"indep-variable-5": 55.83,
"indep-variable-6": 0.001,
"indep-variable-7": 27,
"indep-variable-3": 0.488
},
{
"indep-variable-1": 19.99,
"indep-variable-2": 97.78,
"indep-variable-4": 303,
"indep-variable-5": 58.88,
"indep-variable-6": 0.001,
"indep-variable-7": 29,
"indep-variable-3": 0.638
}
]
},
{
"dependent-variable": "dep-variable-3",
"independent-variables": [{
"indep-variable-1": 22.67,
"indep-variable-2": 101.21,
"indep-variable-4": 427,
"indep-variable-5": 75.45,
"indep-variable-6": 0.002,
"indep-variable-7": 26,
"indep-variable-3": 0.832
}]
}
]
}
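The restructuring above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the function name build_payload and the (label, features) input shape are assumptions, and only the output keys are taken from the sample payload.

```python
import json
from collections import OrderedDict


def build_payload(rows, properties):
    """Group (dependent_variable, features) pairs into the payload layout.

    rows: iterable of (label, features_dict) pairs (hypothetical input shape).
    properties: dict copied into the payload's "properties" key.
    """
    grouped = OrderedDict()
    for label, features in rows:
        # observations that share a label are collected into one list
        grouped.setdefault(label, []).append(dict(features))

    return {
        'properties': dict(properties),
        'dataset': [
            {'dependent-variable': label, 'independent-variables': observations}
            for label, observations in grouped.items()
        ],
    }


rows = [
    ('dep-variable-1', {'indep-variable-1': 23.45, 'indep-variable-2': 98.01}),
    ('dep-variable-4', {'indep-variable-1': 22.1, 'indep-variable-2': 95.96}),
    ('dep-variable-4', {'indep-variable-1': 20.71, 'indep-variable-2': 99.33}),
]
properties = {
    'session_name': 'sample_svm_title',
    'collection': 'svm-2',
    'dataset_type': 'file_upload',
    'session_type': 'data_new',
    'model_type': 'svm',
    'stream': 'True',
}

payload = build_payload(rows, properties)
print(json.dumps(payload, indent=2))
```

Grouping by the dependent variable keeps one entry per label, with every matching observation appended to its independent-variables list, which mirrors the sample where dep-variable-4 carries two observations.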
Our current run.py execution yields the following traceback:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 67, in create_connection
for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
File "/usr/lib/python3.5/socket.py", line 732, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
body=body, headers=headers)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 787, in _validate_conn
conn.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 217, in connect
conn = self._new_conn()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 146, in _new_conn
self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f454a51b438>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 273, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='True', port=8585): Max retries exceeded with url: /login (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f454a51b438>: Failed to establish a new connection: [Errno -2] Name or service not known',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 61, in <module>
run(*argv[1:])
File "run.py", line 57, in run
port=port
File "/home/ubuntu/ist-652/utility/wikipedia_scraper.py", line 78, in wikipedia_scraper
data={'user[login]': username, 'user[password]': password}
File "/usr/lib/python3/dist-packages/requests/api.py", line 107, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/lib/python3/dist-packages/requests/api.py", line 53, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 437, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='True', port=8585): Max retries exceeded with url: /login (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f454a51b438>: Failed to establish a new connection: [Errno -2] Name or service not known',))
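The final error reports host='True', so the string 'True' reached the endpoint slot instead of a host name. Since run.py is not shown here, the cause is an assumption: the positional arguments forwarded by run(*argv[1:]) may be misaligned, with a boolean-like value such as the stream flag landing where the endpoint is expected. One way to rule this out is to parse named arguments. The flag names below are hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse named arguments so values cannot shift into the wrong slot."""
    parser = argparse.ArgumentParser(description='scrape and forward data')
    parser.add_argument('--endpoint', required=True, help='host name or IP')
    parser.add_argument('--port', type=int, default=8585)
    parser.add_argument('--username')
    parser.add_argument('--password')
    # a flag, so no literal 'True' string is ever passed on the command line
    parser.add_argument('--stream', action='store_true')
    return parser.parse_args(argv)


# argument order no longer matters
args = parse_args(['--stream', '--endpoint', '11.11.11.11', '--port', '8585'])
print(args.endpoint, args.port, args.stream)
```

In a script, parse_args(sys.argv[1:]) would take the place of forwarding the raw argument list.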
We need to test the following snippet:
# get access token
login = requests.post(
    'https://{}:{}/login'.format(endpoint, port),
    headers={'Content-Type': 'application/json'},
    data={'user[login]': username, 'user[password]': password}
)
# Response.json is a method, so it must be called before indexing
token = login.json()['access_token']
print('token: {}'.format(token))
If we cannot get the above working in a reasonable amount of time, we can temporarily omit it, since the current endpoint allows anonymous requests. However, anonymous access may not remain available long term. Additionally, we need to determine whether incoming port rules need to be adjusted, for both the http and https protocols.
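To check whether port rules are the obstacle, a plain TCP connection test can separate network-level blocking from application or TLS errors. The following is a minimal sketch. The helper name port_is_open is an assumption, and the demonstration runs against a throwaway local listener so that it works without access to the real endpoint.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name resolution failures
        return False


# Demonstrate against a temporary local listener; in practice, call
# port_is_open with the endpoint address and port from the client machine.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))  # port 0 lets the OS pick a free port
server.listen(1)
local_port = server.getsockname()[1]

open_result = port_is_open('127.0.0.1', local_port)
server.close()
print(open_result)
```

If the check fails for the endpoint's port while the service is known to be running, firewall or security group rules are a likely suspect. If it succeeds, the problem lies higher up, for example in the TLS handshake.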
When running the following test.py script:
import requests

username = 'jeff1evesque'
password = 'xxxxxxxxxx'
endpoint = '11.11.11.11'
port = 8585

login = requests.post(
    'https://{}:{}/login'.format(endpoint, port),
    headers={'Content-Type': 'application/json'},
    data={'user[login]': username, 'user[password]': password}
)
# Response.json is a method, so it must be called before indexing
token = login.json()['access_token']
print('token: {}'.format(token))
We receive the following traceback:
root@ubuntu-xenial:/vagrant# python3 test.py
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
body=body, headers=headers)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 787, in _validate_conn
conn.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 252, in connect
ssl_version=resolved_ssl_version)
File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 305, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/lib/python3.5/ssl.py", line 377, in wrap_socket
_context=self)
File "/usr/lib/python3.5/ssl.py", line 752, in __init__
self.do_handshake()
File "/usr/lib/python3.5/ssl.py", line 988, in do_handshake
self._sslobj.do_handshake()
File "/usr/lib/python3.5/ssl.py", line 633, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 589, in urlopen
raise SSLError(e)
requests.packages.urllib3.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 11, in <module>
data={'user[login]': username, 'user[password]': password}
File "/usr/lib/python3/dist-packages/requests/api.py", line 107, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/lib/python3/dist-packages/requests/api.py", line 53, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 447, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)
One probable problem is that the corresponding machine-learning application implements a self-signed certificate. Additionally, all http requests are redirected to https. Several possible solutions exist, if implementing the api is still desired:
- use the verify='/path/to/public_key.pem' syntax with the corresponding requests.post
This is a two-fold problem:
- requests.post must ignore validation
Therefore, we'll temporarily merge the changes in this issue, and return to it if time permits. In the meantime, we'll develop an else condition, which will generate a csv file containing the article name and the predicted article category.
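The verify option mentioned above could be wired in roughly as follows. This is a sketch under assumptions: the helper names tls_kwargs and get_token are hypothetical, and the explicit Content-Type header is omitted on the assumption that the login route expects the form-encoded body that data= produces. The URL shape and the access_token key come from the earlier snippet.

```python
def tls_kwargs(ca_bundle=None, insecure=False):
    """Build the TLS-related keyword arguments for requests.post."""
    if insecure:
        # disables certificate validation entirely; only for short-lived testing
        return {'verify': False}
    if ca_bundle:
        # explicitly trust the server's self-signed certificate
        return {'verify': ca_bundle}
    # default behaviour: validate against the trusted certificate authorities
    return {'verify': True}


def get_token(endpoint, port, username, password, **tls):
    """Log in and return the access token from the JSON response."""
    # imported here so tls_kwargs can be exercised without requests installed
    import requests

    response = requests.post(
        'https://{}:{}/login'.format(endpoint, port),
        data={'user[login]': username, 'user[password]': password},
        **tls
    )
    response.raise_for_status()
    return response.json()['access_token']


print(tls_kwargs(ca_bundle='/path/to/public_key.pem'))
```

A call might then look like get_token('11.11.11.11', 8585, username, password, **tls_kwargs(ca_bundle='/path/to/public_key.pem')). Note that even with a trusted certificate file, verification can still fail if the certificate does not list the address used in the URL.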
We need to pipe our scraped wikipedia + twitter data to our machine-learning application.