smilli / py-corenlp

Python wrapper for Stanford CoreNLP

Connection refused after large number of queries #26

Open ghost opened 6 years ago

ghost commented 6 years ago

I mistakenly posted this issue here first, as I thought it was a problem with the CoreNLP server. Is there a chance that the problem comes from pycorenlp instead, and that it is similar to an issue that stanza had?

My problem is that after sending a large number of requests, I get the following error:

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/util/connection.py", line 83, in create_connection
    raise err
  File "/usr/local/lib/python3.5/dist-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 99] Cannot assign requested address

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 934, in _send_output
    self.send(msg)
  File "/usr/lib/python3.5/http/client.py", line 877, in send
    self.connect()
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connection.py", line 166, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fc56c06fe10>: Failed to establish a new connection: [Errno 99] Cannot assign requested address

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.5/dist-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=9000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc56c06fe10>: Failed to establish a new connection: [Errno 99] Cannot assign requested address',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pycorenlp/corenlp.py", line 19, in annotate
    requests.get(self.server_url)
  File "/usr/local/lib/python3.5/dist-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 502, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 612, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 504, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=9000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc56c06fe10>: Failed to establish a new connection: [Errno 99] Cannot assign requested address',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "<ipython-input-41-25498d9e314a>", line 33, in _nlp_annotate
    return [json.loads(self.nlp.annotate(document, properties=self.properties)) for document in documents_part]
  File "<ipython-input-41-25498d9e314a>", line 33, in <listcomp>
    return [json.loads(self.nlp.annotate(document, properties=self.properties)) for document in documents_part]
  File "/usr/local/lib/python3.5/dist-packages/pycorenlp/corenlp.py", line 21, in annotate
    raise Exception('Check whether you have started the CoreNLP server e.g.\n'
Exception: Check whether you have started the CoreNLP server e.g.
$ cd stanford-corenlp-full-2015-12-09/ 
$ java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
"""

The above exception was the direct cause of the following exception:

Exception                                 Traceback (most recent call last)
<ipython-input-42-b824e1fa3bb8> in <module>()
----> 1 CoreNLPSentenceSplitter().set_url('http://localhost:9000').fit_transform(train.comment_text)

<ipython-input-41-25498d9e314a> in fit_transform(self, X, y)
     70 
     71     def fit_transform(self, X, y=None):
---> 72         return self.fit(X, y).transform(X)

<ipython-input-41-25498d9e314a> in transform(self, documents)
     52     def transform(self, documents):
     53 
---> 54         annoated_documents = self.nlp.transform(documents)
     55 
     56         docs = list()

<ipython-input-41-25498d9e314a> in transform(self, documents)
     24 
     25     def transform(self, documents):
---> 26         return parallelize(documents, self._nlp_annotate, nb_partitions=8, nb_cores=8)
     27         #return self._nlp_annotate(documents)
     28 

<ipython-input-41-25498d9e314a> in parallelize(splittable, func, nb_partitions, nb_cores)
      2     df_split = np.array_split(splittable, nb_partitions)
      3     pool = Pool(nb_cores)
----> 4     df = np.concatenate(pool.map(func, df_split))
      5     pool.close()
      6     pool.join()

/usr/lib/python3.5/multiprocessing/pool.py in map(self, func, iterable, chunksize)
    258         in a list that is returned.
    259         '''
--> 260         return self._map_async(func, iterable, mapstar, chunksize).get()
    261 
    262     def starmap(self, func, iterable, chunksize=None):

/usr/lib/python3.5/multiprocessing/pool.py in get(self, timeout)
    606             return self._value
    607         else:
--> 608             raise self._value
    609 
    610     def _set(self, i, obj):

Exception: Check whether you have started the CoreNLP server e.g.
$ cd stanford-corenlp-full-2015-12-09/ 
$ java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer

ghost commented 6 years ago

I think this happens because the connection that is opened to check whether the Stanford CoreNLP server is running is never closed:

Maybe add headers={'Connection': 'close'} to the requests.get call at https://github.com/smilli/py-corenlp/blob/master/pycorenlp/corenlp.py#L19

# Checks that the Stanford CoreNLP server is started.
try:
    requests.get(self.server_url)
except requests.exceptions.ConnectionError:
    raise Exception('Check whether you have started the CoreNLP server e.g.\n'
    '$ cd stanford-corenlp-full-2015-12-09/ \n'
    '$ java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer')

Or, even better, remove that check entirely; it's not really necessary anyway.
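
For reference, a minimal sketch of the first suggestion could look like the following. The function name `corenlp_server_is_up` and the `timeout` value are illustrative assumptions, not part of pycorenlp. `[Errno 99] Cannot assign requested address` on connect typically means the client has run out of free local ports, so anything that reduces the number of short-lived connections (or drops the extra GET altogether) may help:

```python
import requests


def corenlp_server_is_up(server_url="http://localhost:9000", timeout=5):
    """Return True if the server answers a GET, False if it is unreachable.

    Hypothetical helper for illustration; pycorenlp itself raises a
    generic Exception inside annotate() instead.
    """
    try:
        # "Connection: close" asks both sides to drop the socket after
        # this single response instead of keeping it alive.
        requests.get(
            server_url,
            headers={"Connection": "close"},
            timeout=timeout,
        )
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        return False
    return True
```

Calling such a check once at start-up, rather than before every `annotate` call, would also halve the number of requests sent per document.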

ndvbd commented 5 years ago

Solved the problem by using python requests session instead of .post directly
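
The exact change was not posted, so the following is only a sketch of what a session-based client might look like, not the code that was actually used. The class name `SessionCoreNLPClient`, its `timeout` default, and the decision to return the raw response text are assumptions made for illustration. The idea is that a single `requests.Session` keeps its connection open between calls, whereas `requests.get` and `requests.post` set up a fresh connection on every call:

```python
import json

import requests


class SessionCoreNLPClient:
    """Hypothetical client that reuses one HTTP session per instance."""

    def __init__(self, server_url="http://localhost:9000", timeout=60):
        self.server_url = server_url.rstrip("/")
        self.timeout = timeout
        # A Session keeps a pool of open connections, so repeated calls
        # can reuse a socket instead of opening a new one each time.
        self.session = requests.Session()

    def annotate(self, text, properties=None):
        # The CoreNLP server reads the text from the POST body and the
        # annotator settings from the "properties" query parameter.
        response = self.session.post(
            self.server_url,
            params={"properties": json.dumps(properties or {})},
            data=text.encode("utf-8"),
            timeout=self.timeout,
        )
        response.raise_for_status()
        return response.text

    def close(self):
        # Release the pooled connections when the client is done.
        self.session.close()
```

With a multiprocessing pool like the one in the traceback above, it is probably safest to create one such client inside each worker process rather than sharing a single session across processes.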

huihanlhh commented 3 years ago

> Solved the problem by using python requests session instead of .post directly

Hi, could you please point out which lines you modified and how? I am currently experiencing the same issue. Thank you!

ndvbd commented 3 years ago

@huihanlhh It was a long time ago, and I can't find the code right now.