kermitt2 / grobid_client_python

Python client for GROBID Web services
Apache License 2.0

104: Connection reset by peer #62

Closed manuelrech closed 11 months ago

manuelrech commented 1 year ago

I am trying to process 4000 PDFs with a batch size of 1000, which is the default. Server and client are both on EC2, and I launch the code with:

nohup grobid_client --input /home/ec2-user/grobid_parser/4000papers --output /home/ec2-user/grobid_parser/4000papers_out processFulltextDocument > logfile4000.txt

After a while I get exception 104, connection reset by peer.

cat logfile4000.txt
GROBID server is up and running
Processing of /home/ec2-user/grobid_parser/4000papers/0074_0.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/1001966.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/004.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/001119351.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/016.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/00102486-1.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/00101809-1.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/000Jin_et_al-2016-Advanced_Materials.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/1007156.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (15).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (14).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/0005D.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (13).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (12).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (11).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (10).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/011104_1.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/00dcf54f785c581bc55bf218e5255b866978.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/1007233.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (6).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (8).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1016_j.biortech.2018.04.053.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (7).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10100897.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007_s10098-013-0608-4.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (5).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (4).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (16).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1021acssuschemeng.8b01730.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007s13726-020-00793-w.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (3).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1016_j.progpolymsci.2013.05.010.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.11648.j.ajmsp.20160103.11.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007s10311-020-00989-9.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (2).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (1).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (17).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (9).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10100895.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/124.pdf.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/12.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/17.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/191.Amaresh Chakrabarti.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/2010_Book_GreenMetathesisChemistry (1).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/2010_Book_GreenMetathesisChemistry.pdf failed with error 408 , None
Traceback (most recent call last):
  File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 802, in urlopen
    **response_kw,
  File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connection.py", line 454, in getresponse
    httplib_response = super().getresponse()
  File "/opt/conda/envs/grobid/lib/python3.7/http/client.py", line 1373, in getresponse
    response.begin()
  File "/opt/conda/envs/grobid/lib/python3.7/http/client.py", line 319, in begin
    version, status, reason = self._read_status()
  File "/opt/conda/envs/grobid/lib/python3.7/http/client.py", line 288, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/.local/lib/python3.7/site-packages/requests/adapters.py", line 497, in send
    chunked=chunked,
  File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 845, in urlopen
    method, url, error=new_e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/util/retry.py", line 470, in increment
    raise reraise(type(error), error, _stacktrace)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/util/util.py", line 38, in reraise
    raise value.with_traceback(tb)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 802, in urlopen
    **response_kw,
  File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connection.py", line 454, in getresponse
    httplib_response = super().getresponse()
  File "/opt/conda/envs/grobid/lib/python3.7/http/client.py", line 1373, in getresponse
    response.begin()
  File "/opt/conda/envs/grobid/lib/python3.7/http/client.py", line 319, in begin
    version, status, reason = self._read_status()
  File "/opt/conda/envs/grobid/lib/python3.7/http/client.py", line 288, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/grobid/bin/grobid_client", line 33, in <module>
    sys.exit(load_entry_point('grobid-client-python==0.0.5', 'console_scripts', 'grobid_client')())
  File "/opt/conda/envs/grobid/lib/python3.7/site-packages/grobid_client_python-0.0.5-py3.7.egg/grobid_client/grobid_client.py", line 482, in main
  File "/opt/conda/envs/grobid/lib/python3.7/site-packages/grobid_client_python-0.0.5-py3.7.egg/grobid_client/grobid_client.py", line 139, in process
  File "/opt/conda/envs/grobid/lib/python3.7/site-packages/grobid_client_python-0.0.5-py3.7.egg/grobid_client/grobid_client.py", line 212, in process_batch
  File "/opt/conda/envs/grobid/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/opt/conda/envs/grobid/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/opt/conda/envs/grobid/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/envs/grobid/lib/python3.7/site-packages/grobid_client_python-0.0.5-py3.7.egg/grobid_client/grobid_client.py", line 279, in process_pdf
  File "/opt/conda/envs/grobid/lib/python3.7/site-packages/grobid_client_python-0.0.5-py3.7.egg/grobid_client/client.py", line 186, in post
  File "/opt/conda/envs/grobid/lib/python3.7/site-packages/grobid_client_python-0.0.5-py3.7.egg/grobid_client/client.py", line 128, in call_api
  File "/home/ec2-user/.local/lib/python3.7/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Does anybody have any ideas?
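To see which files failed and with which status code, a client log like the one above can be scanned with a short script. This is only a minimal sketch: it assumes the log line format shown in this thread (Processing of <path> failed with error <code> , None), and the name failed_files is made up for illustration; it is not part of grobid_client_python.

```python
import re
from collections import defaultdict

# Matches entries such as:
#   Processing of /path/to/file.pdf failed with error 408 , None
# The pattern is inferred from the log in this thread (an assumption);
# other client versions may print failures differently.
FAILURE_RE = re.compile(
    r"Processing of (?P<path>.+?) failed with error (?P<code>\d+)"
)


def failed_files(log_text):
    """Group failed PDF paths by the status code reported in the log."""
    failures = defaultdict(list)
    for match in FAILURE_RE.finditer(log_text):
        failures[int(match.group("code"))].append(match.group("path"))
    return dict(failures)
```

Calling failed_files(open("logfile4000.txt").read()) would return a dict mapping each status code to the list of affected paths, which could then be copied to a separate folder and retried at a lower concurrency.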

kermitt2 commented 1 year ago

Hi @manuelrech !

Can you send a single curl request to your EC2 GROBID server? Does it also give the 408 error (timeout)?

curl -v --form input=@./thefile.pdf localhost:8070/api/processFulltextDocument

(replacing localhost with the address of your GROBID server)

Did you set up your server with a GROBID Docker image? If you are using the full image with deep learning models and no GPU, there may not be enough RAM.

In general, you could also first reduce the concurrency to 1 in the client and see whether that gets rid of the 408 timeout.

manuelrech commented 1 year ago

With 1 file it works fine, and also with 159!

My request is:

nohup grobid_client --input /home/ec2-user/grobid_parser/4000papers --output /home/ec2-user/grobid_parser/4000papers_out processFulltextDocument > logfile4000.txt

Building the Docker container with the --gpus all option and setting 'concurrency': 1 in the config file did not solve the issue; I now get again:

requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

I was also monitoring the memory usage, and I see: [image]

Is it normal that so much CPU is used (sometimes it even goes to 350%)?

avani17101 commented 11 months ago

I am facing the same error. @manuelrech, were you able to solve it? @kermitt2, could you help?

manuelrech commented 11 months ago

I switched to a machine with more CPU and made these modifications to the configuration file of the Docker container:

The config file on the client side was instead this:

{
    "grobid_server": "http://localhost:8070/",
    "batch_size": 200,
    "sleep_time": 5,
    "timeout": 60,
    "coordinates": [
        "persName",
        "figure",
        "ref",
        "biblStruct",
        "formula",
        "s",
        "note"
    ]
}
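A client config like the one above can also be generated from a script, which makes it easier to try several settings. The sketch below only reuses the keys and values quoted in this comment; the helper name write_client_config and the output path are hypothetical, and the defaults are not meant as recommended values.

```python
import json
from pathlib import Path

# Coordinates list copied from the config quoted in this thread.
DEFAULT_COORDINATES = [
    "persName", "figure", "ref", "biblStruct", "formula", "s", "note",
]


def write_client_config(path, grobid_server="http://localhost:8070/",
                        batch_size=200, sleep_time=5, timeout=60,
                        coordinates=None):
    """Write a client config JSON file (hypothetical helper) and return it as a dict."""
    config = {
        "grobid_server": grobid_server,
        "batch_size": batch_size,
        "sleep_time": sleep_time,
        "timeout": timeout,
        "coordinates": list(coordinates or DEFAULT_COORDINATES),
    }
    Path(path).write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config
```

For example, write_client_config("config.json", batch_size=100) writes a variant with a smaller batch size. How the client picks up the file depends on the client version, so check grobid_client --help before relying on a particular option.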

kermitt2 commented 11 months ago

Hello !

Just as complementary information, in particular for @avani17101: the problem is on the server side and comes from using a concurrency that is too high for the server's capacity. The server gets overloaded, runs out of memory, and disconnects from the client. The concurrency can be lowered with the --n option, for example:

grobid_client --input /home/ec2-user/grobid_parser/4000papers --output /home/ec2-user/grobid_parser/4000papers_out  --n 4 processFulltextDocument 

For example, I routinely run Grobid on a server with 8 threads/CPU and 32GB RAM, with a concurrency of 8, some DL models, and no GPU. This gives around 4 PDFs processed per second. The goal is to use as much CPU as possible, so 800% usage is normal in this case.
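As a rough back-of-the-envelope check, a throughput figure like the one above can be turned into an expected run time for a batch. The function below is purely illustrative arithmetic with a made-up name; the rate of about 4 PDFs per second is the figure quoted for that particular server and will differ on other hardware and with other models.

```python
def estimate_runtime_seconds(n_pdfs, pdfs_per_second):
    """Estimate the processing time of a batch at a steady throughput."""
    if pdfs_per_second <= 0:
        raise ValueError("pdfs_per_second must be positive")
    return n_pdfs / pdfs_per_second


# 4000 PDFs at the quoted rate of roughly 4 PDFs per second
seconds = estimate_runtime_seconds(4000, 4.0)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} min")
```

If a run of this size takes far longer than such an estimate, or produces many 408 errors, that suggests the server is saturated and the concurrency should be lowered.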