manuelrech closed this issue 11 months ago
Hi @manuelrech !
Can you send a single curl request to your EC2 Grobid server and check whether it also gives the 408 error (timeout):
curl -v --form input=@./thefile.pdf localhost:8070/api/processFulltextDocument
(replacing the localhost by the address of your grobid server)
Did you set up your server with a Grobid Docker image? If you are using the full image with deep learning models without a GPU, there may not be enough RAM.
In general, you could also first reduce the concurrency to 1 in the client and see whether that gets rid of the 408 timeout.
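The single-request check suggested above can also be scripted. This is only a minimal sketch using the Python standard library: the endpoint and the `input` form field come from the curl command, while the helper names `build_multipart` and `process_one`, the default server address and the 120-second timeout are assumptions made for illustration.

```python
import os
import urllib.error
import urllib.request
import uuid


def build_multipart(field, filename, payload):
    """Encode one file as a multipart/form-data body, like `curl --form`."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def process_one(pdf_path, server="http://localhost:8070", timeout=120):
    """POST a single PDF and return (status_code, response_text)."""
    with open(pdf_path, "rb") as fh:
        body, content_type = build_multipart(
            "input", os.path.basename(pdf_path), fh.read()
        )
    request = urllib.request.Request(
        server.rstrip("/") + "/api/processFulltextDocument",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Error statuses such as 408 are raised as HTTPError; report them instead.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling `process_one("./thefile.pdf", server="http://<your-server>:8070")` returns the status code, so a 408 on a single file would point at the server rather than at client concurrency.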
With 1 file it works fine, and also with 159!
my request is nohup grobid_client --input /home/ec2-user/grobid_parser/4000papers --output /home/ec2-user/grobid_parser/4000papers_out processFulltextDocument > logfile4000.txt
I did not solve the issue by building Docker with the --gpus all option and setting 'concurrency': 1 in the config file, as now I again get
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
I was also monitoring the memory usage and I see
Is it normal that so much CPU is used (sometimes it even goes to 350%)?
I face the same error, @manuelrech could you solve it? @kermitt2 could you help?
I switched to a machine with more CPU and made these modifications to the Docker configuration file:
sudo nohup docker run --restart always -p 8070:8070 -v /home/dev/grobid.yaml:/opt/grobid/grobid-home/config/grobid.yaml:ro lfoppiano/grobid:0.7.3 &
Just change your paths.
The config file on the client side was instead this:
{
    "grobid_server": "http://localhost:8070/",
    "batch_size": 200,
    "sleep_time": 5,
    "timeout": 60,
    "coordinates": [ "persName", "figure", "ref", "biblStruct", "formula", "s", "note" ]
}
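For anyone who prefers to generate that client config from a script, here is a small sketch. The keys and values are the ones quoted above; the helper name `write_config` and the output path are made up for illustration, and nothing here is claimed to be how the client itself manages its config.

```python
import json

# Values copied from the client-side config quoted above.
CLIENT_CONFIG = {
    "grobid_server": "http://localhost:8070/",
    "batch_size": 200,   # smaller than the 1000 used in the original report
    "sleep_time": 5,
    "timeout": 60,
    "coordinates": [
        "persName", "figure", "ref", "biblStruct", "formula", "s", "note",
    ],
}


def write_config(path, config=CLIENT_CONFIG):
    """Write the client configuration as pretty-printed JSON (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=4)
        fh.write("\n")
```

Calling `write_config("config.json")` produces a file equivalent to the one shown; adjust `grobid_server` to your own server address.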
Hello !
Just as complementary information, in particular for @avani17101: the problem is on the server side and comes from using a concurrency that is too high for the server's capacity. The server is overloaded, runs out of memory and disconnects from the client.
Adapt the concurrency to the actual number of threads/CPUs available on your server (e.g. with 4 single-threaded CPUs, use a concurrency of 4).
To keep it simple, set the concurrency on the client side: the parameter is --n, for example:
grobid_client --input /home/ec2-user/grobid_parser/4000papers --output /home/ec2-user/grobid_parser/4000papers_out --n 4 processFulltextDocument
Use enough RAM on your server. For low concurrency 16GB is okay, but if you want to process with a concurrency of 8, 32GB is safer.
If Deep Learning models are used without a GPU, then some more RAM needs to be available.
For example, I routinely run Grobid on a server with 8 threads/CPU and 32GB RAM, with a concurrency of 8, some DL models and no GPU. This gives around 4 PDFs processed per second. The goal is to use as much CPU as possible, so 800% usage is normal in this case.
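The advice above (match --n to the number of threads/CPUs on the server) can be sketched as a small command builder. The flags --input, --output and --n are the ones used in this thread; the function name `build_command` is hypothetical, and falling back to `os.cpu_count()` only makes sense under the assumption that the script runs on the same machine as the Grobid server. Otherwise pass the server's CPU count explicitly.

```python
import os
import shlex


def build_command(input_dir, output_dir, concurrency=None,
                  service="processFulltextDocument"):
    """Build a grobid_client call whose --n matches the available CPUs."""
    if concurrency is None:
        # Assumption: this script runs on the Grobid server itself.
        concurrency = os.cpu_count() or 1
    return [
        "grobid_client",
        "--input", input_dir,
        "--output", output_dir,
        "--n", str(concurrency),
        service,
    ]


def as_shell(command):
    """Render the argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(part) for part in command)
```

For a server with 4 CPUs, `as_shell(build_command("4000papers", "4000papers_out", concurrency=4))` gives a line equivalent to the --n 4 example above, and the list form can be passed directly to `subprocess.run`.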
I am trying to process 4000 PDFs with a batch size of 1000, which was the default. Server and client are both on EC2 and I launch the code with
nohup grobid_client --input /home/ec2-user/grobid_parser/4000papers --output /home/ec2-user/grobid_parser/4000papers_out processFulltextDocument > logfile4000.txt
After a while I get exception 104, connection reset by peer.
cat logfile4000.txt
GROBID server is up and running
Processing of /home/ec2-user/grobid_parser/4000papers/0074_0.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/1001966.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/004.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/001119351.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/016.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/00102486-1.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/00101809-1.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/000Jin_et_al-2016-Advanced_Materials.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/1007156.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (15).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (14).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/0005D.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (13).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (12).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (11).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (10).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/011104_1.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/00dcf54f785c581bc55bf218e5255b866978.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/1007233.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (6).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (8).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1016_j.biortech.2018.04.053.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (7).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10100897.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007_s10098-013-0608-4.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (5).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (4).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (16).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1021acssuschemeng.8b01730.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007s13726-020-00793-w.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (3).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1016_j.progpolymsci.2013.05.010.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.11648.j.ajmsp.20160103.11.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007s10311-020-00989-9.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (2).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (1).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (17).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10.1007978-981-15-1251-3 (9).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/10100895.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/124.pdf.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/12.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/17.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/191.Amaresh Chakrabarti.pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/2010_Book_GreenMetathesisChemistry (1).pdf failed with error 408 , None
Processing of /home/ec2-user/grobid_parser/4000papers/2010_Book_GreenMetathesisChemistry.pdf failed with error 408 , None
Traceback (most recent call last):
File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 802, in urlopen
**response_kw,
File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 536, in _make_request
response = conn.getresponse()
File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connection.py", line 454, in getresponse
httplib_response = super().getresponse()
File "/opt/conda/envs/grobid/lib/python3.7/http/client.py", line 1373, in getresponse
response.begin()
File "/opt/conda/envs/grobid/lib/python3.7/http/client.py", line 319, in begin
version, status, reason = self._read_status()
File "/opt/conda/envs/grobid/lib/python3.7/http/client.py", line 288, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ec2-user/.local/lib/python3.7/site-packages/requests/adapters.py", line 497, in send
chunked=chunked,
File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 845, in urlopen
method, url, error=new_e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/util/retry.py", line 470, in increment
raise reraise(type(error), error, _stacktrace)
File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/util/util.py", line 38, in reraise
raise value.with_traceback(tb)
File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 802, in urlopen
**response_kw,
File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 536, in _make_request
response = conn.getresponse()
File "/home/ec2-user/.local/lib/python3.7/site-packages/urllib3/connection.py", line 454, in getresponse
httplib_response = super().getresponse()
File "/opt/conda/envs/grobid/lib/python3.7/http/client.py", line 1373, in getresponse
response.begin()
File "/opt/conda/envs/grobid/lib/python3.7/http/client.py", line 319, in begin
version, status, reason = self._read_status()
File "/opt/conda/envs/grobid/lib/python3.7/http/client.py", line 288, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/grobid/bin/grobid_client", line 33, in <module>
sys.exit(load_entry_point('grobid-client-python==0.0.5', 'console_scripts', 'grobid_client')())
File "/opt/conda/envs/grobid/lib/python3.7/site-packages/grobid_client_python-0.0.5-py3.7.egg/grobid_client/grobid_client.py", line 482, in main
File "/opt/conda/envs/grobid/lib/python3.7/site-packages/grobid_client_python-0.0.5-py3.7.egg/grobid_client/grobid_client.py", line 139, in process
File "/opt/conda/envs/grobid/lib/python3.7/site-packages/grobid_client_python-0.0.5-py3.7.egg/grobid_client/grobid_client.py", line 212, in process_batch
File "/opt/conda/envs/grobid/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/opt/conda/envs/grobid/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/opt/conda/envs/grobid/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/conda/envs/grobid/lib/python3.7/site-packages/grobid_client_python-0.0.5-py3.7.egg/grobid_client/grobid_client.py", line 279, in process_pdf
File "/opt/conda/envs/grobid/lib/python3.7/site-packages/grobid_client_python-0.0.5-py3.7.egg/grobid_client/client.py", line 186, in post
File "/opt/conda/envs/grobid/lib/python3.7/site-packages/grobid_client_python-0.0.5-py3.7.egg/grobid_client/client.py", line 128, in call_api
File "/home/ec2-user/.local/lib/python3.7/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/home/ec2-user/.local/lib/python3.7/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/home/ec2-user/.local/lib/python3.7/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/home/ec2-user/.local/lib/python3.7/site-packages/requests/adapters.py", line 501, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Does anybody have ideas?
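Since a log like the one above can list many failed files, it may help to collect them for a later retry at lower concurrency. The sketch below only assumes the "Processing of ... failed with error ..." line format visible in this log; the function name `failed_files` is hypothetical, and the pattern is a guess that may need adjusting for other client versions.

```python
import re

# Matches entries such as:
#   Processing of /path/to/file (1).pdf failed with error 408 , None
# \s+ also tolerates entries that were wrapped across two lines.
FAILURE = re.compile(r"Processing of (.+?)\s+failed with error\s+(\d+)")


def failed_files(log_text, status=408):
    """Return the paths reported as failed with the given status code."""
    return [
        path
        for path, code in FAILURE.findall(log_text)
        if int(code) == status
    ]
```

Reading logfile4000.txt and passing its text to `failed_files` yields the list of PDFs that timed out, which can then be copied to a separate folder and reprocessed.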