kuzeko / graph-databases-testsuite

Docker Images, installation scripts, and testing & benchmarking suite for Graph Databases
https://graphbenchmark.com
MIT License
35 stars 9 forks

V2: Socket timed out after loading big datasets #28

Open lucassardois opened 2 years ago

lucassardois commented 2 years ago

I'm trying to run the V2 benchmark on the dbpedia.escaped.json and ldbc.scale10.escaped.json datasets. While loading either dataset, the following error occurs:

10:14:31| INFO     - Running benchmark: 59b23df9-67d0-45d5-980a-47a57900932a                                                                                                                                                                 
10:14:31| DEBUG    - Loading configuration from conf.toml                                                                                                                                                                                    
10:14:31| DEBUG    - Shells are running with -d                                                                                                                                                                                              
10:14:31| DEBUG    - Main benchamark loop                                                                                                                                                                                                    
10:14:31| INFO     - Current dataset is dbpedia.escaped.json                                                                                                                                                                                 
10:14:31| INFO     - Current database is neo4j                                                                                                                                                                                               
10:14:31| INFO     - Loading dbpedia.escaped.json into neo4j                                                                                                                                                                                 
12:50:25| INFO     - Commiting neo4j with dbpedia.escaped.json as data.graphbenchmark.com/neo4j_dbpedia.escaped.json_dbf21247-0b6b-4123-aedd-a67968a9ba8d                                                                                    
Traceback (most recent call last):
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 445, in _make_request                                                                                                                
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 440, in _make_request                                                                                                                
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.7/http/client.py", line 1352, in getresponse
    response.begin()
  File "/usr/lib/python3.7/http/client.py", line 310, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.7/http/client.py", line 271, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 447, in _make_request                                                                                                                
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 337, in _raise_timeout                                                                                                               
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)                                                                                                                  

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "control.py", line 604, in <module>
    main()
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "control.py", line 427, in run_benchmark
    cnt.commit(raw_data_image)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/docker/models/containers.py", line 136, in commit                                                                                                                     
    **kwargs)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/docker/utils/decorators.py", line 19, in wrapped
    return f(self, resource_id, *args, **kwargs)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/docker/api/container.py", line 148, in commit
    self._post_json(u, data=conf, params=params), json=True
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/docker/api/client.py", line 289, in _post_json
    return self._post(url, data=json.dumps(data2), **kwargs)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/docker/api/client.py", line 226, in _post
    return self.post(url, **self._set_request_timeout(kwargs))
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/home/lsardois/bench/CONTROL/.venv/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

The benchmark process then stops. The above error happens when using neo4j, but I think (not 100% sure) the same error is raised for the other GDBMSs. Here is how I added the datasets to my conf.toml:

[datasets."dbpedia.escaped.json"]
path = "/runtime/data/dbpedia.escaped.json"
uid_field = "uid"

I downloaded the datasets from the link you provided and placed them in /CONTROL/runtime/data.

MartinBrugnara commented 2 years ago

Yes, we are aware of the issue.

There is a hard-coded timeout in the Docker Python library. It is not the commit operation itself that is failing, since that is actually carried out by the Docker daemon; it is the Python library giving up on waiting.

The real solution would be to catch the error and then poll (or remove the hard-coded timeout from the upstream library).

The easier and faster option is to wait for the Docker daemon to complete the operation (just watch for when the image becomes available, or use ps), and then rerun the script: the second time it will find the image already in place and will not try to "reload" it. You may experience the same issue when committing after sampling, depending on your containers' storage backend.
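For reference, a minimal sketch of the catch-and-poll idea, assuming docker-py's high-level client as used in control.py (the wrapper name and arguments here are illustrative, not part of the suite):

    import time

    import docker
    import requests

    def commit_with_polling(client, container, image_name, poll_interval=30, max_wait=4 * 3600):
        # Try the normal commit first; if docker-py's HTTP call times out,
        # the daemon keeps committing in the background, so poll for the image.
        try:
            return container.commit(repository=image_name)
        except requests.exceptions.ReadTimeout:
            deadline = time.time() + max_wait
            while time.time() < deadline:
                try:
                    return client.images.get(image_name)
                except docker.errors.ImageNotFound:
                    time.sleep(poll_interval)
            raise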

lucassardois commented 2 years ago

Well, I currently need to get past this issue. After checking the Docker Python library documentation, I found that we should be able to set the default timeout for API calls: https://docker-py.readthedocs.io/en/stable/client.html#docker.client.from_env.

I tried updating this line: https://github.com/kuzeko/graph-databases-testsuite/blob/133116cbcc1c61cf088441e1b5907c4fbd4531f1/CONTROL/control.py#L72 to:

   client = docker.from_env(timeout=60*60)

But for some reason the library doesn't seem to respect it on the commit operation? I'm going to try to catch the error as you suggested.
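A quick way to check whether the kwarg is actually picked up is to inspect the low-level client that docker-py builds under the hood (a small diagnostic sketch; client.api is docker-py's APIClient, which stores the timeout it applies to requests):

    import docker

    # If from_env forwards the kwarg, the low-level API client should report 3600.
    client = docker.from_env(timeout=60 * 60)
    print(client.api.timeout)

If this prints 3600 but the commit still fails with "read timeout=60", the request is presumably going through a differently configured client instance.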