rapidsai / gpu-bdb

RAPIDS GPU-BDB
Apache License 2.0

Q23 intermittently freezing in nightly runs #154

Open beckernick opened 3 years ago

beckernick commented 3 years ago

In the automated run this morning, Q23 over TCP completed once and then froze, making no progress for the hours it was left running.

Worker log

distributed.worker - ERROR - Set changed size during iteration
Traceback (most recent call last):
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1605, in transition_flight_memory
    self.put_key_in_memory(ts, value)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1970, in put_key_in_memory
    for dep in ts.dependents:
RuntimeError: Set changed size during iteration
distributed.utils - ERROR - Set changed size during iteration
Traceback (most recent call last):
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/utils.py", line 655, in log_errors
    yield
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 2119, in gather_dep
    self.transition(ts, "memory", value=data[d])
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1539, in transition
    state = func(ts, **kwargs)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1605, in transition_flight_memory
    self.put_key_in_memory(ts, value)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1970, in put_key_in_memory
    for dep in ts.dependents:
RuntimeError: Set changed size during iteration
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7f5676beff50>>, <Task finished coro=<Worker.gather_dep() done, defined at /raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py:2000> exception=RuntimeError('Set changed size during iteration')>)
Traceback (most recent call last):
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 2119, in gather_dep
    self.transition(ts, "memory", value=data[d])
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1539, in transition
    state = func(ts, **kwargs)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1605, in transition_flight_memory
    self.put_key_in_memory(ts, value)
  File "/raid/nicholasb/miniconda3/envs/rapids-tpcxbb-automated-tests/lib/python3.7/site-packages/distributed/worker.py", line 1970, in put_key_in_memory
    for dep in ts.dependents:
RuntimeError: Set changed size during iteration
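For context on the error itself: every traceback above ends at `for dep in ts.dependents:` in `put_key_in_memory`, and CPython raises `RuntimeError: Set changed size during iteration` whenever a set is added to or removed from while a `for` loop is walking it. The sketch below reproduces that failure mode in plain Python and shows the usual mitigation of iterating over a snapshot (`list(...)`). `TaskStateSketch`, `drop_dependent`, and the two `notify_dependents_*` helpers are hypothetical stand-ins, not `distributed`'s actual classes or its fix, and the exact code path that mutates `ts.dependents` inside the worker is not identified here.

```python
# Hypothetical stand-in for a worker task state; NOT distributed's TaskState.
class TaskStateSketch:
    def __init__(self, key):
        self.key = key
        self.dependents = set()


def notify_dependents_unsafe(ts, on_dependent):
    # Walks the live set. If on_dependent mutates ts.dependents,
    # CPython raises RuntimeError on the next iteration step.
    for dep in ts.dependents:
        on_dependent(ts, dep)


def notify_dependents_safe(ts, on_dependent):
    # Walks a snapshot, so the callback may mutate ts.dependents freely.
    for dep in list(ts.dependents):
        on_dependent(ts, dep)


def drop_dependent(ts, dep):
    # Hypothetical state change that removes a dependent mid-loop.
    ts.dependents.discard(dep)


def build():
    ts = TaskStateSketch("x")
    ts.dependents.update({"a", "b", "c"})
    return ts


if __name__ == "__main__":
    try:
        notify_dependents_unsafe(build(), drop_dependent)
    except RuntimeError as exc:
        print("unsafe:", exc)  # unsafe: Set changed size during iteration

    ts = build()
    notify_dependents_safe(ts, drop_dependent)
    print("safe: remaining dependents =", len(ts.dependents))  # 0
```

Copying the set costs one allocation per call, which is usually negligible next to the risk of an unhandled exception leaving a `gather_dep` task half-finished, as in the log above.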
beckernick commented 3 years ago

Resolved

beckernick commented 3 years ago

This issue is popping up again.