parashardhapola opened this issue 7 years ago
Thanks, I'll investigate.
Similar issue here. I am on version 5.2 with an Anaconda installation.
```
per
    return fn(*args, **kwargs)
  File "/home/julian/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/scheduler.py", line 325, in <lambda>
    lambda : self.handle_stranded_tasks(uid),
  File "/home/julian/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/scheduler.py", line 335, in handle_stranded_tasks
    for msg_id in lost.keys():
RuntimeError: dictionary changed size during iteration
```
```
2017-02-25 17:08:58.400 [IPControllerApp] task::task 'e7647038-edff-4814-8939-84afced09336' finished on 7
2017-02-25 17:08:58.401 [IPControllerApp] ERROR | DB Error saving task request 'e7647038-edff-4814-8939-84afced09336'
Traceback (most recent call last):
  File "/home/julian/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/hub.py", line 794, in save_task_result
    self.db.update_record(msg_id, result)
  File "/home/julian/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/dictdb.py", line 232, in update_record
    raise KeyError("Record %r has been culled for size" % msg_id)
KeyError: "Record 'e7647038-edff-4814-8939-84afced09336' has been culled for size"
2017-02-25 17:08:58.402 [IPControllerApp] task::task 'fce1ddc0-c360-43eb-902b-0477bd259dba' finished on 8
2017-02-25 17:08:58.402 [IPControllerApp] ERROR | DB Error saving task request 'fce1ddc0-c360-43eb-902b-0477bd259dba'
Traceback (most recent call last):
  File "/home/julian/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/hub.py", line 794, in save_task_result
    self.db.update_record(msg_id, result)
  File "/home/julian/anaconda3/lib/python3.5/site-packages/ipyparallel/controller/dictdb.py", line 232, in update_record
```
The controller runs on Linux, while the clients run on a variety of Linux/Windows machines.
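As an aside, the `RuntimeError` in the scheduler traceback above is a generic Python pitfall rather than anything ipyparallel-specific: the `lost` dict is mutated while its keys view is being iterated. A minimal standalone reproduction, with the usual fix of iterating over a snapshot of the keys:

```python
# Mutating a dict while iterating over its keys view raises RuntimeError:
lost = {'task-1': object(), 'task-2': object()}
try:
    for msg_id in lost.keys():
        lost.pop(msg_id)  # shrinks the dict mid-iteration
except RuntimeError as exc:
    print(exc)  # dictionary changed size during iteration

# Fix: snapshot the keys with list() before iterating
lost = {'task-1': object(), 'task-2': object()}
for msg_id in list(lost.keys()):
    lost.pop(msg_id)
print(lost)  # {}
```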
Hi. Any updates on this issue? I'm having the same problem sometimes.
@jayzed82
My issue was my own fault: I was sending more than 1024 tasks in parallel. You need to change the limit manually if you want to go beyond it.
Have you checked whether you are filling the queue with more than 1024 tasks?
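Until such a limit is raised, a general workaround is to throttle submission so the number of outstanding tasks never exceeds the cap. A sketch of the pattern using `concurrent.futures` (the cap and worker count here are illustrative; the same idea applies to any executor-style API, including an ipyparallel view):

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def map_throttled(fn, items, max_outstanding=1024, workers=8):
    """Apply fn to items, keeping at most max_outstanding futures pending."""
    results = []
    pending = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for item in items:
            if len(pending) >= max_outstanding:
                # Block until at least one task finishes before submitting more
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                results.extend(f.result() for f in done)
            pending.add(pool.submit(fn, item))
        # Drain whatever is still in flight
        done, _ = wait(pending)
        results.extend(f.result() for f in done)
    return results

print(sorted(map_throttled(lambda x: x * x, range(10), max_outstanding=4)))
```

Note the results come back in completion order, hence the `sorted` in the usage line.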
Thank you @littlegreenbean33. That is my problem; I have a queue longer than 1024 tasks. I didn't know there was a limit. How do you increase it?
Search the code for 1024, or for the text of the error message; you will find informative comments in there as well. There was a balance to strike with memory usage, and 1024 probably sounded like a good number.
@littlegreenbean33 I'm not quite sure what you mean. Can you point us to it more specifically?
And what actually happens when we encounter these `ERROR | DB Error saving task request` messages? Are the computation results going to be faulty, and hence useless? If so, why isn't a warning or something more visible shown on the client side, i.e., in IPython/Jupyter? Or can that "error" simply be ignored because the hub somehow handles it magically?
If your task queue grows above 1024, bad things happen. Don't ignore the error; it means tasks won't be performed.
So does this actually mean that IPyParallel cannot have more than 1024 tasks queued? Then why is there no error, or at least a warning? If you run `ipcluster` in `--daemon` mode, you won't even be able to notice! And how can we lift that limit?
I've just run a quick test to see what happens if I submit more than 1024 tasks. In this test I only have a single engine, so the task queue should hold about 2047 tasks before the first task finishes:

```python
import ipyparallel as ipp
import numpy as np

ipp_client = ipp.Client()
ipp_client[:].use_dill().get()

def f(ms):
    def _f(x):
        if ms > 0:
            import time
            time.sleep(ms * 1e-3)
        return x * 2
    return _f

data = range(2048)
result = ipp_client[:].map(f(100), data).get()
print(np.allclose(result, list(map(f(0), data))))
```
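One detail to watch in that comparison line: on Python 3, `map` returns a lazy iterator, and NumPy does not expand it elementwise, so it has to be materialized with `list(...)` first. A quick illustration:

```python
import numpy as np

data = range(5)
expected = [x * 2 for x in data]

# A bare map object becomes a 0-d object array, not an elementwise sequence:
arr = np.asarray(map(lambda x: x * 2, data))
print(arr.shape)  # ()

# Materializing the iterator gives the intended elementwise comparison:
print(np.allclose(expected, list(map(lambda x: x * 2, data))))  # True
```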
This works like a charm. How does that match with your statement that the task queue cannot grow beyond 1024 tasks, @littlegreenbean33?
> It means tasks won't be performed.
It does not mean that. This error does not affect execution or results during normal execution. The only thing affected is the result cache in the Hub, which can be used for delayed retrieval by id. If you are not using delayed retrieval (`client.get_result(msg_ids)` instead of `asyncresult.get()`), there should be no user-visible effect.

The default cache of results in the Hub is an in-memory DictDB with a few limits. You can increase those limits, or tell the controller to use sqlite or mongodb to store these things out of memory. If you aren't using delayed retrieval at all, you can use `NoDB` to disable result caching entirely.
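For reference, the knobs mentioned above live in the controller configuration. A sketch for `ipcontroller_config.py`; the trait names below are taken from the `dictdb.py` source referenced in the tracebacks, so double-check them against your installed ipyparallel version:

```python
# ipcontroller_config.py (sketch; verify trait names against your version)
c = get_config()

# Raise the in-memory DictDB limits behind the "culled for size" errors:
c.DictDB.record_limit = 4096        # max number of cached task records
c.DictDB.size_limit = 2 * 1024**3   # max total cache size, in bytes

# ...or store results out of memory instead:
# c.HubFactory.db_class = 'SQLiteDB'

# ...or, if you never use delayed retrieval, disable result caching:
# c.HubFactory.db_class = 'NoDB'
```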
Thanks a lot for that clarification @minrk
Hi,
I get the following error when I try to run a simple test job:

I ran the following code:

I have started 30 engines on two different hosts by running `ipcluster engines -n 30`, and ran `ipcontroller --ip="*"` on the host running `jupyter notebook`. The `wait_interactive` output hangs at 59/60. Please check whether this error can be reproduced.