Open nickdesaulniers opened 3 years ago
FWIW: attaching gdb and grabbing a py-bt looks like:
(gdb) py-bt
Traceback (most recent call first):
<built-in method acquire of _multiprocessing.SemLock object at remote 0x7fc951da5470>
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/channel.py", line 140, in unix_acquire
return (self.reader_mutex.acquire(timeout=LOCK_TIMEOUT) and
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/channel.py", line 130, in __enter__
if self.acquire():
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/channel.py", line 110, in lock
with self.mutex:
<built-in method next of module object at remote 0x7fc9523fd0e0>
File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/process.py", line 440, in task_transaction
with channel.lock:
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/process.py", line 433, in fetch_task
return task_transaction(channel)
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/process.py", line 419, in worker_get_next_task
yield fetch_task(channel)
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/process.py", line 404, in worker_process
for task in worker_get_next_task(channel, params.max_tasks):
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/multiprocessing/process.py", line 571, in _bootstrap
File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 75, in _launch
code = process_obj._bootstrap(parent_sentinel=child_r)
File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/lib/python3.8/multiprocessing/process.py", line 633, in start
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/common.py", line 141, in launch_process
process.start()
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/process.py", line 632, in new_worker
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/process.py", line 364, in create_workers
self.new_worker()
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/process.py", line 192, in start
self.worker_manager.create_workers()
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/process.py", line 63, in _start_pool
self._pool_manager.start()
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/base_pool.py", line 100, in _update_pool_state
self._start_pool()
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/base_pool.py", line 91, in _check_pool_state
self._update_pool_state()
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.8/site-packages/pebble/pool/process.py", line 90, in schedule
self._check_pool_state()
File "/android1/cvise/build/cvise/utils/testing.py", line 693, in run_parallel_tests
File "/android1/cvise/build/cvise/utils/testing.py", line 764, in run_pass
File "/android1/cvise/build/cvise/cvise.py", line 161, in _run_additional_passes
self.test_manager.run_pass(p)
File "/android1/cvise/build/cvise/cvise.py", line 394, in reduce
File "/android1/cvise/build/cvise.py", line 1051, in <module>
When I run cvise, sometimes it seems stalled on making progress. When I check htop, it doesn't look like my cores are doing any work related to cvise. It looks like "00:00:03 INFO ===< LinesPass::0 >===" is what's running. I'm running cvise with --n 71.
Interesting. How long does such a stall take? I'm quite often running with -n 64 on a machine and it seems fine.
Are there synchronization points where all threads need to wait on one to make progress? Bisection will eventually converge, but it seems to take quite some time to make forward progress.
What I do is start N jobs (if we have enough "states") and wait for the first successful one. Once that happens, I wait for all jobs that were started before the first successful one. The rest are cancelled.
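Roughly this pattern, if it helps (a minimal sketch using concurrent.futures rather than the pebble pool cvise actually uses; run_test and candidate_states are placeholder names):

from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait

def run_round(candidate_states, run_test, n_workers):
    """Return the earliest-submitted interesting candidate, or None."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_test, state) for state in candidate_states]
        pending = set(futures)
        while pending:
            _, pending = wait(pending, return_when=FIRST_COMPLETED)
            for i, future in enumerate(futures):
                if not future.done():
                    break  # an earlier job is still running; keep waiting
                if future.result():
                    # Earliest success, with every job before it finished:
                    # cancel the rest so the result stays deterministic.
                    # cancel() cannot stop jobs that are already running, and
                    # leaving the with-block still waits for them, much like
                    # pool.join() does.
                    for later in futures[i + 1:]:
                        later.cancel()
                    return candidate_states[i]
    return None  # no candidate in this round was interesting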
Playing with ps -ef, it looks like there are 71 instances of the python interpreter running, with 0 instances of clang, clang_delta, or my reproducer shell script.
Anyway, I think it's a stall in the Pebble library; it would be worth creating a reproducer for it and reporting it upstream.
I'm not sure if this is related, but for me the "UnIfDefPass" step sometimes stalls and does not progress, even though that step has nothing to do. After re-running the script several times, it will eventually skip that step.
The timeout does not seem to interrupt.
I'm calling cvise like this:
cvise --n 40 --skip-key-off --timeout 20 --clang-delta-std=c++20 ./creduce_probe.sh ice.cpp
I'm not on the newest version right now (cvise 2.2.0 (4c7f4cd)), so I'll first update and reevaluate the situation.
I'm not sure if this is related, but for me the "UnIfDefPass" step sometimes stalls and does not progress, even though that step has nothing to do. After re-running the script several times, it will eventually skip that step.
It's likely something different. Can you please paste the output of cvise with --verbose?
The timeout does not seem to interrupt.
I'm calling cvise like this:
cvise --n 40 --skip-key-off --timeout 20 --clang-delta-std=c++20 ./creduce_probe.sh ice.cpp
I'm not on the newest version right now (cvise 2.2.0 (4c7f4cd)), so I'll first update and reevaluate the situation.
There's likely no change since 2.2.0 that would resolve it.
Would it be possible to also attach a reproducer?
Would it be possible to also attach a reproducer?
And please open a new issue for it.
What I do is start N jobs (if we have enough "states") and wait for the first successful one. Once that happens, I wait for all jobs that were started before the first successful one. The rest are cancelled.
Right, the call to pool.join() seems to be taking a long time.
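A quick way to confirm that (a rough diagnostic sketch; pool is assumed to be the pebble.ProcessPool created in run_parallel_tests):

import time

def timed_shutdown(pool):
    # pool is assumed to be the pebble.ProcessPool from run_parallel_tests.
    start = time.monotonic()
    pool.stop()   # stop accepting new tasks
    pool.join()   # the call that appears to block for a long time
    print(f'pool.stop()+join() took {time.monotonic() - start:.1f}s')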
https://github.com/noxdafox/pebble/issues/50 looks related, though it sounds like that was fixed in 4.5.1.
pip3 list | grep Pebble
Pebble 4.6.1
Otherwise https://github.com/noxdafox/pebble/issues/62 and https://github.com/noxdafox/pebble/issues/33 look potentially related.
One thing I found interesting, grepping for the parent pid:
$ ps -ef | grep 1052950
ndesaul+ 1052950 3884808 2 14:44 pts/4 00:02:06 python3 /android1/cvise/build/cvise.py --n 71 ./repro.sh extable.i
ndesaul+ 1088933 1052950 2 14:46 pts/4 00:02:00 python3 /android1/cvise/build/cvise.py --n 71 ./repro.sh extable.i
ndesaul+ 1368039 1052950 0 15:55 pts/4 00:00:00 python3 /android1/cvise/build/cvise.py --n 71 ./repro.sh extable.i
This makes it look like one child has been running for 2 minutes and the rest for 0?
If I drop --n from 71 to 4, it's much faster.
moving from 1a9f47f53037 to 21c4e265556bba062898cbb673982a6fa1444d1f seems to have helped.
no --n:
00:17:04 INFO ===================== done ====================
(previously, reducing the same input with the older version of cvise took over 1hr10min)
--n 71 (long pauses were observed while running LinesPass::0, where cores look idle):
00:24:50 INFO (97.9%, 54600 bytes, 429 lines)
(killed, still running)
here's the python trace of the long running child (--n 71):
(gdb) py-bt
Traceback (most recent call first):
File "/usr/lib/python3.9/threading.py", line 316, in wait
gotit = waiter.acquire(True, timeout)
File "/usr/lib/python3.9/threading.py", line 574, in wait
signaled = self._cond.wait(timeout)
File "/usr/lib/python3.9/multiprocessing/managers.py", line 176, in serve_forever
self.stop_event.wait(1)
File "/usr/lib/python3.9/multiprocessing/managers.py", line 591, in _run_server
server.serve_forever()
File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.9/multiprocessing/process.py", line 571, in _bootstrap
File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 71, in _launch
code = process_obj._bootstrap(parent_sentinel=child_r)
File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/lib/python3.9/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.9/multiprocessing/process.py", line 633, in start
File "/usr/lib/python3.9/multiprocessing/managers.py", line 1322, in start
# launching the manager process, so that concurrent
File "/usr/lib/python3.9/multiprocessing/context.py", line 57, in Manager
m.start()
File "/android1/cvise/build/cvise/utils/testing.py", line 472, in run_pass
m = Manager()
File "/android1/cvise/build/cvise/cvise.py", line 166, in _run_additional_passes
self.test_manager.run_pass(p)
File "/android1/cvise/build/cvise/cvise.py", line 399, in reduce
File "/android1/cvise/build/cvise.py", line 1073, in <module>
and an idle child:
(gdb) py-bt
Traceback (most recent call first):
<built-in method acquire of _multiprocessing.SemLock object at remote 0x7fbf369e1630>
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/channel.py", line 146, in unix_acquire
return (self.reader_mutex.acquire(timeout=LOCK_TIMEOUT) and
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/channel.py", line 136, in __enter__
if self.acquire():
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/channel.py", line 111, in lock
with self.mutex:
<built-in method next of module object at remote 0x7fbf375228b0>
File "/usr/lib/python3.9/contextlib.py", line 119, in __enter__
return next(self.gen)
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/process.py", line 460, in task_transaction
with channel.lock:
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/process.py", line 453, in fetch_task
return task_transaction(channel)
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/process.py", line 439, in worker_get_next_task
yield fetch_task(channel)
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/process.py", line 423, in worker_process
for task in worker_get_next_task(channel, params.max_tasks):
File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.9/multiprocessing/process.py", line 571, in _bootstrap
File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 71, in _launch
code = process_obj._bootstrap(parent_sentinel=child_r)
File "/usr/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/lib/python3.9/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/usr/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/usr/lib/python3.9/multiprocessing/process.py", line 633, in start
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/common.py", line 397, in launch_process
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/process.py", line 648, in new_worker
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/process.py", line 380, in create_workers
self.new_worker()
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/process.py", line 205, in start
self.worker_manager.create_workers()
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/process.py", line 69, in _start_pool
self._pool_manager.start()
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/base_pool.py", line 100, in _update_pool_state
self._start_pool()
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/base_pool.py", line 91, in _check_pool_state
self._update_pool_state()
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/process.py", line 102, in schedule
self._check_pool_state()
File "/android1/cvise/build/cvise/utils/testing.py", line 704, in run_parallel_tests
File "/android1/cvise/build/cvise/utils/testing.py", line 776, in run_pass
File "/android1/cvise/build/cvise/cvise.py", line 166, in _run_additional_passes
self.test_manager.run_pass(p)
File "/android1/cvise/build/cvise/cvise.py", line 399, in reduce
File "/android1/cvise/build/cvise.py", line 1073, in <module>
Am I understanding correctly that pebble is blocking for 60s if it fails to acquire a lock? https://github.com/noxdafox/pebble/blob/b7bde8c5c7f44c0fd526c1db33bc972913778765/pebble/pool/channel.py#L156 https://github.com/noxdafox/pebble/blob/b7bde8c5c7f44c0fd526c1db33bc972913778765/pebble/pool/channel.py#L198
File "/usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/process.py", line 460, in task_transaction with channel.lock:
so no timeout
is specified?
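If I read that right, a worker just sits in a blocking acquire until the timeout expires. A tiny standalone illustration of that behaviour (plain multiprocessing, not pebble's code; 2s instead of 60s so it finishes quickly):

import multiprocessing
import time

lock = multiprocessing.Lock()
lock.acquire()                    # pretend another worker already holds the lock

start = time.monotonic()
got_it = lock.acquire(timeout=2)  # pebble passes timeout=LOCK_TIMEOUT (60s) here
print(got_it, f'{time.monotonic() - start:.1f}s')   # -> False 2.0s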
no --n: 00:17:04 INFO ===================== done ====================
^ so 36 parallel tests
--n 4:
00:28:47 INFO ===================== done ====================
--n 1:
01:30:51 INFO ===================== done ====================
I should mention this is a dual socket system; I wonder if synchronization overhead is much much worse due to NUMA? Perhaps setting affinity to one socket would help? Perhaps if I play with numactl...
$ numactl --cpunodebind=0 /android1/cvise/build/cvise.py ./repro.sh extable.i
00:15:20 INFO ===================== done ====================
so that is slightly faster, though I guess by nature these are probably non-deterministic results. I didn't observe any pauses that run.
If I double the number of threads with --n 72 AND use numactl, then I swear I keep seeing 1-minute pauses where no progress is made and my cores look idle.
00:01:11 INFO (74.3%, 669465 bytes, 17621 lines)
<pause>
00:02:12 INFO (74.3%, 669367 bytes, 17619 lines)
...
00:02:28 INFO (88.2%, 307085 bytes, 2521 lines)
<pause>
00:03:29 INFO (88.8%, 292190 bytes, 2473 lines)
...
00:06:30 INFO (98.6%, 35650 bytes, 255 lines)
<pause>
00:07:30 INFO (98.6%, 35591 bytes, 254 lines)
...
00:07:31 INFO (98.7%, 35035 bytes, 252 lines)
<pause>
00:08:31 INFO (98.7%, 34907 bytes, 251 lines)
...
00:08:53 INFO (99.0%, 25463 bytes, 473 lines)
<pause>
00:09:53 INFO (99.0%, 25043 bytes, 456 lines)
...
00:09:54 INFO (99.1%, 24252 bytes, 424 lines)
<pause>
00:10:55 INFO (99.1%, 24070 bytes, 416 lines)
...
00:10:57 INFO (99.1%, 23072 bytes, 380 lines)
<pause>
00:11:57 INFO (99.1%, 22932 bytes, 376 lines)
# etc
I definitely think pebble's 60s LOCK_TIMEOUT is related. If I hack up /usr/local/google/home/ndesaulniers/.local/lib/python3.9/site-packages/pebble/pool/channel.py locally and set it to 10, then oversubscribe my system again, I do still see idle pauses, but they drop down to ~10s.
# LOCK_TIMEOUT = 1
$ numactl --cpunodebind=0 /android1/cvise/build/cvise.py --n 72 ./repro.sh extable.i
00:20:33 INFO ===================== done ====================
wait_for_first_success might throw an exception, though, if I lower it too far and the interestingness test doesn't complete in time.
diff --git a/cvise/utils/testing.py b/cvise/utils/testing.py
index 5deb23e7d60b..47edd950206c 100644
--- a/cvise/utils/testing.py
+++ b/cvise/utils/testing.py
@@ -427,6 +427,7 @@ class TestManager:
     def run_parallel_tests(self):
         assert not self.futures
         assert not self.temporary_folders
+        pebble.process.channel.LOCK_TIMEOUT = 0.1
         with pebble.ProcessPool(max_workers=self.parallel_tests) as pool:
             order = 1
             self.timeout_count = 0
$ /android1/cvise/build/cvise.py ./repro.sh extable.i
00:17:16 INFO ===================== done ====================
actually, this is probably about the fastest I can run:
diff --git a/cvise/utils/testing.py b/cvise/utils/testing.py
index 5deb23e7d60b..e697bd54b0bf 100644
--- a/cvise/utils/testing.py
+++ b/cvise/utils/testing.py
@@ -421,8 +421,11 @@ class TestManager:
     @classmethod
     def terminate_all(cls, pool):
+        temp = pebble.process.channel.LOCK_TIMEOUT
+        pebble.process.channel.LOCK_TIMEOUT = 0.001
         pool.stop()
         pool.join()
+        pebble.process.channel.LOCK_TIMEOUT = temp

     def run_parallel_tests(self):
         assert not self.futures
$ numactl --cpunodebind=0 /android1/cvise/build/cvise.py ./repro.sh extable.i
00:15:26 INFO ===================== done ====================
so there's probably 2 things we can do immediately:
1. lower pebble's LOCK_TIMEOUT around pool teardown (the patch above);
2. detect+warn users of NUMA systems that there's too much synchronization overhead and that it might be faster if they run under numactl.
I'm curious. It seems like cvise starts N workers, then terminates them all once the first has found whether its mutated input is interesting? So it seems like we get killed in synchronization overhead for that and don't make much forward progress the larger N gets. It also seems like it probably throws out a lot of forward progress each thread may have made individually.
I wonder if you could instead have each thread generate a diff when it found something interesting, then during synchronization apply all of the diffs all of the workers may have found in a round?
I'm also curious if it makes more sense to dynamically scale the number of workers based on how many are finding inputs interesting over time? If most mutations are generally interesting, then having more than 1 worker is probably overkill. If most mutations are generally uninteresting, having as many workers working on solutions is probably profitable.
I'm also curious if we're hitting a priority inversion between the parent and child threads?
Thank you very much for the nice investigation!
actually, this is probably about the fastest I can run:
diff --git a/cvise/utils/testing.py b/cvise/utils/testing.py
index 5deb23e7d60b..e697bd54b0bf 100644
--- a/cvise/utils/testing.py
+++ b/cvise/utils/testing.py
@@ -421,8 +421,11 @@ class TestManager:
     @classmethod
     def terminate_all(cls, pool):
+        temp = pebble.process.channel.LOCK_TIMEOUT
+        pebble.process.channel.LOCK_TIMEOUT = 0.001
         pool.stop()
         pool.join()
+        pebble.process.channel.LOCK_TIMEOUT = temp

     def run_parallel_tests(self):
         assert not self.futures
I do welcome the patch, please make a separate pull request for it. Moreover, we can likely notify the Pebble project that this can really cause trouble.
2. detect+warn users of NUMA systems that there's too much synchronization overhead and that it might be faster if they run under numactl.
I support that as well.
I'm curious. It seems like cvise starts N workers, then terminates them all once the first has found whether its mutated input is interesting?
Once the first one finds something interesting, we wait for all jobs that were started before it and kill all that were started after it. That way we get deterministic output.
So it seems like we get killed in synchronization overhead for that and don't make much forward progress the larger N gets. It also seems like it probably throws out a lot of forward progress each thread may have made individually.
Yes, that can possibly happen.
I wonder if you could instead have each thread generate a diff when it found something interesting, then during synchronization apply all of the diffs all of the workers may have found in a round?
I like the idea; it's basically about combining all the results and running the interestingness test on the combination.
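Something like this rough sketch, perhaps (assuming each worker hands back a unified diff against the same base file; combine_and_verify and is_interesting are made-up names, not existing cvise functions):

import os
import shutil
import subprocess
import tempfile

def combine_and_verify(base_file, diffs, is_interesting):
    """Apply each worker's diff on top of one copy of base_file; keep the
    combination only if the interestingness test still passes."""
    fd, candidate = tempfile.mkstemp(suffix='.i')
    os.close(fd)
    shutil.copyfile(base_file, candidate)
    for diff in diffs:
        # patch(1) may reject hunks overlapping an earlier worker's change;
        # that worker's progress is simply dropped in this sketch.
        subprocess.run(['patch', '--forward', candidate],
                       input=diff, text=True, check=False)
    return candidate if is_interesting(candidate) else None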
I'm also curious if it makes more sense to dynamically scale the number of workers based on how many are finding inputs interesting over time? If most mutations are generally interesting, then having more than 1 worker is probably overkill. If most mutations are generally uninteresting, having as many workers working on solutions is probably profitable.
Yep, we can dynamically calculate the success ratio and adapt the number of workers. Or we can at least add '+'/'-' keyboard shortcuts one can use to double or halve the number of workers?
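For the adaptive part, something along these lines, perhaps (the thresholds and next_worker_count are made up for illustration):

def next_worker_count(successes, attempts, current, max_workers):
    """Pick the worker count for the next round from the observed success ratio."""
    if attempts == 0:
        return current
    ratio = successes / attempts
    if ratio > 0.5:    # most candidates are interesting: parallelism is wasted
        return max(1, current // 2)
    if ratio < 0.05:   # almost everything fails: throw all the cores at it
        return max_workers
    return current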
I'm also curious if we're hitting a priority inversion between the parent and child threads?
I don't understand it, can you please explain it?
Anyway, great observations you made.
I need to play with my patch more. I thought I had it solved last night, but if I crank up --n to nproc times 2, then I see minute-long pauses again where my CPU cores are idle.
Perhaps the Python bindings to libuv might help.
heh, running cvise on my dual-core hyper-threaded laptop is faster than on my 72-core Xeon workstation... same reproducer, same inputs, same compiler under test...
workstation:
01:04:12 INFO ===================== done ====================
laptop:
00:35:53 INFO ===================== done ====================
Perhaps the Python bindings to libuv might help.
Do you mean using it in the Pebble project?
Or swap Pebble for libuv. I wonder if we can speed things up by sending SIGTERM to processes we want to terminate.
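Roughly what I have in mind (a hedged sketch, not what pebble does today): run the test in its own process group and signal the whole group so the reproducer script and its children (clang, clang_delta, ...) die immediately:

import os
import signal
import subprocess

# Start the interestingness test in its own process group/session...
proc = subprocess.Popen(['./repro.sh', 'extable.i'], start_new_session=True)

# ...so that a single SIGTERM later takes down the script and everything it spawned.
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
proc.wait()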
Or swap Pebble for libuv.
But does it provide a process pool? I can't find one.
About SIGTERM: can you please investigate which signal Pebble uses when a future is cancelled?
Hi. Note there's a recent Pebble 5.0 release: https://github.com/noxdafox/pebble/releases/tag/5.0.0
Does the problem still persist? If so, please provide a reproducer I can play with.
I've recently moved to a newer workstation; one that's not dual socket, so I might no longer be able to reproduce. I'll report back once I've had another opportunity to run cvise.
Besides, I suspect a reproducer would need to be run on a NUMA machine...
All right, thanks!