sergeyyatmellanox opened this issue 4 years ago
Thank you for taking the time to provide details and opening an issue :100:
I'm hoping you can help me isolate the issue a bit further. Here are some things that would help me resolve this issue more quickly and reduce ambiguity; I would be grateful if you could provide them.
If you would feel more comfortable using PGP and back-channeling, my email is comrumino@archstrike.org, and my public key and contact info are under my user profile as well.
Of course, concurrency is difficult to debug. It seems likely that the root cause(s) overlap for this issue, #354, and #360. Even if we are unable to reproduce deadlocks or timeouts, an executable script to emulate program behavior would be more precise and insightful than bullet point outlines.
Abstracting away program specifics, things we should consider while resolving the bug:
- which concurrency primitives are in use (Process, Popen, Thread, _thread, etc.)
- which synchronization primitives are in use (RLock, Lock, Semaphore, etc.)

Improving the documentation on threading with RPyC would be beneficial for all:
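To make the distinction between the locking primitives listed above concrete, here is a small stdlib-only sketch (not taken from the reporter's program) showing why the choice of Lock vs. RLock matters when a thread can re-enter a guarded section:

```python
import threading

# Unlike Lock, RLock may be re-acquired by the thread that already holds
# it, which matters when the same thread re-enters a guarded section
# (e.g. through a nested call or callback).
lock = threading.Lock()
rlock = threading.RLock()

lock.acquire()
# A plain Lock cannot be re-acquired by the same thread without deadlocking;
# a non-blocking attempt reports failure instead of hanging.
reacquired_plain = lock.acquire(blocking=False)   # False
lock.release()

rlock.acquire()
# An RLock tracks its owner, so the same thread may acquire it again.
reacquired_reentrant = rlock.acquire(blocking=False)  # True
rlock.release()
rlock.release()

print(reacquired_plain, reacquired_reentrant)
```

A blocking re-acquire of the plain Lock in the same thread would deadlock outright, which is one of the simplest ways this class of bug arises.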
Hi,
Thank you for taking the time to help me. I strongly agree with you about the documentation. I've been using this package for 3-4 years now, and I've always felt that the documentation and examples are too basic.
I forgot an important detail! The issue started once I updated RPyC to 4.1.4; it didn't happen on 3.4.3.
I'm adding a small code snippet that simulates what my actual program does. Basically:
It's important to note that the shared object is not always a dictionary; sometimes it's a custom class. I'm using plain threading.Thread to create and run threads. Synchronization is done by waiting for a value in the dictionary to change: one thread sleeps while it waits for another thread to update that value. For example, a counter that should hit zero, where each thread decreases the counter by 1 (using an RLock to synchronize writes) and sleeps until the counter reaches zero; or the synchronization happens between only two threads, each waiting until the other sets a designated boolean to True. The number of threads is less than 10. The main thread creates 2 threads, and each of those creates another thread that runs some function, but the synchronization is done by the "first-level" threads (the ones created by the main thread).
```python
import threading
import random
import time
import logging

import rpyc

logger = logging.getLogger()


def init():
    shared_dict = {}
    for i in range(2):
        shared_dict[i] = {}
        for j in range(10):
            shared_dict[i][j] = {'done': False}
    return shared_dict


def fib(n):
    if n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)


def foo(i, j, shared_dict):
    logger.info('start foo: {}'.format((i, j, shared_dict[i])))
    logger.info('foo {} fib = {}'.format((i, j), fib(random.randint(10, 20))))
    shared_dict[i][j]['done'] = True
    logger.info('done foo: {}'.format((i, j, shared_dict[i])))


def baz(i, shared_dict):
    logger.info('start baz: {}'.format(i))
    threads = []
    for j in range(10):
        t = threading.Thread(target=foo, args=(i, j, shared_dict))
        t.setDaemon(True)
        t.start()
        threads.append(t)
    for _i in shared_dict:
        # note: j still holds its last loop value (9) at this point
        while not shared_dict[_i][j]['done']:
            time.sleep(0.1)
    for t in threads:
        t.join()
    logger.info('done baz: {}'.format(i))


def bar(conn, i, shared_dict):
    logger.info('start bar: {}'.format((conn, i)))
    conn.modules.rpyc_debug_shared_dict.baz(i, shared_dict)
    logger.info('done bar: {}'.format((conn, i)))


def start():
    connections = []
    connections.append(rpyc.classic.connect('10.141.32.7', 18812))
    connections.append(rpyc.classic.connect('10.141.32.8', 18812))
    for conn in connections:
        conn.modules.sys.path.append('/workspace/rpyc_debug')
    for k in range(1000):
        threads = []
        shared_dict = init()
        for i, conn in enumerate(connections):
            thread = threading.Thread(target=bar, args=(conn, i, shared_dict))
            thread.setDaemon(True)
            thread.start()
            threads.append(thread)
        for thread in threads:
            thread.join()
        print('try NO.', k)


if __name__ == '__main__':
    start()
```
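The counter-based synchronization described earlier (each thread decrements a shared counter under an RLock, then sleeps until it reaches zero) can be sketched with the stdlib alone; the names here are hypothetical, not from the actual program:

```python
import threading
import time

# Shared state: a counter that should hit zero once every worker checks in.
state = {'counter': 4}
rlock = threading.RLock()


def worker():
    with rlock:                       # serialize writes to the counter
        state['counter'] -= 1
    while state['counter'] > 0:       # busy-wait until everyone has checked in
        time.sleep(0.01)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state['counter'])  # 0
```

Note that the busy-wait reads the counter without holding the lock; that is harmless for a monotonically decreasing int, but it is exactly the kind of pattern that can hang if one decrement is lost or one worker blocks inside a remote call.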
Is there any update on this issue ? Will it be fixed in the next release ?
@comrumino Are you working on this issue ?
Slowly :). I think there was a PR some time ago on the threading front.
My code uses the forking server. During execution, my program creates many threads that all use the same connection. From time to time, some sort of deadlock appears.
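One way to rule out the shared connection as the contended resource is to serialize all use of it behind a single lock. Below is a stdlib-only sketch with a stand-in object in place of the real rpyc connection; whether rpyc actually requires external locking here is exactly what this issue is probing, so treat that as an assumption:

```python
import threading


class FakeConn:
    """Stand-in for the shared rpyc connection (hypothetical)."""

    def __init__(self):
        self.calls = 0

    def call(self):
        # Unsynchronized += on a shared counter stands in for a remote call
        # that is unsafe to issue from several threads at once.
        self.calls += 1


conn = FakeConn()
conn_lock = threading.Lock()


def worker():
    for _ in range(1000):
        with conn_lock:   # only one thread talks to the connection at a time
            conn.call()


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(conn.calls)  # 8000
```

If the deadlocks disappear under this regime, the problem is concurrent use of one connection; if they persist, the cause lies elsewhere (e.g. in the protocol layer itself).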
I've added a bunch of prints to rpyc/packages/core/protocol.py:
and to rpyc/packages/core/async_.py:
and what I got is:
As I understand it:
Environment
Minimal example
Unfortunately my program is too complex to publish here and I wasn't able to reproduce it otherwise.