Closed justincpresley closed 2 years ago
There can be a problem when you use coroutines with multithreading.
I have gotten coroutines to work well with multithreading in the past, but it requires careful coding. It also seemed to be the best way to get performance, better than using queues. However, I had not heard about the GIL issue, so that last statement may need revisiting.
I believe the main problem here is the fact that NDNApp is not thread-safe. Will this be a future improvement?
So far, I have encountered two applications (ndn-hydra + nr-archway) that could heavily benefit from, or require, making NDN-related API calls (i.e. via python-ndn) from multiple threads (with coroutines). More are bound to be encountered in the future. I tried using queues to hand off data/commands in one application; however, the performance (considering NDN's current performance) indicated that it was not a viable solution.
> I believe the main problem here is the fact that NDNApp is not thread-safe. Will this be a future improvement?
No. Since most Python applications cannot benefit from multithreading, it is not worth doing.
There are multiple articles talking about GIL, such as this.
In short, nearly every Python statement acquires a global shared lock, so Python can essentially execute only one thread at a time, unless some underlying code not written in Python avoids acquiring that lock (machine-learning libraries are a typical case).
You can also try the following code to see that multithreading will be even slower unless you use an ML library (such as numpy).
```python
import time
from threading import Thread
from multiprocessing import Process

COUNT = 50000000
# 0.5*10^8 -> about 0.5s for a single-threaded C++ program
# But for Python 3, it will need 1~10 seconds
# Better computers can run faster.


def target(l, r):
    ret = 0
    for i in range(l, r):
        ret = ret + i * i  # Unlike C, this will not overflow
    return ret


# To see numpy results, replace it with the following:
# def target(l, r):
#     lst = np.arange(l, r)
#     return np.sum(np.square(lst))


def get_time(func):
    start = time.time()
    func()
    end = time.time()
    print(f'{(end - start):.6f}s')


def single_thread():
    """
    Run target in single thread.
    """
    target(0, COUNT)


def multi_thread():
    """
    Run target in 2 threads.
    """
    half = COUNT // 2
    t1 = Thread(target=target, args=(0, half))
    t2 = Thread(target=target, args=(half, COUNT))
    t1.start()
    t2.start()
    t1.join()
    t2.join()


def multi_process():
    """
    Run target in 2 Python processes, instead of threads.
    """
    half = COUNT // 2
    p1 = Process(target=target, args=(0, half))
    p2 = Process(target=target, args=(half, COUNT))
    p1.start()
    p2.start()
    p1.join()
    p2.join()


if __name__ == '__main__':
    print('Single-threading: ', end='')
    get_time(single_thread)
    print('Multi-threading: ', end='')
    get_time(multi_thread)
    print('Multi-processing: ', end='')
    get_time(multi_process)
```
> So far, I have encountered two applications (ndn-hydra + nr-archway) that could heavily benefit from / require using NDN-related API calls (i.e. python-ndn) in multiple threads (with coroutines). More are bound to be encountered in the future. I have tried to use queuing for handing off data/commands in one application; however, the performance (considering NDN's current performance) indicated that it was not a viable solution.
Another approach you can try is to create one NDNApp instance per thread, but I don't know what will happen, since no one has ever done that. Theoretically it is doable, as NDNApp does not rely on any global variables.
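The one-NDNApp-per-thread idea boils down to each thread owning its own private event loop, so no coroutine ever touches objects belonging to another thread's loop. A minimal sketch of that pattern, using plain asyncio (the `worker_main` coroutine is a hypothetical stand-in for where an NDNApp would be constructed and run):

```python
import asyncio
import threading

results: list = []

async def worker_main(name: str) -> None:
    # Hypothetical stand-in for per-thread application work; with python-ndn,
    # an NDNApp would be created and run inside this thread instead.
    await asyncio.sleep(0.01)
    results.append(name)

def thread_entry(name: str) -> None:
    # asyncio.run() creates a fresh event loop private to this thread,
    # so the two loops never share coroutine state.
    asyncio.run(worker_main(name))

threads = [threading.Thread(target=thread_entry, args=(f'app-{i}',))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # -> ['app-0', 'app-1']
```

The key point is that nothing created on one loop (futures, tasks, transports) is ever awaited or mutated from the other thread; cross-thread hand-off would still need a thread-safe mechanism.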
For standard python3:

```
Single-threading: 2.896443s
Multi-threading: 2.052587s
Multi-processing: 1.851165s
```

For numpy python3:

```
Single-threading: 0.370442s
Multi-threading: 0.102015s
Multi-processing: 0.113359s
```
It seems like newer hardware (I am running on the latest) might help with this problem, as referred to, but I can see what you mean. It would still be advantageous to add thread safety, though I understand why it has weak acceptance. I appreciate the provided example. I wonder if Python is working to alleviate this GIL issue...
> Another way you can try is to create one NDNApp instance per thread, but I don't know what will happen since no one has ever done that. Theoretically it is doable, as NDNApp does not rely on any global variables.
It does work, at least in the simplest usage. Tested and confirmed.
> Wonder if python is working to alleviate this GIL issue...
GIL has a long long history. I don't think it is easy to fix.
> It does work or at least in the simplest of usage. Tested and confirmed.
Great.
When trying to access an NDNApp from another thread to prepare data, I receive the following error:
Is it impossible to use a single NDNApp across multiple threads? If so, and I create multiple NDNApp instances, will Data published by one App satisfy another App's Interests?
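An error like this typically comes from calling loop-bound objects from a thread that does not own the event loop. One way around it is to hand the coroutine to the loop's own thread instead of calling it directly. A sketch of that pattern with plain asyncio (`produce` is a hypothetical stand-in for an NDN call such as preparing or publishing Data; with python-ndn, `loop` would be the loop the NDNApp runs on):

```python
import asyncio
import threading

loop = asyncio.new_event_loop()

def run_loop() -> None:
    # This thread owns the event loop, analogous to the thread running the app.
    asyncio.set_event_loop(loop)
    loop.run_forever()

async def produce(value: int) -> int:
    # Hypothetical stand-in for async NDN work done on the loop's thread.
    await asyncio.sleep(0.01)
    return value * 2

t = threading.Thread(target=run_loop, daemon=True)
t.start()

# From any other thread: schedule the coroutine onto the loop's thread
# rather than awaiting or calling it here.
fut = asyncio.run_coroutine_threadsafe(produce(21), loop)
answer = fut.result(timeout=5)  # blocks this thread until the coroutine finishes
print(answer)  # -> 42

loop.call_soon_threadsafe(loop.stop)
```

`run_coroutine_threadsafe` and `call_soon_threadsafe` are the only asyncio entry points documented as safe to call from another thread, which is why queuing work this way avoids the loop-affinity error.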