named-data / python-ndn

An NDN client library with AsyncIO support in Python 3
https://python-ndn.readthedocs.io/en/latest
Apache License 2.0

NDNApp limited to One Thread? #31

Closed justincpresley closed 2 years ago

justincpresley commented 2 years ago

When trying to access an NDNApp from another thread and prepare data, I receive the following error:

 data_packet = self.app.prepare_data(int_name + [Component.from_number(packet_number, Component.TYPE_SEGMENT)], byte_chunk, meta_info=mi)
  File "/home/corbin/.local/lib/python3.8/site-packages/ndn/app.py", line 143, in prepare_data
    signer = self.keychain.get_signer(kwargs)
  File "/home/corbin/.local/lib/python3.8/site-packages/ndn/security/keychain/keychain_sqlite3.py", line 414, in get_signer
    identity = self.default_identity()
  File "/home/corbin/.local/lib/python3.8/site-packages/ndn/security/keychain/keychain_sqlite3.py", line 335, in default_identity
    cursor = self.conn.execute('SELECT id, identity, is_default FROM identities WHERE is_default=1')
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140249821718336 and this is thread id 140249736873728.

Is an NDNApp unable to be used across multiple threads? If so, and I instead create multiple NDNApp instances, will published data from one app satisfy another app's Interests?
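The traceback above is a generic sqlite3 restriction rather than anything specific to python-ndn: by default, a sqlite3 connection refuses to be used from a thread other than the one that created it. A minimal stdlib-only sketch reproducing the same error (the table name is illustrative, echoing the keychain query in the traceback):

```python
import sqlite3
import threading

# Create the connection in the main thread, as NDNApp's keychain does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE identities (id INTEGER, is_default INTEGER)")

errors = []

def query_from_other_thread():
    # Using the main thread's connection here raises
    # sqlite3.ProgrammingError: "SQLite objects created in a thread
    # can only be used in that same thread..."
    try:
        conn.execute("SELECT id FROM identities WHERE is_default=1")
    except sqlite3.ProgrammingError as e:
        errors.append(e)

t = threading.Thread(target=query_from_other_thread)
t.start()
t.join()
print(errors[0])
```

(The check can be relaxed with `sqlite3.connect(..., check_same_thread=False)`, but that only moves the burden of locking onto the caller; it does not make the connection thread-safe.)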

zjkmxy commented 2 years ago

There can be problems when you use coroutines with multithreading.

justincpresley commented 2 years ago

I have gotten coroutines to work well with multithreading in the past, but it requires careful coding. It also seemed to be the best way to get good performance, better than queues. However, I had not heard about the GIL issue, so that last statement may need revising.

I believe the main problem here is the fact that NDNApp is not thread-safe. Will this be a future improvement?

So far, I have encountered two applications (ndn-hydra + nr-archway) that could heavily benefit from, or require, using NDN-related API calls (i.e. python-ndn) in multiple threads (with coroutines). More are bound to be encountered in the future. I have tried to use queuing for handing off data/commands in one application; however, the performance (considering NDN's current performance) indicates that it was not a viable solution.

zjkmxy commented 2 years ago

I believe the main problem here is the fact that NDNApp is not thread-safe. Will this be a future improvement?

No. Since most Python applications cannot benefit from multithreading, it is not worth doing.

There are multiple articles about the GIL, such as this. In short, nearly every Python statement acquires a global shared lock, so Python can basically execute only one thread at a time, unless some underlying code not written in Python avoids acquiring that lock (machine-learning libraries are a common case). You can also try the following code to see that multithreading can even be slower, unless you use an ML library (such as numpy):

import time
from threading import Thread
from multiprocessing import Process

COUNT = 50000000
# 0.5*10^8 -> about 0.5s for a single-threaded C++ program
# But for Python 3, it will need 1~10 seconds
# Better computers can run faster.

def target(l, r):
    ret = 0
    for i in range(l, r):
        ret = ret + i * i  # Unlike C, this will not overflow
    return ret
# To see numpy results, replace target with the following
# (requires `import numpy as np` at the top):
# def target(l, r):
#     lst = np.arange(l, r)
#     return np.sum(np.square(lst))

def get_time(func):
    start = time.time()
    func()
    end = time.time()
    print(f'{(end - start):.6f}s')

def single_thread():
    """
    Run target in single thread.
    """
    target(0, COUNT)

def multi_thread():
    """
    Run target in 2 threads.
    """
    half = COUNT // 2
    t1 = Thread(target=target, args=(0, half))
    t2 = Thread(target=target, args=(half, COUNT))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

def multi_process():
    """
    Run target in 2 Python processes, instead of threads.
    """
    half = COUNT // 2
    p1 = Process(target=target, args=(0, half))
    p2 = Process(target=target, args=(half, COUNT))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

if __name__ == '__main__':
    print('Single-threading: ', end='')
    get_time(single_thread)
    print('Multi-threading: ', end='')
    get_time(multi_thread)
    print('Multi-processing: ', end='')
    get_time(multi_process)

So far, I have encountered two applications (ndn-hydra + nr-archway) that could heavily benefit from, or require, using NDN-related API calls (i.e. python-ndn) in multiple threads (with coroutines). More are bound to be encountered in the future. I have tried to use queuing for handing off data/commands in one application; however, the performance (considering NDN's current performance) indicates that it was not a viable solution.

Another thing you can try is to create one NDNApp instance per thread, but I don't know what will happen, since no one has done that before. Theoretically it is doable, as NDNApp does not rely on any global variables.
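A per-thread pattern might look like the following stdlib-only sketch: each thread creates its own event loop, so any object created inside `worker` (including an NDNApp and its keychain's sqlite3 connection) stays on that thread. The NDNApp lines are commented-out placeholders, not a tested python-ndn usage:

```python
import asyncio
import threading

def worker(name, results):
    # Each thread owns its own event loop; everything created here
    # stays on this thread, avoiding the sqlite3 cross-thread error.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

    async def main():
        # app = NDNApp()         # hypothetical: one instance per thread
        # await app.main_loop()  # or drive it via app.run_forever(...)
        await asyncio.sleep(0)   # stand-in for real NDN work
        results[name] = "done"

    try:
        loop.run_until_complete(main())
    finally:
        loop.close()

results = {}
threads = [threading.Thread(target=worker, args=(f"t{i}", results))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # {'t0': 'done', 't1': 'done'}
```

The key design point is that nothing (loop, app, keychain) is shared across threads; any cross-thread hand-off would still need a thread-safe mechanism.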

justincpresley commented 2 years ago

For standard python3:

  Single-threading: 2.896443s
  Multi-threading: 2.052587s
  Multi-processing: 1.851165s

For numpy python3:

  Single-threading: 0.370442s
  Multi-threading: 0.102015s
  Multi-processing: 0.113359s

Seems like newer hardware (I am running on fairly recent hardware) might help with this problem, as referred to, but I can see what you mean. It would still be advantageous to add thread safety, but I understand why it has weak acceptance. I appreciate the provided example. I wonder if Python is working to alleviate this GIL issue...

Another thing you can try is to create one NDNApp instance per thread, but I don't know what will happen, since no one has done that before. Theoretically it is doable, as NDNApp does not rely on any global variables.

It does work, at least for the simplest usage. Tested and confirmed.

zjkmxy commented 2 years ago

I wonder if Python is working to alleviate this GIL issue...

The GIL has a long, long history. I don't think it will be easy to fix.

It does work, at least for the simplest usage. Tested and confirmed.

Great.

zjkmxy commented 2 years ago

FYI: https://docs.python.org/3/library/asyncio-dev.html#concurrency-and-multithreading
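The pattern that page recommends for bridging worker threads and a running event loop is `asyncio.run_coroutine_threadsafe`: the thread schedules a coroutine onto the loop that owns the non-thread-safe objects and waits on the returned future. A minimal stdlib sketch (the NDNApp reference in the comment is illustrative, not tested python-ndn code):

```python
import asyncio
import threading

async def prepare(name):
    # In python-ndn this could wrap a call such as app.prepare_data(...);
    # here it just echoes the name.
    await asyncio.sleep(0)
    return f"prepared {name}"

def producer(loop, out):
    # Called from a worker thread: schedule the coroutine on the loop
    # that owns the (non-thread-safe) objects, then block this worker
    # thread until the result is ready.
    fut = asyncio.run_coroutine_threadsafe(prepare("/example/data"), loop)
    out.append(fut.result(timeout=5))

async def main():
    loop = asyncio.get_running_loop()
    out = []
    t = threading.Thread(target=producer, args=(loop, out))
    t.start()
    # Join in an executor so the event loop itself is never blocked.
    await loop.run_in_executor(None, t.join)
    return out

print(asyncio.run(main()))  # ['prepared /example/data']
```

This keeps all NDNApp access on its home thread while still letting other threads request work from it.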