jamesturk / jellyfish

🪼 a python library for doing approximate and phonetic matching of strings.
https://jamesturk.github.io/jellyfish/
MIT License

Release GIL to allow parallelism, achieving huge performance boost in multithreaded applications #216

Open bunny-therapist opened 3 weeks ago

bunny-therapist commented 3 weeks ago

The pyfunctions in this package keep the GIL while running rust code, and therefore do not support parallelism. Making them support parallelism appears to be quite straightforward.

My starting point for this was this page and this Q&A which led me to trying the allow_threads method on the jellyfish pyfunctions.

For example, let's consider the levenshtein_distance function:

#[pyfunction]
fn levenshtein_distance(a: &str, b: &str) -> PyResult<usize> {
    Ok(_lev(a, b))
}

If python is running two threads and they each call the levenshtein_distance function, only one of the threads can be calculating the levenshtein distance at any given moment, because a thread holds the GIL even while running rust code.

If the function instead looked like

#[pyfunction]
fn levenshtein_distance(py: Python<'_>, a: &str, b: &str) -> PyResult<usize> {
    py.allow_threads(|| Ok(_lev(a, b)))
}

then both python threads could be running the function at the same time, resulting in parallelism and a huge performance boost for multithreaded applications.

I wrote some code in order to test this; see below. My takeaways are:

Given that the functions in rustyfish.rs only get immutable references to strings (and strings in python are immutable), I don't expect this change to cause any problems. If there were any strange pitfalls related to garbage collection, I believe this page would have mentioned them; the signatures in their example code are actually very similar to those of the rustyfish functions.

If the maintainer is interested, I would be willing to submit a pull request which adds py.allow_threads to the pyfunctions in rustyfish.rs.
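
For what it's worth, PyO3 requires that the closure passed to allow_threads does not capture GIL-bound references (it enforces this through a Send/Ungil bound), and closures that only touch plain &str arguments satisfy that, which supports the expectation above that there are no strange pitfalls here. To illustrate what such a PR would look like, every wrapped function would follow the same pattern; the sketch below uses a hypothetical _jaro helper, named by analogy with _lev, so the actual internal names in rustyfish.rs may differ.

#[pyfunction]
fn jaro_similarity(py: Python<'_>, a: &str, b: &str) -> PyResult<f64> {
    // Release the GIL for the duration of the pure-rust computation.
    // `_jaro` is a hypothetical internal helper, analogous to `_lev` above.
    py.allow_threads(|| Ok(_jaro(a, b)))
}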

(I originally got into this topic because we are running a multithreaded python application which indirectly - through yake - is using the levenshtein distance function from jellyfish, and we have trouble with performance.)

My test code:

from jellyfish import levenshtein_distance
import threading
import time

TEST_STRING_A = "banana" * 1000
TEST_STRING_B = "apple" * 1000

def run_function():
    t0 = time.monotonic()
    levenshtein_distance(TEST_STRING_A, TEST_STRING_B)
    t1 = time.monotonic()
    print(f"dt = {t1 - t0}")

    # Ran it 10 times

    # without py.allow_threads:
    #   mean: 4.215436112299358 s
    #   stddev: 0.035588046203610106 s
    #   readable: 4.22(3) s

    # with py.allow_threads:
    #   mean: 4.309518244700302 s
    #   stddev: 0.04465694752509274 s
    #   readable: 4.31(4) s

    # => 2 % slower with py.allow_threads

def run_threads():
    threads = [
        threading.Thread(target=levenshtein_distance, args=(TEST_STRING_A, TEST_STRING_B)),
        threading.Thread(target=levenshtein_distance, args=(TEST_STRING_A, TEST_STRING_B)),
    ]
    t0 = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    t1 = time.monotonic()
    print(f"dt = {t1 - t0}")

    # without py.allow_threads:
    # - 8.327463528999942 s
    # - 8.305032594002114 s
    # - 8.284553416997369 s

    # with py.allow_threads:
    # - 4.47674531499797 s
    # - 4.499640865997208 s
    # - 4.687757796000369 s

    # => we get parallelism with py.allow_threads
jamesturk commented 3 weeks ago

Thank you for this thorough issue -- I'll need to familiarize myself a bit with allow_threads to make sure there aren't any unintended consequences, but generally it sounds good to me (I agree on the tradeoff) and I'd be happy to take a PR.

bunny-therapist commented 3 weeks ago

I also tested the non-threaded performance decrease for short strings (same code, same method).

The relative extra time was more significant, at about 10%, or roughly 1 microsecond. It makes sense that this case would be worse, since I would expect py.allow_threads to add a certain amount of overhead regardless of input size. However, for the long strings it took an extra 94 ms, which is more in absolute terms; I have no idea why the extra time would grow with the string size - maybe it is just variance? The point is, the overhead of py.allow_threads may be negligible for giant inputs but perhaps not for small ones.

I think we need to be careful about the impact on performance for typical input. At least in our application, the inputs are usually quite small (single words). If that is typical, the performance overhead may become too significant to be worth it. (One possible way to mitigate this is sketched after the numbers below.)

Specifics:

TEST_STRING_A = "banana"
TEST_STRING_B = "apple"

# Ran 10000 times
# without py.allow_threads:
#   mean:   1.0809695001967157e-05 s
#   stddev: 1.6537346361142648e-06 s
# with py.allow_threads:
#   mean:   1.1951207000379328e-05 s
#   stddev: 2.519401095876592e-06 s
# => +1 microsecond, or 10% slower
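
If the overhead on short inputs turns out to matter, one option - just a sketch, I have not benchmarked it - would be to only release the GIL above some input-size threshold, for example:

#[pyfunction]
fn levenshtein_distance(py: Python<'_>, a: &str, b: &str) -> PyResult<usize> {
    // Arbitrary, un-tuned cutoff: below it, the allow_threads overhead likely
    // outweighs any parallelism gain, so keep holding the GIL.
    const RELEASE_GIL_MIN_LEN: usize = 1_000;
    if a.len() + b.len() >= RELEASE_GIL_MIN_LEN {
        // Long inputs: release the GIL so other python threads can run in parallel.
        py.allow_threads(|| Ok(_lev(a, b)))
    } else {
        // Short inputs: the call finishes in microseconds anyway.
        Ok(_lev(a, b))
    }
}

That would keep the fast path for single words unchanged while still allowing parallelism for the large inputs where it pays off.
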
bunny-therapist commented 3 weeks ago

Actually, for our own application, it seems like performance is the same or slightly worse with py.allow_threads. I assume this is because it is called with very small input, so the overhead becomes significant, and the time spent in the rust code is too short to provide any significant parallelism.

jamesturk commented 3 weeks ago

Ah that makes sense, thanks for investigating it.
