Open prrao87 opened 11 months ago
I just came across `rich-bench` and starred it to come back and look at it, but haven't had a chance yet. `pytest-benchmark` looks interesting. I have some questions about it.
For the GC I'm wondering: could it be important to leave that on? Let's say you are doing something in the code that creates a lot of reference cycles, triggering a lot of GC runs and making it slow. If it was possible to avoid creating these cycles by changing the code, you could speed things up, but you wouldn't know this if GC was disabled. I'm just thinking off the top of my head here, and don't even have code in mind as an example, so it may not make sense to even consider this.
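To make that scenario concrete, here's a quick sketch (not from the thread — function names are made up) showing how code that creates lots of reference cycles spends measurable time in the cyclic GC, time that would be invisible if a benchmark harness disabled GC:

```python
import gc
import time


def make_cycles(n):
    # Each self-referencing list becomes garbage that only the
    # cyclic garbage collector can reclaim.
    for _ in range(n):
        a = []
        a.append(a)


def timed(func, *args):
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


gc.collect()
with_gc = timed(make_cycles, 200_000)

gc.disable()
try:
    without_gc = timed(make_cycles, 200_000)
finally:
    gc.enable()
    gc.collect()  # clean up the cycles we left behind

print(f"with GC: {with_gc:.4f}s, GC disabled: {without_gc:.4f}s")
```

If the GC-disabled run is noticeably faster, the code under test is paying a collection cost that a cycle-free rewrite could avoid — exactly the signal you'd lose by benchmarking with GC off.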
Per my understanding, `pytest-benchmark` only depends on `pytest` under the hood, but doesn't run as part of the test suite by default. You'd manually call the benchmark script via `pytest benchmark.py --args`, and this would only run the benchmark. The docs aren't very descriptive, but I did use it to test and benchmark many functions, see this example. The design of the benchmark functions is very similar to regular `pytest`; you only mark the benchmark functions with the `test_` prefix for the benchmark suite to pick them up.
You raise some very good points about things like concurrent tests (`pytest` itself isn't great with this), and I'm not sure about the implications of using a library dependent on `pytest` when you're using `asyncio` -- marking it as async like we do in regular `pytest` may not be as straightforward. For my benchmarks, I suppose I got away with it because I treated everything as a blocking function and ran the tests sequentially. I'd have to think about concurrency.
As for GC, I agree, by default it makes sense to leave GC enabled for the benchmarks you're considering.
OK this all sounds good to me, I’d definitely merge a PR for this. We may want to wait until after I finish with #687 because it will be a breaking change. I believe I have the code done, I’m just working on finalizing documentation.
Once it is done I would like to benchmark both clients against the official client. There should be no change in async performance. I’m curious to see the performance of the synchronous client vs the official client. My expectation is performance between the two will be roughly equal, but I’d like to verify that.
If you are interested and have time it would be nice to set this up on the sync_client
branch. I was planning on updating the current benchmarking there to verify my expectations before merging anyway. If you don’t have time no worries.
Awesome, I'll follow along anyway and see your progress on these branches/issues! And will be in touch if I have time.
@sanders41 Here's a very nice snippet that Neo4j's async client maintainer used (source).
```python
import asyncio

import pytest_asyncio


@pytest_asyncio.fixture
async def aio_benchmark(benchmark, event_loop):
    def _wrapper(func, *args, **kwargs):
        if asyncio.iscoroutinefunction(func):
            # Benchmark the coroutine by running it to completion on the loop
            @benchmark
            def _():
                return event_loop.run_until_complete(func(*args, **kwargs))
        else:
            return benchmark(func, *args, **kwargs)

    return _wrapper


async def some_async_function_to_test(some_async_fixture):
    ...


def test_benchmark_please(some_async_fixture, aio_benchmark):
    aio_benchmark(some_async_function_to_test, some_async_fixture)
```
The full code is in `conftest.py` in their repo. It's not very clear that they're running the `pytest` test suite independently from the benchmark, though; the whole thing seems to be run in a single test suite.
For asyncio itself I think we will be fine because I'm using `pytest-asyncio`, which basically does the same thing as this wrapper (assuming the benchmark plugin picks up on other pytest plugins). The issues I have seen specifically come in when using `xdist`, because it runs multiple tests all at the same time.
`xdist` may actually be OK in this situation because we aren't running the full test suite, only the benchmarks. What we can do here is, instead of using `movies` as the index name, use `str(uuid4())`. This will guarantee each index has a unique name. Since we don't care what that name is, and each run only needs to access one index along with the index-specific task, this solves the issue of multiple processes bumping into each other.
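The unique-name trick is as simple as (helper name is made up for illustration):

```python
from uuid import uuid4


def unique_index_name() -> str:
    # A fresh UUID string per call means parallel xdist workers
    # never accidentally share (and clobber) the same index.
    return str(uuid4())
```

Each benchmark (or a fixture wrapping it) calls this once, creates its index under that name, and tears it down afterwards, so workers stay fully isolated.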
I ran the current benchmarks with the new code and, as expected, I saw no difference in the async client performance. I did see slightly better performance from my sync client than the official client. This surprised me, as I expected the performance to be the same, and I can't explain why. I expect the difference is just by chance, though I did see the difference over multiple runs.
Since performance in the async client came out as expected, I am going to go forward with the PR. `meilisearch-python-async` won't really make sense as a name any more, so I will probably end up changing the name as well.
Awesome, sounds like a plan 😎. So you'll be maintaining both the sync and async Python clients yourself? Or would this be something the Meilisearch team will take over from you long term?
I will continue maintaining this package myself. I am also a maintainer (though not a Meilisearch employee) on the official Meilisearch Python client, and will continue with that.
I have seen people that use both this async package and the official package. The APIs between these two packages are slightly different so by providing both options here these people can work with the exact same API in their different applications, the only difference being if they await or not.
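As a toy sketch of that design (class and method names are made up, not the real client API), the idea of mirrored sync/async APIs looks like:

```python
import asyncio


class SyncIndex:
    def search(self, query: str) -> list:
        return [query.upper()]  # placeholder for a real HTTP call


class AsyncIndex:
    async def search(self, query: str) -> list:
        return [query.upper()]  # same logic, just awaitable


# Identical call sites apart from `await`:
sync_result = SyncIndex().search("movies")
async_result = asyncio.run(AsyncIndex().search("movies"))
assert sync_result == async_result
```

Keeping the two APIs method-for-method identical means users can move code between sync and async applications by only adding or removing `await`.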
@prrao87 the release of the version 2 client is done now. I left the old style benchmarking because I wanted you to be able to get credit for your idea if you want to implement it. If not no pressure.
Hey @sanders41, looks good, and the suffix `sdk` makes it clearly distinguishable from the client built by Meili! Were you in the process of taking a stab at benchmarking, perhaps with `rich-bench` or any other package? I haven't had the time to look into it and will be a bit occupied for the next few days, but if you're also caught up with other priorities, we can keep this open and I can take a stab at it next week.
I have been pretty busy lately also and haven't had the chance to try out any of the other benchmark packages yet.
Sounds good, let's revisit next week then!
Hi @sanders41, I've tried writing the `pytest-benchmark` tests for the use case we looked at, and it's not trivial. I haven't had luck getting the async benchmarks to run (it's also not too clear what to test, as there's no "right" answer from the search). The way async testing works in conjunction with this library isn't super clear, and the documentation hasn't helped me much so far. I don't have too much bandwidth to take this on, so I'm hoping you do.
Otherwise, I think we can close this issue and focus on bigger priorities. Thanks!
Ok, I'll take a look and see if I can come up with a good way to do it.
If I may chime in: why not use https://codspeed.io? It is used by pydantic.
codspeed is something I have wanted to look at, but just haven't had time. The tricky part here would be with it running in CI. Slowdowns would need to be looked at further to see if it was code under our control that caused the slowdown, and not network issues or Meilisearch itself.
What may make sense is to add it to the nightly tests instead of a standard part of CI. Possibly with an option to also trigger a run manually. This way if we are specifically working on speed in code we control we can kick it off to test how we are doing.
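A nightly schedule plus a manual trigger could be sketched as a GitHub Actions workflow along these lines (the workflow name, paths, and dependency file are placeholders, not an actual config from the repo):

```yaml
# Hypothetical workflow sketch: runs nightly, plus on-demand manual runs
name: nightly-benchmarks

on:
  schedule:
    - cron: "0 4 * * *"    # nightly run
  workflow_dispatch:        # allows kicking off a run manually

jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements-dev.txt
      - run: pytest benchmarks/ --benchmark-only
```

The `workflow_dispatch` trigger is what gives the "kick it off when we're working on speed" behavior without making benchmarks part of every CI run.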
Hi @sanders41, I've been using `pytest-benchmark` lately to run better benchmarks for Python functions on databases, and it depends on `pytest` underneath. I've also checked out `rich-bench`, but it's not as feature-rich as `pytest-benchmark`, specifically lacking these two features:

- Warmup cycles before recording timing numbers: this is important when running DB benchmarks, because many of them involve optimizations when you have a warm cache, so warmup makes for a fairer comparison. `pytest-benchmark` offers this as an added argument.
- Disabling GC during the benchmark runs, which `pytest-benchmark` supports by just enabling a flag.

I'm not sure if you've found other better libraries out there to do this, but the main reason I think it matters is that `time.time()` isn't as good as `time.perf_counter()`, which is the default system clock used to measure these timings in `pytest-benchmark`, and I find that it's quite robust over a range of settings. Was wondering on your thoughts on this, and if you were looking to accept a PR!