Open prrao87 opened 11 months ago
I just came across `rich-bench` and starred it to come back and look at it, but haven't had a chance yet. `pytest-benchmark` looks interesting. I have some questions about it.
For the GC I'm wondering: could it be important to leave that on? Let's say you are doing something in the code that creates a lot of reference cycles, triggering a lot of GC runs and making it slow. If it was possible to avoid creating these cycles by changing the code, you could speed things up, but you wouldn't know this if GC was disabled. I'm just thinking off the top of my head here, and don't even have code in mind as an example, so it may not make sense to even consider this.
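To make that scenario concrete, here's a quick sketch (not from the thread — function names are made up) showing how code that creates lots of reference cycles spends measurable time in the cyclic GC, time that would be invisible if a benchmark harness disabled GC:

```python
import gc
import time


def make_cycles(n):
    # Each self-referencing list becomes garbage that only the
    # cyclic garbage collector can reclaim.
    for _ in range(n):
        a = []
        a.append(a)


def timed(func, *args):
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


gc.collect()
with_gc = timed(make_cycles, 200_000)

gc.disable()
try:
    without_gc = timed(make_cycles, 200_000)
finally:
    gc.enable()
    gc.collect()  # clean up the cycles we left behind

print(f"with GC: {with_gc:.4f}s, GC disabled: {without_gc:.4f}s")
```

If the GC-disabled run is noticeably faster, the code under test is paying a collection cost that a cycle-free rewrite could avoid — exactly the signal you'd lose by benchmarking with GC off.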
Per my understanding, `pytest-benchmark` only depends on `pytest` under the hood, but doesn't run as part of the test suite by default. You'd manually call the benchmark script via `pytest benchmark.py --args`, and this would only run the benchmark. The docs aren't very descriptive, but I did use it to test and benchmark many functions, see this example. The design of the benchmark functions is very similar to regular `pytest`; you only mark the benchmark functions with the `test_` prefix for the benchmark suite to pick them up.
You raise some very good points about things like concurrent tests (`pytest` itself isn't great with this), and I'm not sure about the implications of using a library dependent on `pytest` when you're using `asyncio` -- marking it as async like we do in regular `pytest` may not be as straightforward. For my benchmarks, I suppose I got away with it because I treated everything as a blocking function and ran the tests sequentially. I'd have to think about concurrency.
As for GC, I agree, by default it makes sense to leave GC enabled for the benchmarks you're considering.
OK this all sounds good to me, I’d definitely merge a PR for this. We may want to wait until after I finish with #687 because it will be a breaking change. I believe I have the code done, I’m just working on finalizing documentation.
Once it is done I would like to benchmark both clients against the official client. There should be no change in async performance. I’m curious to see the performance of the synchronous client vs the official client. My expectation is performance between the two will be roughly equal, but I’d like to verify that.
If you are interested and have time it would be nice to set this up on the sync_client
branch. I was planning on updating the current benchmarking there to verify my expectations before merging anyway. If you don’t have time no worries.
Awesome, I'll follow along anyway and see your progress on these branches/issues! And will be in touch if I have time.
@sanders41 Here's a very nice snippet that Neo4j's async client maintainer used (source).
```python
import asyncio

import pytest_asyncio


@pytest_asyncio.fixture
async def aio_benchmark(benchmark, event_loop):
    def _wrapper(func, *args, **kwargs):
        if asyncio.iscoroutinefunction(func):
            # Benchmark the coroutine by running it to completion on the loop
            @benchmark
            def _():
                return event_loop.run_until_complete(func(*args, **kwargs))
        else:
            return benchmark(func, *args, **kwargs)

    return _wrapper


async def some_async_function_to_test(some_async_fixture):
    ...


def test_benchmark_please(some_async_fixture, aio_benchmark):
    aio_benchmark(some_async_function_to_test, some_async_fixture)
```
The full code is in `conftest.py` in their repo. It's not very clear that they're running the `pytest` test suite independently from the benchmark, though; the whole thing seems to be run in a single test suite.
For asyncio itself I think we will be fine because I'm using `pytest-asyncio`, which basically does the same thing as this wrapper (assuming the benchmark plugin picks up on other pytest plugins). The issues I have seen specifically come in when using `xdist`, because it runs multiple tests all at the same time.
`xdist` may actually be OK in this situation because we aren't running the full test suite, only the benchmarks. What we can do here is, instead of using `movies` as the index name, use `str(uuid4())`. This will guarantee each index has a unique name. Since we don't care what that name is, and each run only needs to access one index along with the index-specific task, this solves the issue of multiple processes bumping into each other.
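The unique-name trick is as simple as (helper name is made up for illustration):

```python
from uuid import uuid4


def unique_index_name() -> str:
    # A fresh UUID string per call means parallel xdist workers
    # never accidentally share (and clobber) the same index.
    return str(uuid4())
```

Each benchmark (or a fixture wrapping it) calls this once, creates its index under that name, and tears it down afterwards, so workers stay fully isolated.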
I ran the current benchmarks with the new code and, as expected, I saw no difference in the async client performance. I did see slightly better performance from my sync client than the official client. This surprised me, as I expected the performance to be the same, and I can't explain why. I expect the difference is just by chance, though I did see the difference over multiple runs.
Since performance in the async client came out as expected, I am going to go forward with the PR. `meilisearch-python-async` won't really make sense as a name any more, so I will probably end up changing the name as well.
Awesome, sounds like a plan 😎. So you'll be maintaining both the sync and async Python clients yourself? Or would this be something the Meilisearch team will take over from you long term?
I will continue maintaining this package myself. I am also a maintainer (though not a Meilisearch employee) on the official Meilisearch Python client, and will continue with that.
I have seen people that use both this async package and the official package. The APIs between these two packages are slightly different so by providing both options here these people can work with the exact same API in their different applications, the only difference being if they await or not.
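As a toy sketch of that design (class and method names are made up, not the real client API), the idea of mirrored sync/async APIs looks like:

```python
import asyncio


class SyncIndex:
    def search(self, query: str) -> list:
        return [query.upper()]  # placeholder for a real HTTP call


class AsyncIndex:
    async def search(self, query: str) -> list:
        return [query.upper()]  # same logic, just awaitable


# Identical call sites apart from `await`:
sync_result = SyncIndex().search("movies")
async_result = asyncio.run(AsyncIndex().search("movies"))
assert sync_result == async_result
```

Keeping the two APIs method-for-method identical means users can move code between sync and async applications by only adding or removing `await`.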
@prrao87 the release of the version 2 client is done now. I left the old style benchmarking because I wanted you to be able to get credit for your idea if you want to implement it. If not no pressure.
Hey @sanders41, looks good, and the suffix `sdk` makes it clearly distinguishable from the client built by Meili! Were you in the process of taking a stab at benchmarking, perhaps with `rich-bench` or any other package? I haven't had the time to look into it and will be a bit occupied for the next few days, but if you're also caught up with other priorities, we can keep this open and I can take a stab at it next week.
I have been pretty busy lately also and haven't had the chance to try out any of the other benchmark packages yet.
Sounds good, let's revisit next week then!
Hi @sanders41, I've tried writing the `pytest-benchmark` tests for the use case we looked at, and it's not trivial. I haven't had luck getting the async benchmarks to run (it's also not too clear what to test, as there's no "right" answer from the search). The way async testing works in conjunction with this library isn't super clear, and the documentation hasn't helped me much so far. I don't have too much bandwidth to take this on, so I'm hoping you do.
Otherwise, I think we can close this issue and focus on bigger priorities. Thanks!
Ok, I'll take a look and see if I can come up with a good way to do it.
If I may chime in: why not use https://codspeed.io? It is used by pydantic.
codspeed is something I have wanted to look at, but just haven't had time. The tricky part here would be with it running in CI. Slowdowns would need to be looked at further to see if it was code under our control that caused the slowdown, and not network issues or Meilisearch itself.
What may make sense is to add it to the nightly tests instead of a standard part of CI. Possibly with an option to also trigger a run manually. This way if we are specifically working on speed in code we control we can kick it off to test how we are doing.
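A nightly schedule plus a manual trigger could be sketched as a GitHub Actions workflow along these lines (the workflow name, paths, and dependency file are placeholders, not an actual config from the repo):

```yaml
# Hypothetical workflow sketch: runs nightly, plus on-demand manual runs
name: nightly-benchmarks

on:
  schedule:
    - cron: "0 4 * * *"    # nightly run
  workflow_dispatch:        # allows kicking off a run manually

jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements-dev.txt
      - run: pytest benchmarks/ --benchmark-only
```

The `workflow_dispatch` trigger is what gives the "kick it off when we're working on speed" behavior without making benchmarks part of every CI run.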
Hi @sanders41, I've been using `pytest-benchmark` lately to run better benchmarks for Python functions on databases, and it depends on `pytest` underneath. I've also checked out `rich-bench`, but it's not as feature-rich as `pytest-benchmark`, specifically lacking these two features:

- Warmup cycles before recording timing numbers: this is important when running DB benchmarks, because many of them involve optimizations when you have a warm cache, so warmup makes for a fairer comparison. `pytest-benchmark` offers this as an added argument.
- Disabling GC during the benchmark runs, which `pytest-benchmark` supports by just enabling a flag.

I'm not sure if you've found other better libraries out there to do this, but the main reason I think it matters is that `time.time()` isn't as good as `time.perf_counter()`, which is the default system clock used to measure these timings in `pytest-benchmark`, and I find that it's quite robust over a range of settings. Was wondering on your thoughts on this, and if you were looking to accept a PR!