Open bkarsin opened 3 months ago
I looked further into this, and the issue appears to be that performance drops when using more than 32 threads, even on a system with more than 32 cores. On the AMD platform listed above, I measured search performance while varying the number of threads, both with and without a filter. Below are graphs of the QPS and 99.9% latency reported by search_memory_index:
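For anyone reproducing the thread sweep with their own timing harness, here is a minimal sketch of how the two metrics can be derived from per-query latencies. It assumes a nearest-rank percentile and wall-clock QPS; search_memory_index may compute its numbers differently, and `summarize_latencies` is a hypothetical helper, not part of DiskANN.

```python
import math


def summarize_latencies(latencies_us, wall_time_s, percentile=99.9):
    """Return (qps, tail_latency_us) for one search run.

    latencies_us: per-query latencies in microseconds (hypothetical input).
    wall_time_s:  wall-clock time of the whole batch, in seconds.
    """
    if not latencies_us:
        raise ValueError("need at least one latency sample")
    if wall_time_s <= 0:
        raise ValueError("wall_time_s must be positive")

    # Throughput is measured against wall-clock time, so it reflects
    # how well the batch scaled across threads.
    qps = len(latencies_us) / wall_time_s

    # Nearest-rank percentile: the smallest sample such that at least
    # `percentile` percent of all samples are less than or equal to it.
    ordered = sorted(latencies_us)
    # round() guards against floating-point noise before taking the ceiling.
    rank = math.ceil(round(percentile / 100.0 * len(ordered), 9))
    tail = ordered[max(rank, 1) - 1]
    return qps, tail
```

With few queries the 99.9th percentile is effectively the single slowest query, so one stalled thread is enough to dominate the reported tail latency.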
Hey @bkarsin, this is an interesting result. Not that it should matter, but I'm curious: are you running this on bare-metal hardware?
Running on a cluster with an interactive slurm job and a docker container. Can give more details on the docker image and other library versions if needed.
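Because the runs go through a Slurm allocation and a Docker container, one thing worth ruling out is that the process is allowed to use fewer CPUs than the node physically has. A minimal sketch, assuming Linux and Python inside the same container; `visible_cpus` is a hypothetical helper, and this check is only a suggestion, not something the thread has confirmed as the cause.

```python
import os


def visible_cpus():
    """Compare logical CPUs on the node with CPUs this process may run on."""
    logical = os.cpu_count() or 1  # cpu_count() can return None

    # sched_getaffinity is Linux-only; it reflects CPU pinning applied by
    # the scheduler or container runtime (for example via cpusets).
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = logical

    return {"logical": logical, "usable": usable}


if __name__ == "__main__":
    info = visible_cpus()
    print(f"logical CPUs: {info['logical']}, usable by this process: {info['usable']}")
```

If `usable` is smaller than the thread count passed to the search binary, the threads are oversubscribed. Note that CPU-time quotas (as opposed to pinning) do not show up in the affinity mask, so a matching count does not fully rule out a limit.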
Expected Behavior
Benchmarked search performance on two CPU platforms on my dataset with filtered search:
Platform A: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz - 12 cores
Platform B: Dual-socket AMD Epyc 7742 - 128 cores (2x 64)
Expect Platform B to outperform Platform A, especially in terms of QPS, given the same index and search parameters.
Actual Behavior
Getting significantly worse search performance on Platform B (though build times are much faster). The 99.9% latency is also very high on Platform B. Below are example performance results for the same index and search parameters (more details in the Error section).
Platform A:
Platform B:
Example Code
No custom code, since this is just from running build_memory_index and search_memory_index on my dataset on the two platforms. The parameters used to build the index and search are shown in the full error logs below.
Dataset Description
Error
Platform A:
Platform B:
Your Environment
Platform A and B are using the same docker container with the following software versions:
Additional Details