Please add the ability to change the default H2O configuration.

Alexey-Makarevich commented 2 years ago

Description

I ran some load tests with Gatling against Typesense in Docker - a single instance and a 3-node cluster, both without SSL. The single instance reached its limit at approx. 1300 rps, and the cluster at approx. 700 rps. CPU usage was high in system space. There are 2 points:

1. Past the limit, Gatling throws the exception "j.n.ConnectException: connect(..) failed: Cannot assign requested address". It seems that Typesense uses default H2O configuration with max-connections:1024 without any ability to change it. If we could change the H2O configuration, we could look for the optimal settings.
2. Cluster was expected to be faster than single-instance but performed quite the opposite. Maybe you can explain why? Can we make some configuration changes so that rps scales with clustering?

Attachments: 1node-cpu-system-load, 3nodes-cpu-system-load, gating-report-3nodes.pdf, gatling-report-1node.pdf

Steps to reproduce

Get the docker-compose configurations from the documentation (a variation of docker-compose for swarm). Create a collection with 60000 documents, filling the "description" field with 300 random English words from the Aspell dictionary (http://app.aspell.net/create). Then load it with Gatling:

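    import scala.concurrent.duration._
    import io.gatling.core.Predef._
    import io.gatling.http.Predef._

    // The class name is illustrative; Gatling only needs a Simulation subclass.
    class TypesenseSearchSimulation extends Simulation {
        // Feed a random dictionary word to each virtual user.
        val feeder = csv("aspell-small.csv").random
        val httpProtocol = http
            .baseUrl("http://localhost:8109")
        val headers_1 = Map( "X-TYPESENSE-API-KEY" -> "xyz" )
        val scn = scenario("Typesense search")
            .feed(feeder)
            .exec(
                http("request_1")
                    .get("/collections/test-collection/documents/search?q=${dictionaryWord}&query_by=description")
                    .headers(headers_1)
            )
        // Open-model load: the arrival rate of new users rises in steps.
        setUp(
            scn.inject(
                incrementUsersPerSec(70)
                    .times(20)
                    .eachLevelLasting(5.seconds)
                    .separatedByRampsLasting(2.seconds)
                    .startingFrom(10)
            )
        ).protocols(httpProtocol)
    }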
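
For reference, the arrival rates implied by the injection profile above can be worked out by hand. This is only a sketch of the arithmetic, assuming `times(20)` means 20 plateaus with the first one at the `startingFrom` rate; the object and function names are made up for illustration and are not part of Gatling.

```scala
object InjectionLevels {
  // Users-per-second rate of each plateau: start, start + increment, ...
  def levels(start: Int, increment: Int, steps: Int): Seq[Int] =
    (0 until steps).map(i => start + i * increment)

  // Total run time in seconds: `steps` plateaus plus the ramps between them.
  def totalSeconds(steps: Int, levelSeconds: Int, rampSeconds: Int): Int =
    steps * levelSeconds + (steps - 1) * rampSeconds

  def main(args: Array[String]): Unit = {
    // Values taken from the simulation above.
    val rates = levels(start = 10, increment = 70, steps = 20)
    println(rates.mkString(", "))
    println(s"peak: ${rates.last} users/s, duration: ${totalSeconds(20, 5, 2)} s")
  }
}
```

Under that reading, the last plateau is 10 + 19 * 70 = 1340 new users per second and the whole run takes 138 seconds. Since each virtual user here issues a single request, the profile itself never asks for more than about 1340 rps.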

Expected Behavior

CPU load is in user space, with Typesense using most of the CPU. No "j.n.ConnectException: connect(..) failed: Cannot assign requested address" errors. Adding nodes to the cluster increases rps.

Actual Behavior

CPU load is in system space. Most of the CPU is used by Gatling, which waits on Typesense and cannot get a connection to it, throwing "j.n.ConnectException: connect(..) failed: Cannot assign requested address" exceptions. Adding nodes to the cluster reduces rps.

Metadata

Typesense Version: 0.23.1

OS: Ubuntu 22.04.1 LTS, Ryzen 7 PRO 4750G, 64GB RAM, Samsung SSD 970 EVO Plus, Docker 20.10.18.

kishorenc commented 2 years ago

@Alexey-Makarevich

> It seems that Typesense uses default H2O configuration with max-connections:1024 without any ability to change it.

As far as I can tell, that configuration only applies when h2o runs as a standalone server via the h2o binary, not when it is used as a library. I took a quick look but could not find any such setting exposed to the library. Are you sure that you are not running into other Linux limits, like ulimit?
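
One Linux limit that may be worth ruling out: "Cannot assign requested address" on `connect()` is commonly a sign that the client has run out of local ephemeral ports, which can happen when every request opens a fresh connection. The sketch below is a rough estimate only. It assumes each closed socket stays in TIME_WAIT for 60 seconds and is never reused, and the object and function names are invented for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object EphemeralPorts {
  // Parses the two numbers in /proc/sys/net/ipv4/ip_local_port_range,
  // e.g. "32768\t60999", and returns how many ports the range contains.
  def portCount(range: String): Int = {
    val parts = range.trim.split("\\s+").map(_.toInt)
    parts(1) - parts(0) + 1
  }

  // Rough ceiling on new connections per second to one destination,
  // assuming every closed socket occupies its port for timeWaitSeconds.
  def maxNewConnectionsPerSecond(ports: Int, timeWaitSeconds: Int = 60): Int =
    ports / timeWaitSeconds

  def main(args: Array[String]): Unit = {
    val path = Paths.get("/proc/sys/net/ipv4/ip_local_port_range")
    // Fall back to the common Linux default range if the file is not readable.
    val raw =
      if (Files.isReadable(path)) new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
      else "32768 60999"
    val ports = portCount(raw)
    println(s"ephemeral ports: $ports")
    println(s"rough ceiling: ${maxNewConnectionsPerSecond(ports)} new connections/s")
  }
}
```

With the common default range of 32768 to 60999 that is 28232 ports, or roughly 470 new connections per second under these assumptions. Settings such as `net.ipv4.tcp_tw_reuse` and connection keep-alive change the picture, so treat the number as an order-of-magnitude check rather than a hard limit.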

> Cluster was expected to be faster than single-instance but performed quite the opposite.

During searching, each node responds independently (unlike writes), so this should certainly not be an issue. The search code path is also oblivious to the notion of a cluster. One thing you could do is to see if you can run 3 independent single-node Typesense instances and repeat the benchmark to see if it scales as expected.
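
A possible way to set up that comparison with the same Gatling scenario is sketched below. The three ports and the class name are hypothetical, and the sketch assumes three standalone instances that each hold the same collection. `baseUrls` makes each virtual user pick one URL in round-robin order, and `shareConnections` lets virtual users share one connection pool instead of opening a connection each.

```scala
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Hypothetical simulation: same search scenario, spread over three
// independent single-node instances instead of one cluster.
class IndependentNodesSimulation extends Simulation {
  val feeder = csv("aspell-small.csv").random

  val httpProtocol = http
    // Hypothetical ports, one per standalone Typesense instance.
    .baseUrls("http://localhost:8108", "http://localhost:8109", "http://localhost:8110")
    // Optional: reuse connections across virtual users to reduce port churn.
    .shareConnections

  val headers_1 = Map("X-TYPESENSE-API-KEY" -> "xyz")

  val scn = scenario("Typesense search, independent nodes")
    .feed(feeder)
    .exec(
      http("request_1")
        .get("/collections/test-collection/documents/search?q=${dictionaryWord}&query_by=description")
        .headers(headers_1)
    )

  // Same stepped arrival rate as in the original report.
  setUp(
    scn.inject(
      incrementUsersPerSec(70)
        .times(20)
        .eachLevelLasting(5.seconds)
        .separatedByRampsLasting(2.seconds)
        .startingFrom(10)
    )
  ).protocols(httpProtocol)
}
```

If throughput scales with three independent instances but not with the cluster, the cluster setup is the place to look. If it does not scale in either case, the bottleneck is more likely on the client or host side.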

ghost commented 1 year ago

Hey bro, can you share a reference to the docker compose file that you used for the cluster test on multiple nodes? I have been trying to do the same, but my nodes are not able to connect to each other.
