typesense / typesense-java

Java client for Typesense
https://typesense.org/docs/latest/api/
Apache License 2.0

Node unhealthy not being marked as not healthy #64

Closed asorian0 closed 2 weeks ago

asorian0 commented 1 month ago

Description

We found a scenario in which the Java client does not update a node's status. We have a cluster of 3 nodes running typesense-server, consumed through a single service that holds the Typesense Java client. The client is configured with all three nodes, as suggested in the documentation. When one of the nodes is taken down, the client does not update that node's status and keeps sending requests to it.

The error thrown in our java app was:

{"date":"2024-09-13T10:35:27.585+02:00","message":"Error searching in collection products","classname":"com.xxx.xxx.pocs.datasource.adapters.TypesenseSearchAdapter","thread":"http-nio-8080-exec-35","level":"ERROR","stacktrace":"java.net.ConnectException: Failed to connect to 172.18.84.11:8108\n\tat okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:297)\n\tat okhttp3.internal.connection.RealConnection.connect(RealConnection.kt:207)\n\tat okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:226)\n\tat okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:106)\n\tat okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:74)\n\tat okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:255)\n\tat okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)\n\tat okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)\...

Checking the client code, we found that a node's status is only updated when certain kinds of errors occur, and java.net.ConnectException is not one of them.

We suggest either adding java.net.ConnectException to the errors that are handled, or simply marking the node as unhealthy on any error.
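To make the second option concrete, here is a minimal sketch of the behavior we have in mind. It is not the client's actual code: `NodeHealthSketch`, `TrackedNode`, and `Request` are hypothetical names made up for illustration, and the request is stubbed out so the example runs without a network. The point is only that catching `IOException` also covers `java.net.ConnectException`, so a node that refuses connections gets flagged and skipped.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.List;

// Hypothetical sketch: round-robin node selection that marks a node
// unhealthy on ANY I/O failure. Not thread-safe; illustration only.
class NodeHealthSketch {

    /** Hypothetical stand-in for a cluster node (not the client's Node class). */
    static final class TrackedNode {
        final String host;
        boolean healthy = true;
        TrackedNode(String host) { this.host = host; }
    }

    /** Hypothetical request abstraction so the sketch needs no real HTTP call. */
    interface Request<T> {
        T send(TrackedNode node) throws IOException;
    }

    private final List<TrackedNode> nodes;
    private int next = 0;

    NodeHealthSketch(List<TrackedNode> nodes) { this.nodes = nodes; }

    /** Tries up to one attempt per configured node, failing over on I/O errors. */
    <T> T execute(Request<T> request) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < nodes.size(); attempt++) {
            TrackedNode node = pickNode();
            try {
                T result = request.send(node);
                node.healthy = true;
                return result;
            } catch (IOException e) {
                // ConnectException extends SocketException, which extends
                // IOException, so connection failures land here as well.
                node.healthy = false;
                last = e;
            }
        }
        throw last != null ? last : new IOException("No nodes configured");
    }

    /** Returns the next healthy node, or the next node at all if none is healthy. */
    private TrackedNode pickNode() {
        for (int i = 0; i < nodes.size(); i++) {
            TrackedNode candidate = nodes.get(next);
            next = (next + 1) % nodes.size();
            if (candidate.healthy) {
                return candidate;
            }
        }
        TrackedNode fallback = nodes.get(next);
        next = (next + 1) % nodes.size();
        return fallback;
    }

    public static void main(String[] args) throws IOException {
        NodeHealthSketch cluster = new NodeHealthSketch(List.of(
                new TrackedNode("x.x.x.1"),
                new TrackedNode("x.x.x.2"),
                new TrackedNode("x.x.x.3")));
        int[] failedAttempts = {0};
        for (int i = 0; i < 10; i++) {
            cluster.execute(node -> {
                if (node.host.equals("x.x.x.2")) { // simulate the node that was taken down
                    failedAttempts[0]++;
                    throw new ConnectException("Failed to connect to " + node.host + ":8108");
                }
                return "ok from " + node.host;
            });
        }
        System.out.println("Failed attempts: " + failedAttempts[0]);
    }
}
```

In this sketch only the first request routed to the stopped node fails; every later request skips it. Periodically re-checking an unhealthy node so it can rejoin the rotation is deliberately left out, since a real client would need that as well.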

Steps to reproduce

  1. Get a cluster of at least 3 typesense-server nodes ready
  2. Create a Java app (in our case Spring Boot) and initialize the client:
    
    import java.time.Duration;
    import java.util.ArrayList;
    import org.typesense.api.Client;
    import org.typesense.api.Configuration;
    import org.typesense.resources.Node;

    ArrayList<Node> nodes = new ArrayList<>();
    nodes.add(
        new Node(
            "http",     // For Typesense Cloud use https
            "x.x.x.1",  // For Typesense Cloud use xxx.a1.typesense.net
            "8108"      // For Typesense Cloud use 443
        )
    );
    nodes.add(
        new Node(
            "http",     // For Typesense Cloud use https
            "x.x.x.2",  // For Typesense Cloud use xxx.a1.typesense.net
            "8108"      // For Typesense Cloud use 443
        )
    );
    nodes.add(
        new Node(
            "http",     // For Typesense Cloud use https
            "x.x.x.3",  // For Typesense Cloud use xxx.a1.typesense.net
            "8108"      // For Typesense Cloud use 443
        )
    );

    // Arguments: nodes, connection timeout, API key (left blank here)
    Configuration configuration = new Configuration(nodes, Duration.ofSeconds(2), "");

    Client client = new Client(configuration);

  3. Write a loop that performs requests to the Typesense nodes for a while, e.g.:

    for (int i = 0; i < 50000; i++) {
        try {
            SearchParameters searchParameters = new SearchParameters();
            // add search params based on your index
            // ("products" is the collection named in the error log above)
            client.collections("products").documents().search(searchParameters);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }


  4. Run the application
  5. While the requests are being sent to the Typesense nodes, take down one of those nodes (only one)
  6. You'll get a bunch of errors for all the requests routed to that node

Expected Behavior

The node should be marked as unhealthy.

Actual Behavior

The node is not marked as unhealthy because this kind of error (java.net.ConnectException) is not handled.

Metadata

Typesense Version: 0.26

OS: RHEL for cluster nodes, Alpine for the Java app

kishorenc commented 2 weeks ago

Fixed in v1.0.0 release.