milvus-io / milvus-sdk-java

Java SDK for Milvus.
https://milvus.io
Apache License 2.0
366 stars 150 forks source link

enhance: add client interceptor support #900

Closed madogar closed 3 weeks ago

sre-ci-robot commented 2 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: madogar To complete the pull request process, please assign yhmo after the PR has been reviewed. You can assign the PR to them by writing /assign @yhmo in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/milvus-io/milvus-sdk-java/blob/master/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
sre-ci-robot commented 2 months ago

Welcome @madogar! It looks like this is your first PR to milvus-io/milvus-sdk-java 🎉

yhmo commented 2 months ago

@madogar We don't intend to expose direct interceptor input to users.

Java SDK follows the python sdk logic. In python sdk, the interceptor is used to pass db_name and authorization. https://github.com/milvus-io/pymilvus/blob/3198bd2614446cd3b4266751394d6d16dcb6a8ef/pymilvus/client/grpc_handler.py#L236 https://github.com/milvus-io/pymilvus/blob/3198bd2614446cd3b4266751394d6d16dcb6a8ef/pymilvus/client/grpc_handler.py#L239

The python sdk doesn't provide direct interceptor for users, either. We only expose parameters that we can support.

madogar commented 2 months ago

@yhmo My bad, forgot to set the context. Here is the server side change to set client_request_id in logs: https://github.com/milvus-io/milvus/pull/32264

As of today there is no way to set the param from java SDK, hence this change where would implement an interceptor like below and set it at client level: `public class TraceIdClientInterceptor implements ClientInterceptor { @Override public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {

    return new ForwardingClientCall
        .SimpleForwardingClientCall<ReqT, RespT>(next.newCall(method, callOptions)) {
        @Override
        public void start(ClientCall.Listener<RespT> responseListener, Metadata headers) {
            headers.put(Metadata.Key.of("client_request_id", Metadata.ASCII_STRING_MARSHALLER), "999-" + UUID.randomUUID().toString());
            super.start(responseListener, headers);
        }
    };
}

}`

The biggest challenge is setting this (unique value)param for every gRPC call.

yhmo commented 2 months ago

@madogar This pr is to set a client ID for the connection, and pass the client ID by ClientInterceptor. In my opinion, I mean no need to expose ClientInterceptor to users. Just like the database name:

if (StringUtils.isNotEmpty(connectParam.getDatabaseName())) {
            metadata.put(Metadata.Key.of("dbname", Metadata.ASCII_STRING_MARSHALLER), connectParam.getDatabaseName());
}

List<ClientInterceptor> clientInterceptors = new ArrayList<>();
clientInterceptors.add(MetadataUtils.newAttachHeadersInterceptor(metadata));

We can add a method to the ConnectParam like this:

if (StringUtils.isNotEmpty(connectParam.getClientID())) {
            metadata.put(Metadata.Key.of("client_request_id", Metadata.ASCII_STRING_MARSHALLER), connectParam.getClientID());
}

List<ClientInterceptor> clientInterceptors = new ArrayList<>();
clientInterceptors.add(MetadataUtils.newAttachHeadersInterceptor(metadata));

The ConnectParam.withClientInterceptors() directly exposes the grpc class ClientInterceptor to users, which is not a good design. It might be out of control if users pass ClientInterceptor freely to the server.

Just let the user input an ID by ConnectParam.withClientID(String cliecntID), and internally encapsulate the ID with a ClientInterceptor.

prrs commented 1 month ago

@yhmo My thought, consider this scenario:

  1. transactionId is present in gRPC context 2. when we make a call to Milvus using Milvus client SDK we, we want to pass this implicitly

If we don't do this and follow the approach to explicitly pass it, multiple services has to intercept it and pass it explicitly. I think there could be scenarios where client wants to pass it explicitly but in most of enterprise scenarios there is common layer which takes care of intercepting transactionId to keep it implicit.

cc: @xiaofan-luan

yhmo commented 1 month ago

@prrs I just discussed with @xiaofan-luan about this pr. He agrees with me that it is not a good design to expose the grpc.ClientInterceptor to sdk users. The input of the grpc interceptor should be controlled.

prrs commented 1 month ago

@yhmo @xiaofan-luan We need unique transaction ID per API call, so it can't be at connection level. I am dropping a suggestion. Please help to conclude this or suggest a solution which can solve the problem in hand. Thanks.

My thought:

Allowing the Client SDK to Intercept and Use Transaction ID Pros: Ease of Use: Clients do not have to manage transaction IDs, making the SDK simpler and more user-friendly. Consistency: The SDK can ensure that transaction IDs are generated and used consistently across all requests. Reduced Errors: There is less chance of errors related to transaction ID management by the client. Cons: Lack of Control: Clients have less control over transaction IDs, which might be necessary for advanced use cases. Transparency: It might be less transparent to clients how transaction IDs are being managed.

Making it Explicit for Client to Pass Transaction ID Pros: Control: Clients have full control over the transaction IDs, which can be beneficial for custom transaction management and tracking. Transparency: Clients are fully aware of how transaction IDs are being used and managed. Flexibility: Clients can implement their own logic for generating and managing transaction IDs, which might be necessary for complex applications. Cons: Complexity: This approach adds complexity for clients, who must manage transaction IDs themselves. Potential for Errors: Clients might make mistakes in generating or using transaction IDs, leading to potential issues. Recommendations Hybrid Approach: One way to balance these pros and cons is to implement a hybrid approach:

Default Behavior: By default, the SDK can generate and manage transaction IDs automatically. This makes it easy for most users who do not need to manage transaction IDs themselves. Advanced Option: Provide an option for clients to explicitly pass their own transaction IDs if they need to. This can be done through additional parameters or configuration options.

yhmo commented 1 month ago

Does the transaction ID generator rely on the interceptCall() of ClientInterceptor? I mean: is the transaction ID generated inside the interceptCall()?

madogar commented 1 month ago

Does the transaction ID generator rely on the interceptCall() of ClientInterceptor? I mean: is the transaction ID generated inside the interceptCall()?

@yhmo no the transaction id(trace id) wouldn't be generated inside the interceptCall(). Eventually in our JAVA app we would create a thread local variable whose value would be used in the interceptCall() for every grpc call. With this logic we would make sure every thread making a grpc call would set its own transaction id.

PS: In the above example in the thread I might have hardcoded the value in the interceptCall() for testing reason.

yhmo commented 1 month ago

Could you show me an example of how your app achieves "every thread making a grpc call would set its own transaction id" by the interceptCall()?

madogar commented 1 month ago

Could you show me an example of how your app achieves "every thread making a grpc call would set its own transaction id" by the interceptCall()?

Here is the code where we set the traceId/transactionId in thread local in the Application class:

public static final ThreadLocal<String> traceIdContext = new ThreadLocal<>();   
     public void setTraceIdContext(String traceId) {
        traceIdContext.set(traceId);
    }

Here is how we would use it in intercept call:

public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
         return new ForwardingClientCall
            .SimpleForwardingClientCall<ReqT, RespT>(next.newCall(method, callOptions)) {
            @Override
            public void start(ClientCall.Listener<RespT> responseListener, Metadata headers) {
                headers.put(Metadata.Key.of("client_request_id", Metadata.ASCII_STRING_MARSHALLER), Application.traceIdContext.get());
                super.start(responseListener, headers);
            }
        };
    }
yhmo commented 1 month ago

Ok, I see. traceid is not a const value, it is unique for each thread. Concurrent threads use the same MilvusClient to call rpc interfaces, different threads have different traceid, right?

madogar commented 1 month ago

Ok, I see. traceid is not a const value, it is unique for each thread. Concurrent threads use the same MilvusClient to call rpc interfaces, different threads have different traceid, right?

yes! so, are you now aligned with client interceptor logic?

yhmo commented 1 month ago

I didn't find a workaround to avoid exposing grpc interceptor, it seems your pr is the only solution to handle the use case.

@xiaofan-luan I think I can accept this pr. What do you think?

yhmo commented 1 month ago

@madogar We still insist that the ClientInteceptor should be hidden inside the ConnectParam. This is our proposal: Hide the constructor of ClientInterceptor inside the ConnectParam, users provide a ThreadLocal object to pass the trace ID. The meta key "client_request_id" should be consistent with the server side, users no need to know this meta key.

public class ConnectParam {
    ......
    private final List<ClientInterceptor> clientInterceptors;
    protected ConnectParam(@NonNull Builder builder) {
        ......
        this.clientInterceptors = builder.clientInterceptors;
    }

    public static Builder newBuilder() {
        private List<ClientInterceptor> clientInterceptors = new ArrayList<>();

        public Builder withTraceID(@NonNull ThreadLocal<String> traceIdContext) {
            clientInterceptors.add(new ClientInterceptor() {
            @Override
            public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
                return new ForwardingClientCall.SimpleForwardingClientCall<ReqT, RespT>(next.newCall(method, callOptions)) {
                    @Override
                    public void start(ClientCall.Listener<RespT> responseListener, Metadata headers) {
                        headers.put(Metadata.Key.of("client_request_id", Metadata.ASCII_STRING_MARSHALLER), traceIdContext.get());
                        super.start(responseListener, headers);
            }
        };
            }
        });
            return this;
        }
    }
}
madogar commented 3 weeks ago

@yhmo @xiaofan-luan I have raised a new PR with the changes we aligned, please review: https://github.com/milvus-io/milvus-sdk-java/pull/955

yhmo commented 3 weeks ago

close this thread and switch to #955