pinecone-io / pinecone-java-client

The official Java client for the Pinecone vector database
https://www.pinecone.io
Apache License 2.0

Add proxy configuration for OkHttpClient and NettyChannelBuilder #136

Status: Closed (rohanshah18 closed this 4 months ago)

rohanshah18 commented 4 months ago

Problem

In order to configure a proxy:

  1. For data plane operations, users first have to instantiate the PineconeConfig class, then define a custom NettyChannelBuilder and build a ManagedChannel with the proxy configured. Only after setting this customManagedChannel in PineconeConfig can the PineconeConnection and Index/AsyncIndex classes be instantiated.
  2. For control plane operations, users have to create an OkHttpClient, configure the proxy on it, and pass it to the Pinecone builder.

Solution

Provide the ability to configure a proxy with just proxyHost and proxyPort for both control plane (OkHttpClient) and data plane (gRPC calls via NettyChannelBuilder) operations, without having to instantiate all of these classes manually, as shown in the examples below.
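Conceptually, both planes need the same routing rule: send every outbound connection, whatever its target, to the single HTTP proxy at proxyHost:proxyPort. The ProxyDetector in the data plane example below implements that rule for gRPC. A minimal JDK-only sketch of the same rule for HTTP clients is a fixed java.net.ProxySelector; the class name FixedProxySelector is hypothetical and is not part of the Pinecone SDK.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.List;

// Hypothetical helper (not part of the Pinecone SDK): routes every outbound URI
// through one HTTP proxy, the same "always use proxyHost:proxyPort" policy as
// the ProxyDetector in the data plane example.
final class FixedProxySelector extends ProxySelector {
    private final Proxy proxy;

    FixedProxySelector(String proxyHost, int proxyPort) {
        // createUnresolved defers the DNS lookup of the proxy host to connect time
        this.proxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(proxyHost, proxyPort));
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        // Same proxy regardless of the target's scheme, host or port
        return List.of(proxy);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        // Nothing to do: a fixed proxy has no alternative route to fall back to
    }
}
```

A selector like this could be handed to OkHttpClient.Builder.proxySelector(...) instead of a single Proxy; for a fixed proxy it should behave the same as .proxy(...), and the selector form only pays off if routing later needs to depend on the target URI.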

  1. Data Plane operations via NettyChannelBuilder:

Before:

import io.grpc.HttpConnectProxiedSocketAddress;
import io.grpc.ManagedChannel;
import io.grpc.ProxiedSocketAddress;
import io.grpc.ProxyDetector;
import io.pinecone.clients.Index;
import io.pinecone.configs.PineconeConfig;
import io.pinecone.configs.PineconeConnection;
import io.grpc.netty.GrpcSslContexts;
import io.grpc.netty.NegotiationType;
import io.grpc.netty.NettyChannelBuilder;
import io.pinecone.exceptions.PineconeException;

import javax.net.ssl.SSLException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.concurrent.TimeUnit;

import java.util.Arrays;

...
        String apiKey = System.getenv("PINECONE_API_KEY");
        String proxyHost = System.getenv("PROXY_HOST");
        int proxyPort = Integer.parseInt(System.getenv("PROXY_PORT"));

        PineconeConfig config = new PineconeConfig(apiKey);
        String endpoint = System.getenv("PINECONE_HOST");
        NettyChannelBuilder builder = NettyChannelBuilder.forTarget(endpoint);

        ProxyDetector proxyDetector = new ProxyDetector() {
            @Override
            public ProxiedSocketAddress proxyFor(SocketAddress targetServerAddress) {
                SocketAddress proxyAddress = new InetSocketAddress(proxyHost, proxyPort);

                return HttpConnectProxiedSocketAddress.newBuilder()
                        .setTargetAddress((InetSocketAddress) targetServerAddress)
                        .setProxyAddress(proxyAddress)
                        .build();
            }
        };

        // Create custom channel
        try {
            builder = builder.overrideAuthority(endpoint)
                    .negotiationType(NegotiationType.TLS)
                    .keepAliveTimeout(5, TimeUnit.SECONDS)
                    .sslContext(GrpcSslContexts.forClient().build())
                    .proxyDetector(proxyDetector);
        } catch (SSLException e) {
            throw new PineconeException("SSL error opening gRPC channel", e);
        }

        // Build the managed channel with the configured options
        ManagedChannel channel = builder.build();
        config.setCustomManagedChannel(channel);
        PineconeConnection connection = new PineconeConnection(config);
        Index index = new Index(connection, "PINECONE_INDEX_NAME");
        // Data plane operations
        // 1. Upsert data
        System.out.println(index.upsert("v1", Arrays.asList(1F, 2F, 3F, 4F)));
        // 2. Describe index stats
        System.out.println(index.describeIndexStats());

After:

import io.pinecone.clients.Index;
import io.pinecone.clients.Pinecone;
import java.util.Arrays;

...
        String apiKey = System.getenv("PINECONE_API_KEY");
        String proxyHost = System.getenv("PROXY_HOST");
        int proxyPort = Integer.parseInt(System.getenv("PROXY_PORT"));

        Pinecone pinecone = new Pinecone.Builder(apiKey)
                .withProxy(proxyHost, proxyPort)
                .build();

        Index index = pinecone.getIndexConnection("PINECONE_INDEX_NAME");
        // Data plane operation routed through the proxy server
        // 1. Upsert data
        System.out.println(index.upsert("v1", Arrays.asList(1F, 2F, 3F, 4F)));
        // 2. Describe index stats
        System.out.println(index.describeIndexStats());

  2. Control Plane operations via OkHttpClient:

Before:

import io.pinecone.clients.Pinecone;
import okhttp3.OkHttpClient;
import java.net.InetSocketAddress;
import java.net.Proxy;

...
        String apiKey = System.getenv("PINECONE_API_KEY");
        String proxyHost = System.getenv("PROXY_HOST");
        int proxyPort = Integer.parseInt(System.getenv("PROXY_PORT"));

        // Instantiate OkHttpClient instance and configure the proxy
        OkHttpClient client = new OkHttpClient.Builder()
                .proxy(new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort)))
                .build();

        // Instantiate Pinecone class with the custom OkHttpClient object
        Pinecone pinecone = new Pinecone.Builder(apiKey)
                .withOkHttpClient(client)
                .build();

        // Control plane operation routed through the proxy server
        System.out.println(pinecone.describeIndex("PINECONE_INDEX"));

After:

import io.pinecone.clients.Pinecone;

...
        String apiKey = System.getenv("PINECONE_API_KEY");
        String proxyHost = System.getenv("PROXY_HOST");
        int proxyPort = Integer.parseInt(System.getenv("PROXY_PORT"));

        Pinecone pinecone = new Pinecone.Builder(apiKey)
                .withProxy(proxyHost, proxyPort)
                .build();

        // Control plane operation routed through the proxy server
        System.out.println(pinecone.describeIndex("PINECONE_INDEX"));
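Every snippet above reads the proxy port with Integer.parseInt(System.getenv("PROXY_PORT")), which throws a NumberFormatException when the variable is unset or malformed. A slightly more defensive way to read the same PROXY_HOST / PROXY_PORT variables is sketched below; ProxySettings and fromEnv are hypothetical names, and taking a Map instead of calling System.getenv() directly is only there so the logic can be tested.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: reads PROXY_HOST / PROXY_PORT (the variable names used in
// the examples) and fails with a descriptive message instead of a bare
// NumberFormatException.
final class ProxySettings {
    final String host;
    final int port;

    private ProxySettings(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Empty when no proxy is configured; throws if the configuration is partial or invalid. */
    static Optional<ProxySettings> fromEnv(Map<String, String> env) {
        String host = env.get("PROXY_HOST");
        String portText = env.get("PROXY_PORT");
        boolean hasHost = host != null && !host.isBlank();
        boolean hasPort = portText != null && !portText.isBlank();
        if (!hasHost && !hasPort) {
            return Optional.empty(); // no proxy requested
        }
        if (!hasHost || !hasPort) {
            throw new IllegalArgumentException("PROXY_HOST and PROXY_PORT must be set together");
        }
        int port;
        try {
            port = Integer.parseInt(portText.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("PROXY_PORT is not a number: " + portText, e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("PROXY_PORT out of range 1-65535: " + port);
        }
        return Optional.of(new ProxySettings(host.trim(), port));
    }
}
```

In the examples this would be called as ProxySettings.fromEnv(System.getenv()), passing settings.host and settings.port to withProxy(...) only when a proxy is actually configured.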

Note: Users still need the appropriate certificate authorities (CAs) configured to establish secure (TLS) connections. Certificates verify the server's identity and enable encryption of the data exchanged between the SDK and the servers. The SDK only asks for the proxy host and port, which simplifies network setup while keeping connections secured by TLS.
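The CA requirement follows from how an HTTP proxy carries TLS traffic. With HTTP CONNECT (what HttpConnectProxiedSocketAddress in the "Before" data plane example requests, and what an HTTP proxy is used for with HTTPS URLs in OkHttp), the client asks the proxy to open a raw TCP tunnel and then performs the TLS handshake with the server through that tunnel. An intercepting proxy such as mitmproxy instead terminates TLS itself and presents certificates signed by its own CA, which is why that CA must be trusted during local testing. The sketch below only illustrates the CONNECT protocol step; ConnectTunnel is a hypothetical name and this is not gRPC's or OkHttp's actual code.

```java
// Illustrative only: builds the HTTP/1.1 CONNECT request a client sends to an
// HTTP proxy to open a TCP tunnel, and interprets the proxy's status line.
final class ConnectTunnel {

    /** Builds "CONNECT host:port HTTP/1.1" plus the Host header and the blank line. */
    static String connectRequest(String targetHost, int targetPort) {
        // IPv6 literals must be bracketed in authority form, e.g. [::1]:443
        String host = targetHost.contains(":") ? "[" + targetHost + "]" : targetHost;
        String authority = host + ":" + targetPort;
        return "CONNECT " + authority + " HTTP/1.1\r\n"
                + "Host: " + authority + "\r\n"
                + "\r\n";
    }

    /** Any 2xx response to CONNECT means the proxy switched to tunnel mode. */
    static boolean tunnelEstablished(String statusLine) {
        // A status line looks like: "HTTP/1.1 200 Connection established"
        String[] parts = statusLine.trim().split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            return false;
        }
        try {
            int status = Integer.parseInt(parts[1]);
            return status >= 200 && status < 300;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

Once tunnelEstablished(...) is true, every further byte on the socket (starting with the TLS ClientHello) is relayed to the target, so the certificate the client validates is the server's own unless the proxy deliberately intercepts TLS.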


Test Plan

Added unit tests. Because we ran into issues adding mitmproxy-based integration tests in the Python SDK, I'm not adding mitmproxy to GitHub CI. Instead, I tested locally by spinning up mitmproxy and successfully ran both control plane and data plane operations through it.

rohanshah18 commented 4 months ago

Some considerations that went into the design:

  1. Users who want separate proxies for the control plane and the data plane should be able to configure them, which is why I added withDataPlaneProxy() and withControlPlaneProxy().
  2. ProxyConfig is a separate class so that all proxy-related properties live in one place instead of cluttering the PineconeConfig class.
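The two design points above can be illustrated with a small builder in which withProxy() sets one shared ProxyConfig for both planes, while withControlPlaneProxy() and withDataPlaneProxy() set them individually. This is a hypothetical sketch that reuses the method names from this discussion; the ClientSettings class, the field names, and the port validation are assumptions, not the SDK's actual code.

```java
import java.util.Objects;

// Hypothetical sketch: keeps all proxy properties in one small value class
// instead of spreading them across a larger config class.
final class ProxyConfig {
    private final String host;
    private final int port;

    ProxyConfig(String host, int port) {
        this.host = Objects.requireNonNull(host, "host");
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        this.port = port;
    }

    String getHost() { return host; }
    int getPort() { return port; }
}

// Hypothetical settings holder; null means "no proxy" for that plane.
final class ClientSettings {
    final ProxyConfig controlPlaneProxy; // would configure the HTTP (OkHttp) client
    final ProxyConfig dataPlaneProxy;    // would configure the gRPC channel

    private ClientSettings(Builder b) {
        this.controlPlaneProxy = b.controlPlaneProxy;
        this.dataPlaneProxy = b.dataPlaneProxy;
    }

    static final class Builder {
        private ProxyConfig controlPlaneProxy;
        private ProxyConfig dataPlaneProxy;

        /** One proxy shared by both planes. */
        Builder withProxy(String host, int port) {
            ProxyConfig shared = new ProxyConfig(host, port);
            this.controlPlaneProxy = shared;
            this.dataPlaneProxy = shared;
            return this;
        }

        Builder withControlPlaneProxy(String host, int port) {
            this.controlPlaneProxy = new ProxyConfig(host, port);
            return this;
        }

        Builder withDataPlaneProxy(String host, int port) {
            this.dataPlaneProxy = new ProxyConfig(host, port);
            return this;
        }

        ClientSettings build() {
            return new ClientSettings(this);
        }
    }
}
```

In this sketch the last call wins, so withProxy(...) followed by withDataPlaneProxy(...) keeps the shared proxy for the control plane and overrides only the data plane; whether the SDK resolves such combinations the same way is not stated in this PR.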