googleapis / google-cloud-java

Google Cloud Client Library for Java
https://cloud.google.com/java/docs/reference
Apache License 2.0

Vertex AI SDK doesn't support multi turn conversation for multimodal. #10164

Closed: SetoKaiba closed this issue 9 months ago

SetoKaiba commented 9 months ago


Environment details

  1. OS type and version: Windows 10
  2. Java version: GraalVM 21.0.1
  3. Version(s): vertex 0.1.0

Steps to reproduce

  1. Send a multimodal message (text plus image) with ChatSession.
  2. A response is returned.
  3. Send another message in the same session, even a text-only one.
  4. An internal error occurs.

Code example

import com.google.cloud.vertexai.VertexAI;
import com.google.cloud.vertexai.api.GenerateContentResponse;
import com.google.cloud.vertexai.generativeai.preview.*;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ChatDiscussion {

    public static void main(String[] args) throws IOException {
        // TODO(developer): Replace these variables before running the sample.
        String projectId = "seto-goagent0";
        String location = "us-central1";
        String modelName = "gemini-pro-vision";

        chatDiscussion(projectId, location, modelName);
    }

    // Ask interrelated questions in a row using a ChatSession object.
    public static void chatDiscussion(String projectId, String location, String modelName)
            throws IOException {
        // Initialize client that will be used to send requests. This client only needs
        // to be created once, and can be reused for multiple requests.
        try (VertexAI vertexAI = new VertexAI(projectId, location)) {
            GenerateContentResponse response;

            GenerativeModel model = new GenerativeModel(modelName, vertexAI);
            // Create a chat session to be used for interactive conversation.
            ChatSession chatSession = new ChatSession(model);

            System.out.println("response:");
            response = chatSession.sendMessage(ContentMaker.fromMultiModalData("Hello. Describe The image please.",
                    PartMaker.fromMimeTypeAndData("image/png", readImageFile(
                            "https://storage.googleapis.com/cloud-samples-data/vertex-ai/llm/prompts/landmark1.png"))));
            System.out.println(ResponseHandler.getText(response));

            System.out.println("response:");
            response = chatSession.sendMessage("Tell me more");
            System.out.println(ResponseHandler.getText(response));
            System.out.println("Chat Ended.");
        }
    }

    // Reads the image data from the given URL.
    public static byte[] readImageFile(String url) throws IOException {
        URL urlObj = new URL(url);
        HttpURLConnection connection = (HttpURLConnection) urlObj.openConnection();
        connection.setRequestMethod("GET");

        int responseCode = connection.getResponseCode();

        if (responseCode == HttpURLConnection.HTTP_OK) {
            // try-with-resources closes the stream once the image has been read.
            try (InputStream inputStream = connection.getInputStream();
                    ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
                byte[] buffer = new byte[1024];
                int bytesRead;
                while ((bytesRead = inputStream.read(buffer)) != -1) {
                    outputStream.write(buffer, 0, bytesRead);
                }

                return outputStream.toByteArray();
            }
        } else {
            throw new RuntimeException("Error fetching file: " + responseCode);
        }
    }
}

Stack trace

response:
 The image is of the Colosseum in Rome, Italy. The Colosseum is an oval amphitheater in the center of the city of Rome, Italy. Built of concrete and stone, it is the largest ancient amphitheater ever built and is still the largest standing theater in the world today. The Colosseum could hold, it is estimated, between 50,000 and 80,000 spectators, having an average audience of some 65,000; it was used for gladiatorial contests and public spectacles such as mock sea battles, animal hunts, executions, re-enactments of famous battles, and dramas based on Classical mythology. The building ceased to be used for entertainment in the early medieval era. It was later reused for housing, workshops, quarters for a religious order, a fortress, a quarry, and a Christian shrine.

Although in the 21st century, it remains partially ruined because of damage caused by earthquakes and stone robbers, the Colosseum is an iconic symbol of Imperial Rome. It is one of Rome's most popular tourist attractions and also has links to the Roman Catholic Church, as each Good Friday, the Pope leads a torchlit "Way of the Cross" procession that starts in the area around the Colosseum.
response:
Exception in thread "main" com.google.api.gax.rpc.InternalException: io.grpc.StatusRuntimeException: INTERNAL: Internal error encountered.
    at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:110)
    at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:41)
    at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:86)
    at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:66)
    at com.google.api.gax.grpc.ExceptionResponseObserver.onErrorImpl(ExceptionResponseObserver.java:82)
    at com.google.api.gax.rpc.StateCheckingResponseObserver.onError(StateCheckingResponseObserver.java:84)
    at com.google.api.gax.grpc.GrpcDirectStreamController$ResponseObserverAdapter.onClose(GrpcDirectStreamController.java:148)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at com.google.api.gax.grpc.ChannelPool$ReleasingClientCall$1.onClose(ChannelPool.java:570)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:574)
    at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:72)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:742)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:723)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
    Suppressed: java.lang.RuntimeException: Asynchronous task failed
        at com.google.api.gax.rpc.ServerStreamIterator.hasNext(ServerStreamIterator.java:105)
        at com.google.cloud.vertexai.generativeai.preview.ResponseStreamIteratorWithHistory.hasNext(ResponseStreamIteratorWithHistory.java:37)
        at com.google.cloud.vertexai.generativeai.preview.ResponseHandler.aggregateStreamIntoResponse(ResponseHandler.java:104)
        at com.google.cloud.vertexai.generativeai.preview.GenerativeModel.generateContent(GenerativeModel.java:396)
        at com.google.cloud.vertexai.generativeai.preview.ChatSession.sendMessage(ChatSession.java:224)
        at com.google.cloud.vertexai.generativeai.preview.ChatSession.sendMessage(ChatSession.java:182)
        at ChatDiscussion.chatDiscussion(ChatDiscussion.java:41)
        at ChatDiscussion.main(ChatDiscussion.java:19)
Caused by: io.grpc.StatusRuntimeException: INTERNAL: Internal error encountered.
    at io.grpc.Status.asRuntimeException(Status.java:537)
    ... 14 more


meltsufin commented 9 months ago

@ZhenyiQ PTAL

ZhenyiQ commented 9 months ago

Thanks @SetoKaiba for the feedback!

None of the currently released models (https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#gemini-models) support multimodal multi-turn chat. The SDK will support it as soon as a released model does.
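Until a released model supports this, one option on the caller's side is to fail fast locally instead of waiting for the opaque INTERNAL error on the second turn. The following is only a rough sketch of that idea: `MultimodalTurnGuard` and its `Part` class are hypothetical names made up for illustration, not part of the Vertex AI SDK, and the rule it enforces (no follow-up turn once an earlier turn carried a non-text part) is an assumption based on the reproduction steps in this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of the Vertex AI SDK. It remembers what kinds
// of parts each turn contained and rejects a follow-up turn once any earlier
// turn included a non-text part (for example an image).
final class MultimodalTurnGuard {

    // Minimal stand-in for one message part: a MIME type and a payload size.
    static final class Part {
        final String mimeType;
        final int sizeBytes;

        Part(String mimeType, int sizeBytes) {
            this.mimeType = mimeType;
            this.sizeBytes = sizeBytes;
        }

        boolean isText() {
            return mimeType.startsWith("text/");
        }
    }

    private final List<List<Part>> history = new ArrayList<>();

    // True if any recorded turn carried a non-text part.
    boolean hasMultimodalHistory() {
        return history.stream()
                .flatMap(List::stream)
                .anyMatch(part -> !part.isText());
    }

    // Records a turn. Assumption: any follow-up to a multimodal turn would
    // fail server-side, so it is rejected here with a readable message.
    void recordTurn(List<Part> parts) {
        if (hasMultimodalHistory()) {
            throw new IllegalStateException(
                    "This conversation already contains a multimodal turn; "
                            + "multi-turn chat is not supported for it. Start a new session.");
        }
        history.add(new ArrayList<>(parts));
    }

    int turnCount() {
        return history.size();
    }

    public static void main(String[] args) {
        MultimodalTurnGuard guard = new MultimodalTurnGuard();
        // First turn: text plus image, mirroring the first sendMessage call above.
        guard.recordTurn(List.of(new Part("text/plain", 33), new Part("image/png", 1024)));
        try {
            // Second turn: text only, mirroring "Tell me more".
            guard.recordTurn(List.of(new Part("text/plain", 12)));
        } catch (IllegalStateException e) {
            System.out.println("Rejected locally: " + e.getMessage());
        }
    }
}
```

In the reproduction code, a check like this would run before each `chatSession.sendMessage(...)` call. Text-only conversations pass through unchanged; only a follow-up to a multimodal turn is rejected.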