spring-projects / spring-ai

An Application Framework for AI Engineering
https://docs.spring.io/spring-ai/reference/index.html
Apache License 2.0

Add Support for get usage tokens #814

Closed tluo-github closed 1 month ago

tluo-github commented 5 months ago
lltx commented 5 months ago

Yes, the next chunk after the stop chunk contains the usage, but currently it is always set to null.
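For context, the OpenAI API only sends that extra chunk when the request opts in via `"stream_options": {"include_usage": true}`; the final chunk then arrives with an empty `choices` array and a populated `usage` object. A sketch of its shape (illustrative values, not a captured response):

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion.chunk",
  "choices": [],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 85,
    "total_tokens": 98
  }
}
```

Without the `stream_options` flag, no such chunk is sent, which is why the usage stays null.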

YanjiuHe commented 5 months ago

I've identified a solution for extracting usage statistics from the chat response, particularly when streaming is involved. In the Spring-AI framework, the usage-related information is encapsulated within the chatResponseMetadata attribute, instantiated from the OpenAiChatResponseMetadata class. This class includes a Usage object among other metadata elements.

The issue arises because, despite the presence of the necessary data structure, the serialization process does not expose the Usage and related sub-properties in the returned data: since the class inherits from HashMap, Jackson serializes its map entries (which are empty) rather than its getters. To rectify this, I propose modifying the initialization of OpenAiChatResponseMetadata to explicitly insert the id, usage, and rateLimit into the map during construction. Here's the revised class definition:

public class OpenAiChatResponseMetadata extends HashMap<String, Object> implements ChatResponseMetadata {

    protected static final String AI_METADATA_STRING = "{ @type: %1$s, id: %2$s, usage: %3$s, rateLimit: %4$s }";

    public static OpenAiChatResponseMetadata from(OpenAiApi.ChatCompletion result) {
        Assert.notNull(result, "OpenAI ChatCompletionResult must not be null");
        OpenAiUsage usage = OpenAiUsage.from(result.usage());
        OpenAiChatResponseMetadata chatResponseMetadata = new OpenAiChatResponseMetadata(result.id(), usage);
        return chatResponseMetadata;
    }

    // Removed the private final fields to directly manage via the map

    protected OpenAiChatResponseMetadata(String id, OpenAiUsage usage) {
        this(id, usage, null);
    }

    protected OpenAiChatResponseMetadata(String id, OpenAiUsage usage, @Nullable OpenAiRateLimit rateLimit) {
        this.put("id", id);
        this.put("usage", usage);
        this.put("rateLimit", rateLimit);
    }

    public String getId() {
        return (String) this.get("id");
    }

    @Override
    @Nullable
    public RateLimit getRateLimit() {
        RateLimit rateLimit = (RateLimit) this.get("rateLimit");
        return rateLimit != null ? rateLimit : new EmptyRateLimit();
    }

    @Override
    public Usage getUsage() {
        Usage usage = (Usage) this.get("usage");
        return usage != null ? usage : new EmptyUsage();
    }

    public OpenAiChatResponseMetadata withRateLimit(RateLimit rateLimit) {
        this.put("rateLimit", rateLimit);
        return this;
    }

    @Override
    public String toString() {
        return AI_METADATA_STRING.formatted(getClass().getName(), getId(), getUsage(), getRateLimit());
    }
}

After implementing the mentioned modifications, I conducted a test using Postman to validate the functionality. I am pleased to report that the usage statistics are now successfully included in the response, as evidenced by the following sample output:

{
    "result": {
        "metadata": {
            "contentFilterMetadata": null,
            "finishReason": "STOP"
        },
        "output": {
            "messageType": "ASSISTANT",
            "media": [],
            "metadata": {
                "finishReason": "STOP",
                "role": "ASSISTANT",
                "id": "chatcmpl-9WfUtvFJ8oAZpjDSdK6tEEBbsRAcB",
                "messageType": "ASSISTANT"
            },
            "content": "1. Forrest Gump (1994)\n2. Cast Away (2000)\n3. Saving Private Ryan (1998)\n4. Philadelphia (1993)\n5. The Green Mile (1999)\n6. Apollo 13 (1995)\n7. Captain Phillips (2013)\n8. A League of Their Own (1992)\n9. Sully (2016)\n10. Big (1988)"
        }
    },
    "metadata": {
        "rateLimit": null,
        "usage": {
            "promptTokens": 13,
            "generationTokens": 85,
            "totalTokens": 98
        },
        "id": "chatcmpl-9WfUtvFJ8oAZpjDSdK6tEEBbsRAcB"
    },
    "results": [
        {
            "metadata": {
                "contentFilterMetadata": null,
                "finishReason": "STOP"
            },
            "output": {
                "messageType": "ASSISTANT",
                "media": [],
                "metadata": {
                    "finishReason": "STOP",
                    "role": "ASSISTANT",
                    "id": "chatcmpl-9WfUtvFJ8oAZpjDSdK6tEEBbsRAcB",
                    "messageType": "ASSISTANT"
                },
                "content": "1. Forrest Gump (1994)\n2. Cast Away (2000)\n3. Saving Private Ryan (1998)\n4. Philadelphia (1993)\n5. The Green Mile (1999)\n6. Apollo 13 (1995)\n7. Captain Phillips (2013)\n8. A League of Their Own (1992)\n9. Sully (2016)\n10. Big (1988)"
            }
        }
    ]
}

@tzolov Please review the changes outlined above. If the approach seems reasonable, I kindly request that you assign this task to me so I can implement the fix and contribute it back to the project. This will be my first contribution to an open-source project; I'm so excited about it. 😊

markpollack commented 4 months ago

I believe the way we need to fix this is to not inherit from HashMap. Thoughts?

ThomasVitale commented 4 months ago

I like the idea of not inheriting from HashMap. The Usage information is already available through the ChatResponse interface. If I understood the problem correctly, the issue is not about getting the Usage information directly from the response (which is already possible today), but about serialising the object (since the Usage info is extracted on the fly). Not inheriting from HashMap and having a more explicit POJO implementation of the interface should fix the serialisation problem. For additional data, a map could be added explicitly as a field.
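A minimal self-contained sketch of that direction (hypothetical class and method names, not the actual Spring AI implementation): explicit fields with bean getters, which serializers pick up naturally, plus an explicit map field for provider-specific extras.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical POJO-style metadata: explicit fields are exposed through
// standard bean getters instead of inheriting from HashMap, so Jackson
// serializes the getters rather than (empty) map entries.
class ChatResponseMetadataPojo {

    private final String id;

    // Additional, provider-specific data lives in an explicit map field,
    // which serializers also pick up through its getter.
    private final Map<String, Object> extra = new HashMap<>();

    ChatResponseMetadataPojo(String id) {
        this.id = id;
    }

    String getId() {
        return this.id;
    }

    Map<String, Object> getExtra() {
        return this.extra;
    }

    ChatResponseMetadataPojo with(String key, Object value) {
        this.extra.put(key, value);
        return this;
    }
}
```

The explicit fields keep the interface contract visible in the class shape, while the map preserves the open-ended extensibility that the HashMap inheritance was providing.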

markpollack commented 3 months ago

I would very much like to get this done for M2; it has been on my plate for a while, and I hope to get to it this week.

markpollack commented 1 month ago

OK, there are two issues conflated in this single issue.

The first is about the serialization of the Usage object and the second is about getting the usage tokens while streaming.

We have tested this and do observe the Usage when streaming. Use the option to enable usage while streaming, as you can see in this partial cut-and-paste of the test:

    @Test
    void beanStreamOutputConverterRecords() {

        BeanOutputConverter<ActorsFilms> outputConverter = new BeanOutputConverter<>(ActorsFilms.class);

        // @formatter:off
        Flux<ChatResponse> chatResponse = ChatClient.create(chatModel)
                .prompt()
                .options(OpenAiChatOptions.builder().withStreamUsage(true).build())
                .advisors(new SimpleLoggerAdvisor())
                .user(u -> u
                        .text("Generate the filmography of 5 movies for Tom Hanks. " + System.lineSeparator()
                                + "{format}")
                        .param("format", outputConverter.getFormat()))
                .stream()
                .chatResponse();

        List<ChatResponse> chatResponses = chatResponse.collectList()
                .block()
                .stream()
                .toList();

The first issue for serializing the OpenAiUsage class is indeed an issue. Right now we have

public class OpenAiUsage implements Usage {

    public static OpenAiUsage from(OpenAiApi.Usage usage) {
        return new OpenAiUsage(usage);
    }

    private final OpenAiApi.Usage usage;

There is no need to hold OpenAiApi.Usage as a private data member. We should change this to use DefaultUsage. This change probably needs to happen in other model providers as well.
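A self-contained sketch of what that change implies (a hypothetical stand-in record, assuming DefaultUsage carries plain prompt/generation/total token counts, as its name suggests): copy the values out of the provider-specific usage object at construction time instead of retaining the API type.

```java
// Hypothetical stand-in for a DefaultUsage-style type: plain token
// counts as record components, so any serializer can expose them
// directly without knowing about OpenAiApi.Usage.
record DefaultUsageSketch(long promptTokens, long generationTokens, long totalTokens) {

    // Build from raw counts as a provider adapter would, copying the
    // values up front rather than wrapping the API object.
    static DefaultUsageSketch from(long promptTokens, long generationTokens) {
        return new DefaultUsageSketch(promptTokens, generationTokens,
                promptTokens + generationTokens);
    }
}
```

Because the record holds only primitives, the serialization problem discussed above disappears for every provider that adopts the same pattern.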

I will close this issue, then, and create a new one regarding serialization of the Usage implementation. There is generally not a very simple way to serialize ChatResponse; it should really be considered an internal domain object, not an on-the-wire object, but we will do the best we can. I once had to sprinkle lots of Jackson annotations and write some custom Jackson classes to make ChatResponse deserialize, and that felt like the wrong path.

Anyway, I hope this answers the issue. Thanks for reporting.