spring-projects / spring-ai

An Application Framework for AI Engineering
https://docs.spring.io/spring-ai/reference/1.0-SNAPSHOT/index.html

Add support for getting usage tokens #814

Open tluo-github opened 2 months ago

tluo-github commented 2 months ago
lltx commented 2 months ago

Yes, the next chunk after the stop chunk contains the usage, but right now it is always set to null.
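
For reference, when the request enables usage reporting for streaming (via the OpenAI API's stream_options parameter with include_usage set to true), the API emits one extra chunk after the stop chunk, with an empty choices array and the token totals. Roughly (illustrative values only):

{
    "id": "chatcmpl-...",
    "object": "chat.completion.chunk",
    "choices": [],
    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 85,
        "total_tokens": 98
    }
}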

YanjiuHe commented 2 months ago

I've identified a solution for extracting usage statistics from the chat response, particularly when streaming is involved. In the Spring-AI framework, the usage-related information is encapsulated within the chatResponseMetadata attribute, instantiated from the OpenAiChatResponseMetadata class. This class includes a Usage object among other metadata elements.
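
For reference, the usage can already be read programmatically from a non-streaming ChatResponse today. A minimal sketch (the helper method is hypothetical, and chatModel is assumed to be an injected ChatModel bean):

    // Imports assumed: org.springframework.ai.chat.model.ChatModel,
    // org.springframework.ai.chat.model.ChatResponse,
    // org.springframework.ai.chat.prompt.Prompt,
    // org.springframework.ai.chat.metadata.Usage.
    long totalTokens(ChatModel chatModel) {
        // The usage lives in the response metadata, not in the output message.
        ChatResponse response = chatModel.call(new Prompt("Recommend some Tom Hanks movies"));
        Usage usage = response.getMetadata().getUsage();
        return usage.getTotalTokens();
    }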

The issue arises because, despite the presence of the necessary data structure, the serialization process does not expose the Usage and related sub-properties in the returned data: since the class inherits from HashMap, Jackson serializes it as a map of its entries and ignores the bean getters backed by plain fields (a minimal demonstration of this behaviour follows the class definition below). To rectify this, I propose modifying the initialization process of OpenAiChatResponseMetadata to explicitly put the id, usage, and rateLimit into the map during construction. Here's the revised class definition:

package org.springframework.ai.openai.metadata;

import java.util.HashMap;

import org.springframework.ai.chat.metadata.ChatResponseMetadata;
import org.springframework.ai.chat.metadata.EmptyRateLimit;
import org.springframework.ai.chat.metadata.EmptyUsage;
import org.springframework.ai.chat.metadata.RateLimit;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.lang.Nullable;
import org.springframework.util.Assert;

public class OpenAiChatResponseMetadata extends HashMap<String, Object> implements ChatResponseMetadata {

    protected static final String AI_METADATA_STRING = "{ @type: %1$s, id: %2$s, usage: %3$s, rateLimit: %4$s }";

    public static OpenAiChatResponseMetadata from(OpenAiApi.ChatCompletion result) {
        Assert.notNull(result, "OpenAI ChatCompletionResult must not be null");
        OpenAiUsage usage = OpenAiUsage.from(result.usage());
        return new OpenAiChatResponseMetadata(result.id(), usage);
    }

    // Removed the private final fields to directly manage via the map

    protected OpenAiChatResponseMetadata(String id, OpenAiUsage usage) {
        this(id, usage, null);
    }

    protected OpenAiChatResponseMetadata(String id, OpenAiUsage usage, @Nullable OpenAiRateLimit rateLimit) {
        // Store the values as map entries so that map-based serialization exposes them.
        this.put("id", id);
        this.put("usage", usage);
        this.put("rateLimit", rateLimit);
    }

    public String getId() {
        return (String) this.get("id");
    }

    @Override
    @Nullable
    public RateLimit getRateLimit() {
        RateLimit rateLimit = (RateLimit) this.get("rateLimit");
        return rateLimit != null ? rateLimit : new EmptyRateLimit();
    }

    @Override
    public Usage getUsage() {
        Usage usage = (Usage) this.get("usage");
        return usage != null ? usage : new EmptyUsage();
    }

    public OpenAiChatResponseMetadata withRateLimit(RateLimit rateLimit) {
        this.put("rateLimit", rateLimit);
        return this;
    }

    @Override
    public String toString() {
        return AI_METADATA_STRING.formatted(getClass().getName(), getId(), getUsage(), getRateLimit());
    }
}
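
To make the serialization point above concrete, here is a minimal, self-contained sketch (hypothetical classes, not Spring AI code) showing that Jackson serializes a HashMap subclass by its map entries and ignores field-backed getters:

import java.util.HashMap;

import com.fasterxml.jackson.databind.ObjectMapper;

public class MapSerializationDemo {

    // Getter is backed by a plain field: Jackson serializes the (empty) map instead.
    static class FieldBacked extends HashMap<String, Object> {
        private final String id = "chatcmpl-123";
        public String getId() { return id; }
    }

    // Value is stored as a map entry: it shows up in the JSON.
    static class MapBacked extends HashMap<String, Object> {
        MapBacked() { put("id", "chatcmpl-123"); }
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        System.out.println(mapper.writeValueAsString(new FieldBacked())); // {}
        System.out.println(mapper.writeValueAsString(new MapBacked()));   // {"id":"chatcmpl-123"}
    }
}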

After applying these modifications, I tested the endpoint with Postman to validate the functionality. The usage statistics are now successfully included in the response, as the following sample output shows:

{
    "result": {
        "metadata": {
            "contentFilterMetadata": null,
            "finishReason": "STOP"
        },
        "output": {
            "messageType": "ASSISTANT",
            "media": [],
            "metadata": {
                "finishReason": "STOP",
                "role": "ASSISTANT",
                "id": "chatcmpl-9WfUtvFJ8oAZpjDSdK6tEEBbsRAcB",
                "messageType": "ASSISTANT"
            },
            "content": "1. Forrest Gump (1994)\n2. Cast Away (2000)\n3. Saving Private Ryan (1998)\n4. Philadelphia (1993)\n5. The Green Mile (1999)\n6. Apollo 13 (1995)\n7. Captain Phillips (2013)\n8. A League of Their Own (1992)\n9. Sully (2016)\n10. Big (1988)"
        }
    },
    "metadata": {
        "rateLimit": null,
        "usage": {
            "promptTokens": 13,
            "generationTokens": 85,
            "totalTokens": 98
        },
        "id": "chatcmpl-9WfUtvFJ8oAZpjDSdK6tEEBbsRAcB"
    },
    "results": [
        {
            "metadata": {
                "contentFilterMetadata": null,
                "finishReason": "STOP"
            },
            "output": {
                "messageType": "ASSISTANT",
                "media": [],
                "metadata": {
                    "finishReason": "STOP",
                    "role": "ASSISTANT",
                    "id": "chatcmpl-9WfUtvFJ8oAZpjDSdK6tEEBbsRAcB",
                    "messageType": "ASSISTANT"
                },
                "content": "1. Forrest Gump (1994)\n2. Cast Away (2000)\n3. Saving Private Ryan (1998)\n4. Philadelphia (1993)\n5. The Green Mile (1999)\n6. Apollo 13 (1995)\n7. Captain Phillips (2013)\n8. A League of Their Own (1992)\n9. Sully (2016)\n10. Big (1988)"
            }
        }
    ]
}

@tzolov Please review the changes outlined above. If the approach seems reasonable, I kindly request that you assign this task to me so I can implement the fix and contribute it back to the project. This will be my first contribution to an open-source project, and I'm excited about it. 😊

markpollack commented 1 month ago

I believe the way we need to fix this is to not inherit from HashMap. Thoughts?

ThomasVitale commented 1 month ago

I like the idea of not inheriting from HashMap. The Usage information is already available through the ChatResponse interface. If I understood the problem correctly, the issue is not about getting the Usage information directly from the response (which is already possible today), but about serialising the object (since the Usage info is extracted on the fly). Not inheriting from HashMap and having a more explicit POJO implementation of the interface should fix the serialisation problem. And for additional data, a map could be added explicitly as a field.
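
For illustration, a rough sketch of that POJO shape, assuming the ChatResponseMetadata interface keeps its getUsage()/getRateLimit() accessors and that its remaining methods have defaults; the class and field names here are hypothetical:

import java.util.HashMap;
import java.util.Map;

import org.springframework.ai.chat.metadata.ChatResponseMetadata;
import org.springframework.ai.chat.metadata.EmptyRateLimit;
import org.springframework.ai.chat.metadata.EmptyUsage;
import org.springframework.ai.chat.metadata.RateLimit;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.lang.Nullable;

// Hypothetical POJO-style implementation: explicit fields serialize cleanly,
// and additional vendor data lives in a real map field instead of inheritance.
public class PojoChatResponseMetadata implements ChatResponseMetadata {

    private final String id;
    private final Usage usage;
    @Nullable
    private final RateLimit rateLimit;
    private final Map<String, Object> extra = new HashMap<>();

    public PojoChatResponseMetadata(String id, Usage usage, @Nullable RateLimit rateLimit) {
        this.id = id;
        this.usage = usage;
        this.rateLimit = rateLimit;
    }

    public String getId() { return id; }

    @Override
    public Usage getUsage() { return usage != null ? usage : new EmptyUsage(); }

    @Override
    public RateLimit getRateLimit() { return rateLimit != null ? rateLimit : new EmptyRateLimit(); }

    public Map<String, Object> getExtra() { return extra; }
}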

markpollack commented 1 month ago

I would very much like to get this done for M2; it has been on my plate for a while, and I hope to get to it this week.