tmc / langchaingo

LangChain for Go, the easiest way to write LLM-based programs in Go
https://tmc.github.io/langchaingo/
MIT License
4.64k stars, 624 forks

Unable to send BinaryPart to OpenAI Completions API #958

Open pilsnerbeer opened 4 months ago

pilsnerbeer commented 4 months ago

Trying to send an image/png directly to the OpenAI completions API (models: GPT-4o and GPT-4o mini).

Snippet:

        if imageData != nil {
            historyWithBinary = append(historyWithBinary, llms.MessageContent{
                Role: llms.ChatMessageTypeHuman,
                Parts: []llms.ContentPart{
                    llms.BinaryPart("image/png", imageData),
                },
            })
        }

        choices, err := client.GenerateContent(context.Background(), historyWithBinary, llms.WithTools([]llms.Tool{FileWriteTool}))

Returns Error:

Error generating content: API returned unexpected status code: 400: Invalid value:
'binary'. Supported values are: 'text', 'image_url', and 'audio_url'.

This seems to be a problem only with OpenAI. Gemini and Ollama worked fine when I tested them with the same snippet. According to the vision guide at https://platform.openai.com/docs/guides/vision, sending images directly to the vision API should be possible, so the error is not clear to me.

ccrlawrence commented 2 days ago

I have the same issue. I can work around it with:

base64Image := base64.StdEncoding.EncodeToString(jpegBytes)

then llms.ImageURLPart(fmt.Sprintf("data:image/jpeg;base64,%s", base64Image))

Not sure whether BinaryPart should just be supported anyway. Ideally it would be, somehow.
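
The workaround above can be wrapped in a small helper so the MIME type is not hard-coded. This is only a sketch: `dataURL` is a hypothetical helper name, not part of langchaingo, and the bytes in `main` are placeholders rather than a real image.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// dataURL is a hypothetical helper (not part of langchaingo). It builds a
// base64 data URL from a MIME type and raw bytes, mirroring the
// fmt.Sprintf("data:image/jpeg;base64,%s", ...) workaround in the thread.
func dataURL(mimeType string, data []byte) string {
	return "data:" + mimeType + ";base64," + base64.StdEncoding.EncodeToString(data)
}

func main() {
	// Placeholder bytes; in the issue these would be the PNG or JPEG contents.
	imageData := []byte("hi")

	url := dataURL("image/png", imageData)
	fmt.Println(url)

	// With langchaingo, the resulting string would be passed as
	// llms.ImageURLPart(url) in place of llms.BinaryPart("image/png", imageData).
}
```

In the original snippet, that would mean replacing the `llms.BinaryPart("image/png", imageData)` entry in `Parts` with `llms.ImageURLPart(dataURL("image/png", imageData))` when targeting OpenAI.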