openai / openai-cookbook

Examples and guides for using the OpenAI API
https://cookbook.openai.com
MIT License

OpenAI API not supporting image processing #1562

Open AhzamHassan opened 4 days ago

AhzamHassan commented 4 days ago

I am trying to send an image to OpenAI through its API and get a response, but the model replies that it is unable to process images. My code:

    const completion = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [
        {
          role: "system",
          content: "You are a helpful assistant, i am sharing an image with you please give me the solution for this math problem.",
        },
        {
          role: "user",
          content: JSON.stringify({
            type: "image_url",
            image_url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          }),
        },
      ],
    });

Response (Postman):

{
    "data": {
        "role": "assistant",
        "content": "I'm sorry, but as a text-based AI, I'm unable to view or interpret images. However, if you describe the math problem to me or type it out, I'd be more than happy to assist you in solving it.",
        "refusal": null
    },
    "message": "Success",
    "success": true
}

erenakbay commented 1 day ago

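Text-only GPT-4 models handle only text inputs, so for those, use an OCR tool like Tesseract to extract the text from the image, then pass the extracted text to the API for analysis or problem-solving. Note that `gpt-4o` itself does accept image input through Chat Completions, but only when the user message's `content` is an array of content parts. In the code above, `JSON.stringify` turns the image part into a plain string, so the model receives text rather than an image, which would explain the "text-based AI" reply.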
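
For what it's worth, `gpt-4o` does accept images through Chat Completions when the user message's `content` is an array of parts rather than a string, with the image given as `image_url: { url: ... }`. Here is a minimal sketch of that request shape, assuming the official `openai` Node package. `buildVisionRequest` and `toDataUrl` are hypothetical helper names used for illustration, not part of the SDK, and the prompt text is a placeholder:

```javascript
// Hypothetical helper: builds a Chat Completions payload with an image part.
// The key difference from the code in the question is that `content` is an
// array of parts, not a JSON string.
function buildVisionRequest(imageUrl, prompt) {
  return {
    model: "gpt-4o",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          // `image_url` is an object with a `url` field, not a bare string.
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

// Hypothetical helper: wraps raw image bytes in a base64 data URL, which can
// be passed as `imageUrl` when the image is a local upload, not a public link.
function toDataUrl(buffer, mimeType) {
  return `data:${mimeType};base64,${buffer.toString("base64")}`;
}

// Usage sketch (assumes OPENAI_API_KEY is set in the environment):
//   const OpenAI = require("openai");
//   const openai = new OpenAI();
//   const completion = await openai.chat.completions.create(
//     buildVisionRequest("https://example.com/problem.jpg", "Solve this math problem.")
//   );
//   console.log(completion.choices[0].message.content);
```

If the image comes from a file upload, something like `toDataUrl(fs.readFileSync(path), "image/jpeg")` should work as the first argument instead of a public URL.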