gptscript-ai / gpt4-v-vision

9 stars 4 forks source link

gpt4-v-vision

gpt4-v-vision is a simple OpenAI CLI and GPTScript Tool for interacting with vision models.

Prerequisites

Usage

Import vision into any .gpt script by referencing this GitHub repo.

Tools: github.com/gptscript-ai/gpt4-v-vision

Describe the images at the following locations:
- examples/eiffel-tower.png
- https://avatars.githubusercontent.com/u/158112119?s=400&u=d2c6ae055a80ced8209f4aab2562986a97d79e9f&v=4

You will be prompted to enter your OpenAI API key if you have not provided it before.

Testing Changes

  1. Clone this repository or download the source code:

    git clone git@github.com:gptscript-ai/gpt4-v-vision.git
    cd gpt4-v-vision
  2. Install the npm dependencies

    npm install 
  3. Import the local tools.gpt file to test local changes

    Here's a simple example:

    # The tool script import path is relative to the directory of the script importing it; in this case ./examples
    Tools: ../tool.gpt
    Description: This script is used to test local changes to the vision tool by invoking it with a simple prompt and image references.
    
    Describe the images at the following locations:
    - examples/eiffel-tower.png
    - https://avatars.githubusercontent.com/u/158112119?s=400&u=d2c6ae055a80ced8209f4aab2562986a97d79e9f&v=4

    It can be run from the root directory of this repo

    # Disable response caching to ensure the tool is always called for testing purposes
    gptscript --disable-cache examples/test.gpt

Running the CLI

$ node index.js --help
Usage: index [options] <prompt> <images...>

Utility for processing images with the OpenAI API

Arguments:
  prompt                      Prompt to send to the vision model
  images                      List of image URIs to process. Supports file:// and https:// protocols. Images must be jpeg or png.

Options:
  --openai-api-key <key>      OpenAI API Key (env: OPENAI_API_KEY)
  --openai-base-url <string>  OpenAI base URL (env: OPENAI_BASE_URL)
  --openai-org-id <string>    OpenAI Org ID to use (env: OPENAI_ORG_ID)
  --max-tokens <number>       Max tokens to use (default: 2048, env: MAX_TOKENS)
  --model <model>             Model to process images with (choices: "gpt-4o", "gpt-4-turbo", default: "gpt-4o", env: MODEL)
  --detail <detail>           Fidelity to use when processing images (choices: "low", "high", "auto", default: "auto", env: DETAIL)
  -h, --help                  display help for command

Ask a question about an image in a local file

node index.js 'Describe the picture' 'file://examples/eiffel-tower.png'

Ask a question about an image at a remote URL

node index.js 'Describe the picture' 'https://github.com/gptscript-ai/vision/blob/main/examples/eiffel-tower.png?raw=true'

Ask a question related to multiple images

node index.js 'Do you think these two portraits are by the same artist?' 'https://github.com/gptscript-ai/vision/blob/main/examples/eiffel-tower.png?raw=true' 'file://examples/eiffel-tower.png'