ozgrozer / ai-renamer

A Node.js CLI that uses Ollama and LM Studio models (Llava, Gemma, Llama etc.) to intelligently rename files by their contents
https://thenextaitool.com
GNU General Public License v3.0
1.53k stars 96 forks

API support? #3

Closed sumukshashidhar closed 3 months ago

sumukshashidhar commented 3 months ago

Any plans to support the OpenAI API standard to do exactly what you're doing now?

It'll help with batching large files that can't be processed locally / using a more capable model to perform the same task

ozgrozer commented 3 months ago

I actually thought about it a lot, but it didn't make sense to me, because this tool just passes the contents of the files to the LLM. So if you have lots of large files, it could drain your OpenAI credits immediately.

henk717 commented 3 months ago

OpenAI API support with a custom URL gets you support for all the other prominent local backends. Ollama deviates from the standard, so only supporting its API is not a good idea.
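For illustration, a minimal sketch of what this would look like with the openai npm package pointed at a custom base URL. This is not ai-renamer code; the URL, model name, and API key are placeholders:

    // Sketch: the same OpenAI-style client can target any compatible backend
    // (LM Studio, vLLM, KoboldCpp, ...) just by changing baseURL.
    const OpenAI = require('openai')

    const client = new OpenAI({
      baseURL: 'http://localhost:1234/v1', // placeholder: any OpenAI-compatible server
      apiKey: 'not-needed-for-local'       // local servers usually ignore the key
    })

    const main = async () => {
      const response = await client.chat.completions.create({
        model: 'llava', // placeholder model name
        messages: [{ role: 'user', content: 'Suggest a short filename for this text: ...' }]
      })
      console.log(response.choices[0].message.content)
    }

    main()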

ozgrozer commented 3 months ago

I see. You have a point.

sumukshashidhar commented 3 months ago

Yes! This is also what I intended. For large batching tasks, I often host a vLLM server myself, either locally or on an instance I own.

ozgrozer commented 3 months ago

Let's keep this issue open. If more people want this feature, then I'll implement it. Right now you're the only one who's asked for the OpenAI API.

staberas commented 3 months ago

Currently using it to rename a bunch of unsorted pics. After I installed it, I tried to make it work with my LM Studio local server, but apparently it doesn't work? So adding a custom URL/port option would be nice.

ozgrozer commented 3 months ago

OK, I'll take a look at the OpenAI API to see if it's compatible with the current structure.

ozgrozer commented 3 months ago

Turns out Ollama supports the OpenAI API, but it doesn't work with images, so I just used Axios to make calls to Ollama's local API. Since we're no longer depending on the ollama package, I was able to implement LM Studio as well. Added some details to the readme on how to use LM Studio.
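Roughly, a call to Ollama's native chat endpoint with an image looks like the sketch below. This is a hedged illustration, not the actual ai-renamer code; the model name, file path, and prompt are placeholders:

    // Sketch: direct Axios call to Ollama's native /api/chat, which accepts
    // base64 images in an `images` array (unlike its OpenAI-compatible endpoint at the time).
    const axios = require('axios')
    const fs = require('fs')

    const renamePrompt = 'Generate a concise, descriptive filename for this image.'

    const run = async () => {
      const imageBase64 = fs.readFileSync('photo.jpg').toString('base64')
      const { data } = await axios.post('http://localhost:11434/api/chat', {
        model: 'llava', // placeholder model
        stream: false,
        messages: [{ role: 'user', content: renamePrompt, images: [imageBase64] }]
      })
      console.log(data.message.content)
    }

    run()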

henk717 commented 3 months ago

If LM Studio is working, Koboldcpp should work as well, and Koboldcpp does have image support in its API.

ozgrozer commented 3 months ago

Never heard of KoboldAI, but now there are --provider and --base-url flags. If it's using the standard /chat/completions endpoint, it should work with npx ai-renamer /path --provider=lm-studio --base-url=http://localhost:port (don't mind the lm-studio provider here; it will work if the base URL is working).

henk717 commented 3 months ago

It may be better to name it with a generic term such as openai to indicate support beyond LM Studio.

ozgrozer commented 3 months ago

OK, right now I've added Ollama, LM Studio and OpenAI as providers, so it can be used with lots of backends.

staberas commented 3 months ago

Bug: using LM Studio and llava-llama3. When used on a meme image, the AI writes a whole paragraph describing the picture, and only the end of that paragraph ends up in the filename.
Example: 🟢 Renamed: ion-and-its-impact-on-society.jpg to teman-s-opinion-on-this-topic.jpg

  "model": "llava-llama3",
  "stream": false,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Generate a concise, descriptive filename for the following content:\n- Use kebabCase format\n- Maximum 20 characters\n- Use English language in the filename\n- Exclude file extension\n- Avoid special characters\n- Output only the filename\n\nIMPORTANT: Your entire response should be just the filename in kebabCase format, in English language, and max 20 characters. Do not include any other text."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/4QBoRXhpZgAATU0AKgAAAAgABAEaAAUAAAABAAAAPgEbAAUAA... (truncated in these logs)"
          }
        }
      ]
    }
  ]
}
  "id": "chatcmpl-84yft6v5wh3490fae619l9",
  "object": "chat.completion",
  "created": 1720768119,
  "model": "xtuner/llava-llama-3-8b-v1_1-gguf/llava-llama-3-8b-v1_1-int4.gguf",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " image contains a photo of actor Jason Bateman, who is known for his roles in films such as \"Arrested Development\" and \"Jumanji\". In this image, he is standing in an office setting, dressed professionally in a suit and tie. The image also includes a caption that discusses the cost of diversity and equity specialists, stating that a full-time specialist would cost $80K per year. It asks how much does it take for nothing to be racist. The caption implies that the cost of implementing measures to promote diversity and equity could be high. However, the image itself does not provide any information about Jason Bateman's opinion on this topic."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 102,
    "completion_tokens": 133,
    "total_tokens": 235
  }
} 
ozgrozer commented 3 months ago

Some models don't follow the rules I specified in the prompt. I use llava from Ollama and have no issues. It's probably the model xtuner/llava-llama-3-8b-v1_1-gguf you're using. This is the prompt I use for all the models. You can take the prompt and try a different version of it in LM Studio manually, and if you get a good result with a custom prompt, I'll try it out and possibly change the prompt in the code.

ozgrozer commented 3 months ago

I've simplified the prompt, but lower-level models may still not follow the rules.

staberas commented 3 months ago

will test

staberas commented 3 months ago

OK, with the exception of one image whose name came out short and not descriptive, the rest were a paragraph long, of which only the last part of the text was kept. So my solution is to do a two-step prompt: first ask for a full description of the image, then feed that description back and ask for the shortened filename.

ozgrozer commented 3 months ago

That's fine but we need to get the filename with just one prompt.

staberas commented 3 months ago

I believe you need to do a double pass for a more accurate title: let the model explain the image first, then refeed the description plus the prompt to get the summarised title. Make it its own flag, e.g. --double-pass.
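As a rough illustration of the double-pass idea (not code from ai-renamer; the base URL, model name, and prompts are assumptions for the sketch, targeting an OpenAI-compatible /v1/chat/completions endpoint):

    // Sketch: two-step prompting. Pass 1 asks for a full description;
    // pass 2 condenses that description into a kebab-case filename.
    const axios = require('axios')

    const baseUrl = 'http://localhost:1234' // placeholder base URL
    const model = 'llava-llama3'            // placeholder model

    const chat = async (messages) => {
      const { data } = await axios.post(`${baseUrl}/v1/chat/completions`, { model, messages })
      return data.choices[0].message.content
    }

    const doublePassFilename = async (imageBase64) => {
      // Pass 1: describe the image in detail
      const description = await chat([{
        role: 'user',
        content: [
          { type: 'text', text: 'Describe this image in detail.' },
          { type: 'image_url', image_url: { url: `data:image/jpeg;base64,${imageBase64}` } }
        ]
      }])

      // Pass 2: condense the description into a short kebab-case filename
      return chat([{
        role: 'user',
        content: `Based on this description, output only a kebab-case filename, max 20 characters:\n${description}`
      }])
    }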

ozgrozer commented 3 months ago

I think the model you're using hallucinates a lot. Maybe you could try a different model.