junhoyeo / BetterOCR

🔍 Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo) with 🧠 LLM.
MIT License
489 stars 27 forks source link

Can I use this repo without internet? #23

Open Minseong-COLI opened 11 months ago

Minseong-COLI commented 11 months ago

This repo use LLM model by calling API But, I wanna use it like a import a library, such as easyocr, pororo etc.. Can I use BetterOCR and LLM model like that?

junhoyeo commented 11 months ago

@Minseong-COLI Sorry for the late reply, I'll try to modify it to support LLMs from the open source community like llama.cpp soon. In the meantime, if you have any LLM models that you mainly install and use, please share them here.

PeterHagen commented 2 months ago

Why not support Ollama? Ollama has a drop-in replacement API for ChatGPT. The only thing that has to be added is support for base_api in the openai settings:

openai={
  # OpenAI options here

  # `os.environ["OPENAI_API_KEY"]` is used by default
  "API_KEY": "ollama",
  "model": "llama3.1",
  "API_BASE": "http://localhost:11434/v1"
}

See here for some more information.

I tried the prompt manually with gemma2 and llama3.1, and they work perfectly. The boxes detection prompt doesn't seem to work out of the box at the moment.

I would suggest something like the following to be added in detect.py:

# Prioritize user-specified API_KEY and API_BASE
api_key = options["openai"].get("API_KEY", os.environ.get("OPENAI_API_KEY"))
api_base = options["openai"].get("API_BASE", os.environ.get("OPENAI_API_BASE"))

# Make a shallow copy of the openai options and remove the API_KEY
openai_options = options["openai"].copy()
if "API_KEY" in openai_options:
    del openai_options["API_KEY"]

if "API_BASE" in openai_options:
    del openai_options["API_BASE"]

client = OpenAI(
    api_key=api_key,
    api_base=api_base
)

print("=====")