yigitkonur / swift-ocr-llm-powered-pdf-to-markdown

An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing and batching to deliver high-quality text extraction from complex PDF documents. Ideal for businesses seeking efficient document digitization and data extraction solutions.
Other
695 stars 49 forks source link

Can you add ollama support? #6

Open shasankp000 opened 1 month ago

johnblommers commented 1 month ago

Being able to choose one's LLM and implement a local-first solution is highly desirable.

That said, I'm unable to use this tool because I don't have any keys to use AZURE-whatever. Why is this even needed, why can't we just use our OPENAI_API_KEY alone?

yigitkonur commented 1 month ago

You can use this directly with a standard OpenAI key - if you don't input an Azure endpoint, I default to the OpenAI base URL. You could even adapt this for Anthropic with minimal tweaks by running the code through ChatGPT for some minor adjustments. It's not too complicated.

As far as I know, Ollama provides multi-modal support through LLaVa, but it might not be the most performant or consistent option. That's why I haven't added local LLM support yet, but I'll look into it when I have some downtime. I might add it in the future.

slucha commented 1 month ago

How do I use it without the Azure endpoint? If I remove the environment variables or leave them empty, I get: File "", line 488, in _call_with_frames_removed File "/workspace/swift-ocr-llm-powered-pdf-to-markdown/main.py", line 52, in Settings.validate() File "/workspace/swift-ocr-llm-powered-pdf-to-markdown/main.py", line 47, in validate raise ValueError( ValueError: Missing required environment variables: AZURE_OPENAI_ENDPOINT, OPENAI_DEPLOYMENT_ID

or

2024-09-27 09:56:03,922 - main - INFO - Deleted temporary PDF file /tmp/tmpilvez95t.pdf. 2024-09-27 09:56:03,922 - main - ERROR - HTTPException: OCR processing failed: Connection error. INFO: 127.0.0.1:48952 - "POST /ocr HTTP/1.1" 502 Bad Gateway

shasankp000 commented 1 month ago

I think I might fork the project and add ollama support myself.

Ualas-kinder commented 1 month ago

@slucha Have you figured out how to fix the OCR processing failed: Connection error issue?