zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://docs.privategpt.dev
Apache License 2.0
52.98k stars, 7.12k forks

Parsing PDF takes forever #1807

Open Bardo-Konrad opened 3 months ago

Bardo-Konrad commented 3 months ago

I get endless output like this:

Parsing nodes: 100%|██████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s]
Generating embeddings: 0it [00:00, ?it/s]
Parsing nodes: 100%|██████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s]
Generating embeddings: 100%|██████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.88s/it]
Generating embeddings: 0it [00:00, ?it/s]

For a simple 497 KB PDF it's already at 186s / 30.9s and counting.
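
To put numbers on the output above, here is a rough sketch that tallies the finished progress bars per stage from a captured log. It only assumes the tqdm line format shown in the output above; the name `summarize` and the regex are made up for illustration and are not part of private-gpt. It reads elapsed time as `mm:ss`, so runs longer than an hour would need a different pattern.

```python
import re

# Matches finished tqdm lines such as
# "Generating embeddings: 100%|███| 1/1 [00:02<00:00,  2.88s/it]".
# Assumption: elapsed time is printed as mm:ss (no hour field).
DONE = re.compile(
    r"^(?P<stage>[^:]+):\s+100%\|.*?\|\s*\d+/\d+\s+\[(?P<m>\d+):(?P<s>\d+)<"
)


def summarize(log_text):
    """Return {stage: (finished_bars, total_elapsed_seconds)} for a log dump."""
    stats = {}
    for line in log_text.splitlines():
        match = DONE.match(line.strip())
        if not match:
            continue  # skips "0it" lines and anything that is not a finished bar
        elapsed = int(match["m"]) * 60 + int(match["s"])
        count, total = stats.get(match["stage"], (0, 0))
        stats[match["stage"]] = (count + 1, total + elapsed)
    return stats
```

Feeding it the console output saved to a file (for example `summarize(open("ingest.log", encoding="utf-8").read())`, where `ingest.log` is a placeholder name) shows how many parse/embed rounds ran for a single document, which makes it easier to tell whether the file is being split into many tiny batches or the same step is repeating.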

What is wrong?

ItsCRC commented 3 months ago

I have hit the same issue with plain .txt files. I am using the local settings rather than Ollama. No solution yet.