neo4j-labs / llm-graph-builder

Neo4j graph construction from unstructured data using LLMs
https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/
Apache License 2.0

Speed up processing of long files #488

Open maxgosk opened 3 months ago

maxgosk commented 3 months ago

Hi, I've been ingesting PDF files of 400+ pages, but it takes very long or sometimes just gets stuck. I have increased the combined chunk size to 20, but that is still not enough.

Is there a way to make that process asynchronous, or do we need to wait for the chunks to be processed into entities before starting new ones? Is it related to Neo4j limitations on writes per second?

I'm using Azure gpt-4o and currently have a limit of 900k TPM.

If anyone can provide advice, I could try implementing it myself. Thanks!

jexp commented 2 months ago

If you look at the code, you'll see that we're already processing with multiple threads per file. But as a public app we need to balance multi-user, multi-file load against the backend capacity.

I suggest that you take a notebook and run the LLM graph transformer code concurrently over your files in Azure directly:

https://python.langchain.com/v0.1/docs/use_cases/graph/constructing/
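
A minimal sketch of what that could look like, assuming LangChain's `LLMGraphTransformer` with an Azure deployment. The deployment name, credentials, and concurrency limit are placeholders, and `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `OPENAI_API_VERSION` are assumed to be set in the environment:

```python
import asyncio

from langchain_core.documents import Document
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import AzureChatOpenAI

# Placeholders: swap in your own deployment name and Neo4j credentials.
llm = AzureChatOpenAI(azure_deployment="gpt-4o", temperature=0)
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")
transformer = LLMGraphTransformer(llm=llm)

async def ingest(texts: list[str], concurrency: int = 8) -> None:
    # Cap in-flight LLM calls so a large batch stays under the TPM quota.
    semaphore = asyncio.Semaphore(concurrency)

    async def extract(text: str):
        async with semaphore:
            # One entity/relationship extraction prompt per document.
            return await transformer.aconvert_to_graph_documents(
                [Document(page_content=text)]
            )

    for graph_docs in await asyncio.gather(*(extract(t) for t in texts)):
        graph.add_graph_documents(graph_docs)

# asyncio.run(ingest(chunk_texts))  # chunk_texts: your pre-chunked page text
```

The semaphore is the knob to tune: it bounds how many extraction calls run concurrently, which is what keeps a large batch from blowing through a 900k TPM quota.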

maxgosk commented 2 months ago

Hi jexp! Thanks for your comment. I've been OCRing each file separately and providing only the PDF with the extracted text (which reduces file size considerably). I also set up Google Cloud Run to scale automatically with request volume, and that is working fine so far.

Last night I ingested a dataset of files (around 10 thousand pages) using Azure OpenAI text-embedding-3-large. The embedding works fine and is saved into Neo4j. However, when asking a question, it fails because the embedding model hits the quota limit (currently 350k TPM). Not sure if this is a bug, but it fails even with CHAT_SEARCH_KWARG_K=1. Why does the platform need to calculate so many embeddings when they are already stored?

Thanks a lot and looking forward to your answer!

jexp commented 2 months ago

Can you clarify your last question?

The chat should only use the vector index that's already in Neo4j (indexing the chunks); the only embedding generated during question answering is the embedding of your question itself.
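
As a rough illustration of that flow: the question is embedded once, then the existing chunk index is queried. The index name `vector` and the `text` property are assumptions about the app's chunk schema (confirm with `SHOW INDEXES`), and the deployment name and credentials are placeholders:

```python
from langchain_openai import AzureOpenAIEmbeddings
from neo4j import GraphDatabase

# Placeholders: swap in your own deployment name and Neo4j credentials.
embeddings = AzureOpenAIEmbeddings(azure_deployment="text-embedding-3-large")
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def retrieve_chunks(question: str, k: int = 1) -> list[str]:
    # The only embedding call at question time: embed the question itself.
    question_vector = embeddings.embed_query(question)
    records, _, _ = driver.execute_query(
        # Index name and node property assumed; verify against your database.
        "CALL db.index.vector.queryNodes('vector', $k, $q) "
        "YIELD node, score RETURN node.text AS text, score",
        k=k,
        q=question_vector,
    )
    return [r["text"] for r in records]
```

With CHAT_SEARCH_KWARG_K=1 this amounts to a single `embed_query` call per question, so a 350k TPM quota should be nowhere near exhausted.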

In DEV we also added entity embedding generation, but you need to enable that explicitly for it to run.

maxgosk commented 2 months ago

Yes, exactly. I know it should work like that, but for some reason I still get a token-limit error. I'll keep checking why.