neo4j-labs / llm-graph-builder

Neo4j graph construction from unstructured data using LLMs
https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/
Apache License 2.0
2.05k stars 304 forks

Want to keep my PDF data Private #600

Open Naveen-Chaurasia opened 1 month ago

Naveen-Chaurasia commented 1 month ago

I am using the LLM Graph Builder application and have concerns regarding the privacy and security of the data I upload. Specifically, I need to ensure that the uploaded data (e.g., PDFs) remains private and is not accessible by unauthorized parties, including OpenAI. I would like to know the following:

Privacy of Uploaded Data:

  1. How is the uploaded data stored within the application?
  2. Are there any measures in place to ensure that the data remains private and secure?
  3. Will OpenAI have access to the data that I upload to the LLM Graph Builder application?

Kain-90 commented 1 month ago

As far as I can tell, the data is currently handled as follows:

  1. Upload: the complete uploaded files are stored temporarily in the backend/merged_files directory and then stored in the Neo4j database.
  2. Embedding: the documents are split into chunks, the embedding model converts each chunk into a high-dimensional vector, and the vectors are also stored in the Neo4j database.
  3. Entity extraction: each chunk is then passed to the LLM, which extracts entities and relationships from it; the results are stored in the Neo4j database as well.
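
To make the privacy implication of these steps concrete, here is a minimal sketch of the same flow. It is not the application's actual code: `split_into_chunks`, `process_document`, the character-based chunking, and the default sizes are all assumptions made for illustration. The point it shows is that the raw text of every chunk is handed to both the embedding model and the LLM, so wherever those two run is where your data goes.

```python
from typing import Callable


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    Hypothetical helper: the real application may chunk by tokens instead.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def process_document(
    text: str,
    embed: Callable[[str], list[float]],
    extract: Callable[[str], list[tuple[str, str, str]]],
    chunk_size: int = 200,
    overlap: int = 20,
) -> list[dict]:
    """Run every chunk through the embedding model and the extraction LLM.

    `embed` and `extract` stand in for whatever models are configured.
    Both receive the raw chunk text, which is the privacy-relevant part.
    """
    records = []
    for chunk in split_into_chunks(text, chunk_size, overlap):
        records.append(
            {
                "text": chunk,
                "embedding": embed(chunk),  # chunk text goes to the embedding model
                "triples": extract(chunk),  # chunk text goes to the LLM
            }
        )
    return records
```

If `embed` or `extract` calls a hosted API, the chunk text leaves your machine at that call; if both are local functions, it does not.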

Therefore, if you want to ensure complete data privacy, you need to make sure that every component that processes the data runs locally or is operated by a third party you trust. This is the setup I am using at the moment:

  1. Local deployment of the Neo4j database
  2. Local deployment of the embedding model
  3. Local Ollama deployment of the LLM used for entity and relationship extraction
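
One way to sanity-check a setup like this is to verify that every endpoint the application is configured to talk to points at your own machine or private network. The following is a rough sketch under assumptions: the endpoint names and the `is_local_endpoint` and `non_local` helpers are hypothetical, and the URLs use the default ports of Neo4j Bolt (7687) and Ollama (11434). It only inspects the URL, so a DNS name other than `localhost` is treated as not provably local.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback/private IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Any other DNS name: we cannot tell from the URL alone, so be conservative.
        return False
    return addr.is_loopback or addr.is_private


def non_local(endpoints: dict[str, str]) -> list[str]:
    """Return the names of all endpoints that are not provably local."""
    return [name for name, url in endpoints.items() if not is_local_endpoint(url)]


# Hypothetical configuration for illustration; substitute your own values.
endpoints = {
    "neo4j": "bolt://localhost:7687",
    "ollama": "http://127.0.0.1:11434",
}

if __name__ == "__main__":
    offenders = non_local(endpoints)
    if offenders:
        print("Data may leave this machine via:", ", ".join(offenders))
    else:
        print("All configured endpoints are local.")
```

A check like this does not prove that nothing leaves the machine (a local service could still forward requests elsewhere), but it catches the common case of a hosted API URL left in the configuration.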

If there is something missing, please feel free to add it.