run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Is data shared when creating embeddings or querying? #14027

Open alejopaullier96 opened 1 month ago

alejopaullier96 commented 1 month ago

Question

I am concerned about data privacy. For example, if we use a paid LLM through an API such as OpenAI's, how do we know that OpenAI or other companies aren't storing our data sources and embeddings to further improve their models?

I think the ideal scenario would be to use a proprietary embedding model and a proprietary LLM in offline mode (no internet access), so that no data is shared when privacy is a concern.

Is this possible using LlamaIndex?

logan-markewich commented 1 month ago

We have a very large number of LLM and embedding model integrations, including ones that run locally.

Here's a quickstart for my favorite pairing for local dev, Ollama and Hugging Face: https://docs.llamaindex.ai/en/stable/getting_started/starter_example_local/
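
For readers who want a feel for that setup before opening the link, here is a minimal sketch of a pipeline where both the embedding model and the LLM run on the local machine. It assumes the `llama-index-llms-ollama` and `llama-index-embeddings-huggingface` packages are installed and that an Ollama server is running locally. The model names, the `build_local_query_engine` helper, and the `data` folder are illustrative assumptions, not something stated in this thread.

```python
"""Sketch: a LlamaIndex pipeline whose LLM and embedding model both run locally."""

# Assumed model choices for illustration; any locally available models would do.
EMBED_MODEL_NAME = "BAAI/bge-small-en-v1.5"  # Hugging Face embedding model (assumption)
LLM_MODEL_NAME = "llama3"                    # model already pulled into Ollama (assumption)
OLLAMA_URL = "http://localhost:11434"        # Ollama's default local address


def build_local_query_engine(data_dir: str):
    """Index the files in data_dir and return a query engine backed by local models."""
    # Imported here so this module loads even if the optional packages are missing.
    from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.llms.ollama import Ollama

    # Embeddings are computed in-process by a Hugging Face model.
    Settings.embed_model = HuggingFaceEmbedding(model_name=EMBED_MODEL_NAME)
    # Completions come from the Ollama server running on this machine.
    Settings.llm = Ollama(
        model=LLM_MODEL_NAME,
        base_url=OLLAMA_URL,
        request_timeout=360.0,
    )

    # Load documents from disk, embed them, and build an in-memory vector index.
    documents = SimpleDirectoryReader(data_dir).load_data()
    index = VectorStoreIndex.from_documents(documents)
    return index.as_query_engine()
```

A call such as `build_local_query_engine("data").query("What is this document about?")` would then embed and answer using only the local models. Note that the first run still needs network access to download the model weights, so a fully offline setup requires fetching them beforehand.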

logan-markewich commented 1 month ago

OpenAI has a public privacy policy and terms of service; you are welcome to read those as well if you have concerns.