zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License
7.22k stars, 502 forks

Possible Leakage of Private Info in Semantic Cache Requests #652

Open Unik-lif opened 1 month ago

Unik-lif commented 1 month ago

Dear GPTCache Team, we are a security research group. We have used GPTCache for a while and are impressed by its design and speed, but as we studied it further, concerns about its security arose. We recently found that the semantic differences introduced by private attributes are significant enough to be recognized, so some private info can be inferred from the timing differences in the responses of the GPTCache system. Assuming the attacker has already guessed the private info correctly, we found that cache hits are identified with a true positive rate (TPR) of 85.3% and a false positive rate (FPR) of 3.2% in a single trial.

Description

GPTCache uses semantic similarity to judge whether an incoming request has a semantic neighbor in its dataset (i.e. the cache store and vector store), and returns the previously stored answer if the similarity exceeds the threshold. However, private info in the request, such as names, email addresses, and phone numbers, can greatly change its semantics. Consider two different sentences A and B as requests, where A comes from the victim and B from the attacker. If the private info is the same and the other components of the requests are different but semantically similar, chances are that the similarity between the two sentences still exceeds the threshold. Therefore, when B is sent after A, GPTCache returns A's cached answer to the attacker almost instantly. The hit/miss timing difference is pronounced enough for the attacker to tell whether their guess was correct.
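To make the mechanism concrete, here is a minimal sketch of a shared cache that answers from a stored entry whenever cosine similarity reaches a threshold. Everything in it is an assumption for illustration: `ToySemanticCache` is a hypothetical class and not GPTCache's API, the 0.8 threshold is arbitrary, the "Alice with asthma" guess is invented, and the 2-D vectors are hand-picked rather than produced by a real embedding model.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    """Hypothetical stand-in for a shared semantic cache; not GPTCache's API."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float) -> None:
        self.embed = embed
        self.threshold = threshold
        self.entries: list[tuple[Vector, str]] = []

    def query(self, prompt: str, llm: Callable[[str], str]) -> tuple[str, bool]:
        """Return (answer, was_hit). A hit never calls the slow LLM."""
        vec = self.embed(prompt)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1], True
        answer = llm(prompt)
        self.entries.append((vec, answer))
        return answer, False


VICTIM = ("Compose a meeting agenda for an interdisciplinary team discussing "
          "the treatment plan for Tom with diabetes.")
RIGHT_GUESS = ("Formulate an agenda for an interdisciplinary team meeting "
               "regarding the treatment plan for Tom with diabetes.")
WRONG_GUESS = ("Formulate an agenda for an interdisciplinary team meeting "
               "regarding the treatment plan for Alice with asthma.")

# Hand-picked vectors for illustration only; they are not measured embeddings.
FAKE_EMBEDDINGS = {
    VICTIM: (1.0, 0.0),
    RIGHT_GUESS: (0.95, 0.2),
    WRONG_GUESS: (0.5, 0.85),
}

cache = ToySemanticCache(FAKE_EMBEDDINGS.__getitem__, threshold=0.8)
llm = lambda prompt: f"agenda generated for: {prompt}"

_, victim_hit = cache.query(VICTIM, llm)            # first request: miss
leaked, right_hit = cache.query(RIGHT_GUESS, llm)   # same private info: hit
_, wrong_hit = cache.query(WRONG_GUESS, llm)        # different private info: miss
```

In this toy run the attacker's correct guess receives the answer generated for the victim's request, while the wrong guess falls through to the LLM. That difference in code path is what the timing side channel exposes.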

Threat model

We assume an application built with the LangChain framework, using GPTCache as the backend. The application is accessible to both the victim and the attacker. Requests from different users are converted into embeddings and stored in the same dataset. We use ONNX to calculate the semantic similarity between sentences, and assume the victim might ask the same question in different forms.
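Under this threat model the attacker only needs to time their own requests. The sketch below shows one way such a probe could look. The names `looks_like_cache_hit` and `fake_endpoint`, the 50 ms simulated LLM delay, and the 20 ms cutoff are all assumptions, and exact string matching stands in for the semantic lookup.

```python
import time
from typing import Callable

# Prompts already answered for the victim (hypothetical; exact match stands in
# for the semantic-similarity lookup of a real cache).
CACHED_PROMPTS = {"treatment plan agenda for Tom with diabetes"}


def fake_endpoint(prompt: str) -> str:
    """Simulated application: a miss pays for a slow LLM round trip."""
    if prompt not in CACHED_PROMPTS:
        time.sleep(0.05)  # assumed LLM latency on a cache miss
    return "stub answer"


def looks_like_cache_hit(send: Callable[[str], str], prompt: str,
                         cutoff_s: float) -> bool:
    """Classify one request as a hit if it returns faster than cutoff_s."""
    start = time.perf_counter()
    send(prompt)
    return time.perf_counter() - start < cutoff_s


right = looks_like_cache_hit(fake_endpoint,
                             "treatment plan agenda for Tom with diabetes", 0.02)
wrong = looks_like_cache_hit(fake_endpoint,
                             "treatment plan agenda for Alice with asthma", 0.02)
```

In a real deployment the cutoff would have to be calibrated against observed latencies, and a missed probe would itself be written into the shared cache and could affect later probes.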

Environment

GPTCache: v0.1.43

Attack steps

  1. Select a template that might include private info. Text in brackets [ ] is treated as private.

    A: Compose a meeting agenda for an interdisciplinary team discussing the treatment plan for [Name] with [medical condition].
  2. Use gpt-3.5-turbo to generate many semantically similar templates while keeping the private info unchanged.

    Formulate an agenda for an interdisciplinary team meeting regarding the treatment plan for [Name] with [medical condition].
    ...
  3. Fill in the bracketed parts of the original template and the generated templates with concrete private info. Using the original sentence as the base, keep only the generated sentences whose similarity to it, computed via ONNX, exceeds the threshold.

    Formulate an agenda for an interdisciplinary team meeting regarding the treatment plan for Tom with diabetes.
    ... etc.
  4. Use part of the generated sentences above as the questions the victim might ask, and the remainder as the attacker's sentences. We treat the questions the victim might ask as the positive group and calculate the semantic similarity between the attack sentences and the positive group. To form the negative group, we change the private 'Name' and 'Medical Condition' variables.

  5. We draw ROC curves to demonstrate the semantic differences.

  6. We use orthogonal attack sentences (so that they do not interfere with each other) to calculate the TPR and FPR.

    Results:

    We use the template: Compose a meeting agenda for an interdisciplinary team discussing the treatment plan for [Name] with [medical condition]. to conduct our experiments.

    • The 'name' and 'medical condition' in the negative group are both different from the positive group.
    • Either the 'name' or the 'medical condition' in the negative group differs from the positive group.

    [Figure: ROC curves]

The two graphs above show that the private attributes introduce semantic differences large enough to be distinguished, so they can be used to judge whether a guess is correct.
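Steps 3 and 4 of the attack can be sketched roughly as follows. The two templates and the "Tom" / "diabetes" values come from the steps above. The decoy values "Alice" and "asthma" and the helper names `fill` and `build_groups` are hypothetical, and the ONNX similarity filter from step 3 is omitted.

```python
from typing import Sequence

TEMPLATES = [
    "Compose a meeting agenda for an interdisciplinary team discussing "
    "the treatment plan for [Name] with [medical condition].",
    "Formulate an agenda for an interdisciplinary team meeting regarding "
    "the treatment plan for [Name] with [medical condition].",
]


def fill(template: str, name: str, condition: str) -> str:
    """Substitute the bracketed private fields with concrete values."""
    return (template.replace("[Name]", name)
                    .replace("[medical condition]", condition))


def build_groups(
    templates: Sequence[str],
    secret: tuple[str, str],
    decoys: Sequence[tuple[str, str]],
) -> tuple[list[str], list[str]]:
    """Positive group: the victim's private values. Negative group: decoys."""
    positive = [fill(t, *secret) for t in templates]
    negative = [fill(t, *d) for t in templates for d in decoys]
    return positive, negative


secret = ("Tom", "diabetes")
decoys = [
    ("Alice", "asthma"),    # both fields differ (hypothetical values)
    ("Tom", "asthma"),      # only the medical condition differs
    ("Alice", "diabetes"),  # only the name differs
]
positive, negative = build_groups(TEMPLATES, secret, decoys)
```

The decoy list covers both negative-group configurations described in the results: both private fields changed, and exactly one of them changed.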

| #Trials | TPR (%) | FPR (%) |
|---------|---------|---------|
| 1       | 85.3    | 3.2     |
| 2       | 91.9    | 3.3     |
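For readers unfamiliar with the metrics, the sketch below shows how TPR, FPR, and ROC points can be derived from similarity scores of a positive and a negative group. The scores and the 0.8 threshold are invented for illustration and do not reproduce the numbers reported in the table. The helper names `rates_at` and `roc_points` are hypothetical.

```python
from typing import Sequence


def rates_at(threshold: float, positive_scores: Sequence[float],
             negative_scores: Sequence[float]) -> tuple[float, float]:
    """Return (TPR, FPR) when scores >= threshold are classified as hits."""
    tpr = sum(s >= threshold for s in positive_scores) / len(positive_scores)
    fpr = sum(s >= threshold for s in negative_scores) / len(negative_scores)
    return tpr, fpr


def roc_points(positive_scores: Sequence[float],
               negative_scores: Sequence[float]) -> list[tuple[float, float]]:
    """Sweep every observed score as a threshold; return (FPR, TPR) pairs."""
    thresholds = sorted(set(positive_scores) | set(negative_scores), reverse=True)
    points = []
    for t in thresholds:
        tpr, fpr = rates_at(t, positive_scores, negative_scores)
        points.append((fpr, tpr))
    return points


# Invented similarity scores, for illustration only.
positive_scores = [0.95, 0.91, 0.88, 0.72]  # attack sentence vs. same private info
negative_scores = [0.81, 0.64, 0.55, 0.40]  # attack sentence vs. changed private info

tpr, fpr = rates_at(0.8, positive_scores, negative_scores)
roc = roc_points(positive_scores, negative_scores)
```

Plotting the `roc` pairs with FPR on the x-axis and TPR on the y-axis gives a ROC curve of the kind referred to in step 5.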

Possible Mitigation:

Detailed information will be provided soon in our paper. We look forward to your reply!

SimFG commented 1 month ago

This is great research, and I appreciate the interest in the GPTCache project. Looking forward to the final paper.