zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License

[Bug]: MapDataManager uses pickle to serialize the data_map.txt file, which may lead to security risks. #655

Open lihao7212148 opened 1 month ago

lihao7212148 commented 1 month ago

Current Behavior

When MapDataManager is initialized, it calls pickle to read the data_map.txt file. If an attacker has tampered with data_map.txt, this is a security risk: the Python documentation warns that the pickle module is not secure, because malicious pickle data can execute arbitrary code during unpickling.
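
The risk is easy to demonstrate. The sketch below is not GPTCache code; it is a minimal, self-contained illustration (the `Payload` class is hypothetical) of how a crafted pickle makes `pickle.load` call whatever callable the file's author chose, via the standard `__reduce__` hook. Here the callable is a harmless `eval("6 * 7")`; an attacker could substitute something like `os.system`.

```python
import io
import pickle


class Payload:
    """Hypothetical attacker object; __reduce__ tells pickle how to rebuild it."""

    def __reduce__(self):
        # On unpickling, pickle calls eval("6 * 7") instead of rebuilding a
        # Payload. Any importable callable and arguments could go here.
        return (eval, ("6 * 7",))


# The attacker writes these bytes into a file such as data_map.txt.
tampered_bytes = pickle.dumps(Payload())

# The victim later loads the "cache" file as usual.
loaded = pickle.load(io.BytesIO(tampered_bytes))
print(loaded)  # 42: the result of code chosen by whoever wrote the file
```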

GPTCache uses pickle in the following code: [screenshot]

Expected Behavior

GPTCache should not use pickle, or it should verify that the file content has not been tampered with before loading it.
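
One way to meet the first expectation is to persist the map in a data-only format such as JSON, whose parser only builds dicts, lists, strings, numbers, booleans and None and never calls arbitrary code. This is a sketch under assumptions, not GPTCache's API: `save_map` and `load_map` are hypothetical helpers, and it assumes the cached entries can be reduced to JSON-compatible values (entries holding embeddings or other objects would need an explicit encoding step).

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_map(data_map: Dict[str, Any], path: str) -> None:
    """Hypothetical helper: write the cache map as plain JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data_map, f)


def load_map(path: str) -> Dict[str, Any]:
    """Hypothetical helper: read the cache map back without executing code.

    json.load only produces plain data types, so a tampered file can corrupt
    the cache contents but cannot run code while it is being loaded.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            loaded = json.load(f)
    except FileNotFoundError:
        return {}  # no cache file yet: start with an empty map
    if not isinstance(loaded, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return loaded


# Usage: round-trip a small map through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data_map.json")
    save_map({"question": "answer"}, path)
    restored = load_map(path)
    print(restored)  # {'question': 'answer'}
```

The trade-off is that JSON cannot round-trip arbitrary Python objects the way pickle can, which is exactly what makes it safe to load from an untrusted file.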

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

SimFG commented 1 month ago

That's a good question. Could you try to fix it?

lihao7212148 commented 1 month ago


I tried adding an HMAC field to the header of the data_map.txt file to detect tampering, but this method cannot completely eliminate the risk: an attacker who obtains the HMAC key can still compute a valid HMAC for forged data and bypass verification.
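
A minimal sketch of the HMAC-header approach, under stated assumptions: the file layout (a 32-byte HMAC-SHA256 tag followed by the pickled payload), the helper names `dump_signed`/`load_signed`, and the `GPTCACHE_MAP_KEY` environment variable are all hypothetical, not part of GPTCache. The tag is checked with `hmac.compare_digest` before `pickle.loads` ever sees the bytes. This only helps while the key stays secret: anyone who can read the key (for example, because it is stored on the same disk as data_map.txt) can compute a valid tag for a malicious payload.

```python
import hashlib
import hmac
import os
import pickle
import tempfile

# Hypothetical file layout: 32-byte HMAC-SHA256 tag, then the pickled payload.
TAG_SIZE = hashlib.sha256().digest_size


def _get_key() -> bytes:
    # Hypothetical key source: an environment variable, kept off the disk that
    # holds data_map.txt. Anyone who can read this key can forge a valid tag.
    key = os.environ.get("GPTCACHE_MAP_KEY")
    if not key:
        raise RuntimeError("GPTCACHE_MAP_KEY is not set")
    return key.encode("utf-8")


def dump_signed(obj, path: str) -> None:
    """Pickle obj and write it behind an HMAC tag header."""
    payload = pickle.dumps(obj)
    tag = hmac.new(_get_key(), payload, hashlib.sha256).digest()
    with open(path, "wb") as f:
        f.write(tag + payload)


def load_signed(path: str):
    """Check the header tag first; only unpickle if it matches."""
    with open(path, "rb") as f:
        blob = f.read()
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(_get_key(), payload, hashlib.sha256).digest()
    # compare_digest is designed to resist timing attacks on the comparison.
    if not hmac.compare_digest(tag, expected):
        raise ValueError(f"{path}: HMAC mismatch, refusing to unpickle")
    return pickle.loads(payload)


if __name__ == "__main__":
    # Demo key only; a real deployment would provision a secret key.
    os.environ.setdefault("GPTCACHE_MAP_KEY", "demo-key-not-secret")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data_map.txt")
        dump_signed({"question": "answer"}, path)
        print(load_signed(path))  # {'question': 'answer'}

        # Simulate tampering by appending a byte to the payload.
        with open(path, "ab") as f:
            f.write(b"x")
        try:
            load_signed(path)
        except ValueError as exc:
            print("rejected:", exc)
```

The check moves the trust boundary from the file to the key: tampering with data_map.txt alone is detected, but the scheme is only as strong as the protection around `GPTCACHE_MAP_KEY`.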