zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License
7.15k stars 503 forks

[Bug]: Trying to auto install packages during runtime is not security friendly #530

Open kmehant opened 1 year ago

kmehant commented 1 year ago

Current Behavior

GPTCache checks whether the Python modules it needs exist in the host environment and, if they do not, tries to auto-install them at runtime.

Expected Behavior

GPTCache should use an alternative, non-runtime approach that is more security friendly, or at least provide an option to turn this off for downstream packages such as guidance and many others.

Production environments are typically hardened, for example by keeping the filesystem read-only. Because GPTCache tries to install packages at runtime, it can break on such systems, which do not allow these operations.
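A minimal sketch of what such a toggle could look like, assuming an opt-out environment variable. The name `GPTCACHE_DISABLE_AUTO_INSTALL` and both helper functions are hypothetical and are not part of GPTCache; the check uses only the standard library and never installs anything.

```python
import importlib.util
import os

# Hypothetical opt-out variable; GPTCache does not define this name.
DISABLE_ENV_VAR = "GPTCACHE_DISABLE_AUTO_INSTALL"


def auto_install_allowed(environ=None):
    """Return False when the hypothetical opt-out variable is set to a truthy value."""
    environ = os.environ if environ is None else environ
    value = environ.get(DISABLE_ENV_VAR, "").strip().lower()
    return value not in {"1", "true", "yes", "on"}


def check_module(module_name, environ=None):
    """Report the state of a top-level module without ever installing it.

    Returns "present", "missing-install-allowed" or "missing-install-blocked".
    """
    # find_spec only looks the module up; it does not import or install it.
    if importlib.util.find_spec(module_name) is not None:
        return "present"
    if auto_install_allowed(environ):
        return "missing-install-allowed"
    return "missing-install-blocked"
```

With the variable set in a hardened deployment, a missing module would be reported rather than installed, so a read-only filesystem is never written to.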

Steps To Reproduce

1. Use any downstream package that uses GPTCache such as [guidance tool](https://github.com/guidance-ai/guidance)
2. Observe the logs that it tries to install missing packages

start to install package: redis_om
successfully installed package: redis_om
redis_om installed successfully!


Environment

_No response_

Anything else?

_No response_

kmehant commented 1 year ago

Thanks for the great and useful project; looking forward to a resolution for this.

bobvanderlinden commented 1 year ago

I'm also running into problems where gptcache tries to install dependencies at runtime. I'd very much like to avoid this in production. It delays the startup of the application and risks the installation (and thus the application as a whole) failing. We're not using Redis, but it still tries to install the redis package upon importing guidance (which uses gptcache). The installation of redis also fails on some of the development machines.

This is also quite confusing for users who try guidance in the Python interpreter and run into this issue:

>>> import guidance
start to install package: redis

Note that the installation already happens when importing gptcache.utils, so this isn't just a guidance issue:

$ python
Python 3.11.4 (main, Jun  6 2023, 22:16:46) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gptcache.utils
start to install package: redis
successfully installed package: redis
start to install package: redis_om
successfully installed package: redis_om
>>>

There are a number of issues related to failure of installing the dependencies at runtime:

Preferably, the optional dependencies would be declared as optional. Poetry has good support for this: https://python-poetry.org/docs/pyproject/#extras

I have no experience doing the same with requirements.txt, but it seems there is a standard for doing so:

https://peps.python.org/pep-0508/#extras

If I interpret that correctly it should be possible to specify:

redis[redis]
redis_om[redis]

That way, people could install gptcache with those optional dependencies using pip install gptcache[redis].

Would that be a good alternative?
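One caveat on the syntax above: under PEP 508, a requirement such as `redis[redis]` asks for an extra named `redis` from the redis package itself, so the extra would more likely be declared in gptcache's own packaging metadata. A hedged sketch of how that could look in a setuptools-based `setup.py`, where the extra name `redis` and the unpinned requirement strings are assumptions for illustration and not GPTCache's actual configuration:

```python
# Hypothetical fragment of a setup.py. The extra name "redis" and the
# unpinned requirement strings are assumptions, not GPTCache's real metadata.
EXTRAS_REQUIRE = {
    # Installed only when the user asks for: pip install "gptcache[redis]"
    "redis": ["redis", "redis_om"],
}

# Convenience extra that is the union of all other extras.
EXTRAS_REQUIRE["all"] = sorted(
    {requirement for group in EXTRAS_REQUIRE.values() for requirement in group}
)

# In setup.py this mapping would be handed to setuptools through the
# extras_require keyword of setuptools.setup(), next to install_requires.
```

With a declaration like this, a plain `pip install gptcache` would skip the Redis packages, and nothing would need to be installed at import time.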

SimFG commented 1 year ago

I will check it out; it's a bad case.

aawilson commented 9 months ago

Thirding this issue; it is a nasty surprise. We saw this behavior while running unit tests, which is absolutely the wrong place for a pip install under any circumstances. The project should rely on setup.py to advertise its dependencies and let pip install, or alternatives, do their jobs. At runtime it should simply let ImportErrors bubble up rather than trying to fix the problem.
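A minimal sketch of the "let the ImportError bubble up" approach, assuming a small helper in place of the runtime installer. The function `import_optional` and the `gptcache[...]` extra named in its message are hypothetical; only `importlib.import_module` from the standard library is used.

```python
import importlib


def import_optional(module_name, extra=None):
    """Import an optional dependency, or raise ImportError with an install hint.

    Nothing is installed here; the caller decides how to handle the error.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Hypothetical hint: assumes the package declares an extra with this name.
        target = f"gptcache[{extra}]" if extra else module_name
        raise ImportError(
            f"Optional dependency '{module_name}' is not installed. "
            f"Install it ahead of time, for example: pip install {target}"
        ) from exc
```

A caller that needs Redis support would call `import_optional("redis", extra="redis")` at the point of use, so importing the top-level package stays free of side effects and unit tests never trigger a pip install.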