ricklamers / gpt-code-ui

An open source implementation of OpenAI's ChatGPT Code interpreter
MIT License

feat: Create Individual Virtual Env for the Kernel #74

Open dasmy opened 1 year ago

dasmy commented 1 year ago

This is a follow-up to #73. I originally planned to re-open that one, but that seems to be impossible. The change addresses https://github.com/ricklamers/gpt-code-ui/discussions/6 and https://github.com/ricklamers/gpt-code-ui/issues/62, at least partially.

The idea is as follows: We first create a base venv that is never deleted and can be re-used easily. All default packages are installed there. (Thus, the very first start after applying the change is still slow.)

Then, whenever a kernel is invoked, we create another, kernel-specific venv without any additional packages. Instead, we link it to the packages installed in the base venv by placing a .pth file inside its site-packages dir. This way, the derived venv has access to all packages that have been installed in the base venv. The Python binary from the derived venv is then used to start the IPython kernel. The base venv is intended to stay immutable and is never used directly. If an additional package is needed, the corresponding !pip install PACKAGE call can be executed, and the packages pollute neither the system environment nor the base venv. All newly installed packages end up in the kernel-specific venv. While they are lost on exit/restart (we delete the derived venv when the kernel exits), the packages inside the base venv are not.
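The derived-venv step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: the helper names (`site_packages_dir`, `venv_python`, `create_venv`, `create_derived_venv`) and the file name `base_venv.pth` are made up for the example, and the site-packages location assumes the standard CPython venv layout.

```python
import os
import sys
import venv
from pathlib import Path


def site_packages_dir(venv_dir: Path) -> Path:
    """Locate site-packages inside a venv (assumes the standard CPython layout)."""
    if os.name == "nt":
        return venv_dir / "Lib" / "site-packages"
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return venv_dir / "lib" / version / "site-packages"


def venv_python(venv_dir: Path) -> Path:
    """Path of the interpreter that belongs to the venv."""
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def create_venv(venv_dir: Path, with_pip: bool = True) -> None:
    """Create a plain venv; symlink the interpreter on POSIX like the venv CLI does."""
    venv.create(venv_dir, with_pip=with_pip, symlinks=(os.name != "nt"))


def create_derived_venv(base_dir: Path, derived_dir: Path, with_pip: bool = True) -> Path:
    """Create a kernel-specific venv that can import the base venv's packages.

    Hypothetical helper: the .pth file name is arbitrary, only the suffix matters.
    """
    create_venv(derived_dir, with_pip=with_pip)
    # One line per extra directory; site.py appends each existing path to sys.path.
    pth_file = site_packages_dir(derived_dir) / "base_venv.pth"
    base_packages = site_packages_dir(base_dir).resolve()
    pth_file.write_text(f"{base_packages}\n", encoding="utf-8")
    return venv_python(derived_dir)
```

The returned interpreter path would be the one used to launch the kernel. Because pip inside the derived venv installs into the derived site-packages, deleting the derived directory afterwards discards those packages without touching the base venv.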

The overall idea is coming from https://stackoverflow.com/a/75545634.

An example prompt for this is

Use rdkit to plot the molecule with the SMILES string 'CC1CCC/C(C)=C1/C=C/C(C)=C/C=C/C(C)=C/C=C/C=C(C)/C=C/C=C(C)/C=C/C2=C(C)/CCCC2(C)C'.

I think (!) this could be a good baseline for addressing https://github.com/ricklamers/gpt-code-ui/discussions/67 as well. At least, running kernels with separate environments is possible this way.

dasmy commented 1 year ago

Hi @ricklamers, answering here to your comment in #73:

Although the .pth approach is somewhat exposed to the user, it does feel like it's breaking a line of abstraction provided by regular virtual environment use (venv or virtualenv). I'm not opposed to it entirely, but this approach could lead to unexpected consequences. For one, I'm not aware what kind of limitations there are for such "derived" virtual environments compared to a "regular virtual environment". Open to exploring this direction! Just stating my initial thoughts

Actually, the approach with the .pth files is properly documented in https://docs.python.org/3/library/site.html : It is a method to extend the list of package search directories and thus allows using packages from another venv. To me it seems to be the least invasive approach, and a very similar effect is achieved by venv when it is invoked with --system-site-packages. Then, the system site-packages dir is simply included in the package search path of the created venv. Newly installed packages end up in the venv while existing packages from the system can be used right away.
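To make the mechanism from the site documentation concrete, here is a small self-contained sketch. It does not create any venv; it only shows that a .pth file found in a site directory causes the paths listed in it to be appended to sys.path. The function name `pth_extends_sys_path` and the directory names are invented for the illustration, and `site.addsitedir` is called by hand to trigger the same processing that happens for site-packages at interpreter startup.

```python
import site
import sys
import tempfile
from pathlib import Path


def pth_extends_sys_path(root: Path) -> bool:
    """Return True if a .pth file in a site dir adds its listed path to sys.path."""
    extra = root / "extra_packages"   # stands in for the base venv's site-packages
    site_dir = root / "site_dir"      # stands in for the derived venv's site-packages
    extra.mkdir()
    site_dir.mkdir()
    # Each non-comment line naming an existing directory is appended to sys.path.
    (site_dir / "extra.pth").write_text(f"{extra}\n", encoding="utf-8")

    saved_path = list(sys.path)
    try:
        site.addsitedir(str(site_dir))
        return str(extra) in sys.path
    finally:
        sys.path[:] = saved_path      # leave the running interpreter unchanged


with tempfile.TemporaryDirectory() as tmp:
    found = pth_extends_sys_path(Path(tmp))
```

Lines in a .pth file that name non-existing directories are skipped, so the base venv has to exist before the derived one is started.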