python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License

{cache-dir}/virtualenvs may be too volatile a storage location for some users #3346

Open hwalinga opened 4 years ago

hwalinga commented 4 years ago

Feature Request

Poetry by default saves its virtualenvs in {cache-dir}/virtualenvs, which defaults to ~/.cache/pypoetry/virtualenvs (Linux) and ~/Library/Caches/pypoetry/virtualenvs (MacOS). However, these folders are generally considered safe to remove on both Linux and MacOS.

XDG also specifies that the cache directory is for non-essential files (i.e. those that are trivially recreated without user interaction).

I report this because I emptied the cache on a server, and this took down a webserver that was running in the background from a Poetry virtualenv, unbeknownst to me (luckily it was the weekend).

So I recommend setting a different default for the virtualenv folder. If you follow the XDG specification, that would probably be ~/.local/share/poetry/virtualenvs (FYI: Pipenv already uses ~/.local/share/virtualenvs), and there is probably a similar folder for MacOS.

(Default virtualenvs folder for Windows should probably be fine.)

mrijken commented 4 years ago

You know you can change the location, which makes a different default unnecessary?

poetry config virtualenvs.path /path/to/cache/directory/virtualenvs

or to the project / repo location:

poetry config virtualenvs.in-project true
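
To check whether a configured location is still at risk, here is a minimal sketch (not part of Poetry) that tests whether a virtualenv path sits under a cache directory. The names `in_cache_dir` and `CACHE_ROOTS` are hypothetical, and the two cache roots are simply the ones mentioned in this thread; a real setup may use others.

```python
from pathlib import PurePosixPath

# Cache roots named in this thread, relative to the home directory.
# Assumption: a typical Linux or MacOS layout; adjust for your system.
CACHE_ROOTS = (".cache", "Library/Caches")


def in_cache_dir(path: PurePosixPath, home: PurePosixPath) -> bool:
    """Return True if `path` lies inside one of the cache roots under `home`."""
    for root in CACHE_ROOTS:
        try:
            path.relative_to(home / root)
        except ValueError:
            continue  # not under this cache root, try the next one
        return True
    return False


home = PurePosixPath("/home/alice")  # hypothetical user
print(in_cache_dir(home / ".cache/pypoetry/virtualenvs", home))       # True
print(in_cache_dir(home / ".local/share/poetry/virtualenvs", home))   # False
```

A path that reports True here is one that a cache cleaner or a manual rm of the cache folder would take with it.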

hwalinga commented 4 years ago

Yes, but I found that out when it already went wrong :-) (I wouldn't expect people to completely read the docs, especially for a package manager. So, any rm happy person can make the same mistake as me.)

But it would really be nice to have a saner default instead. I think ~/.local/share/poetry/virtualenvs would be the perfect location (just as Pipenv does it), and for MacOS it seems that would be ~/Library/Application Support/poetry/virtualenvs or just ~/Library/poetry/virtualenvs

DrLuke commented 4 years ago

(I wouldn't expect people to completely read the docs, especially for a package manager. So, any rm happy person can make the same mistake as me.)

I would at least expect people to try to find out where the venv goes when deploying this to a production environment, seeing how it obviously doesn't end up in the project directory by default.

I agree, however, that putting them in the cache directory by default might not be the best solution, as that means your venv could vanish seemingly at random at any time while you're working with it. At best it's a nuisance because you have to re-run install; at worst this could be rather devastating for someone with a slow or data-capped internet connection.

Personally I would prefer them to always end up in the project directory by default, as that means that deleting the project directory will clean up all data associated with it. If it's at a different location, like ~/.local, you could collect a substantial amount of dangling venvs if you don't clean it up regularly.

hwalinga commented 4 years ago

I would at least expect people to try to find out where the venv goes when deploying this to a production environment, seeing how it obviously doesn't end up in the project directory by default.

I learned it the hard way, but agreed.

Personally I would prefer them to always end up in the project directory by default, as that means that deleting the project directory will clean up all data associated with it. If it's at a different location, like ~/.local, you could collect a substantial amount of dangling venvs if you don't clean it up regularly.

I think that is also a very good solution, perhaps even better.

hwalinga commented 3 years ago

Some more "inspiration":

https://chriswarrick.com/blog/2018/07/17/pipenv-promises-a-lot-delivers-very-little/#the-break-neck-pace-of-pipenv

felciano commented 3 years ago

This bug just bit me hard on MacOS. Optimization software like CleanMyMac will clear out ~/Library/Caches/ periodically, wiping out packages downloaded by poetry. They've (correctly IMO) pointed to the Apple technical docs that pretty clearly indicate that this directory shouldn't include any files that applications depend on and can't recreate themselves (e.g. "the application does not require cache data to operate properly, but it can use cache data to improve performance").

This seems like a valid and well-documented design decision, and one that poetry should respect out-of-the-box. Since python apps aren't typically expected to be able to re-install required libraries on their own (!), shouldn't a different default cache location be used on MacOS?

YodaEmbedding commented 3 years ago

It depends on the interpretation of the XDG specification:

There is a single base directory relative to which user-specific data files should be written. This directory is defined by the environment variable $XDG_DATA_HOME.

There is a single base directory relative to which user-specific non-essential (cached) data should be written. This directory is defined by the environment variable $XDG_CACHE_HOME.

Is the data essential? It can be reconstructed (by re-downloading) easily, so it sounds rather cache-like. On the other hand, it's a practical annoyance for this data to disappear and force a redownload. Additionally, if the cache is cleared, the user must manually redownload the data, which means it violates the idea that a user should not be able to tell if the cache is suddenly cleared.

TL;DR: storing it in $XDG_DATA_HOME is probably more practical.
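
For reference, a minimal sketch of how the two base directories resolve. The fallbacks ($HOME/.local/share and $HOME/.cache) and the rule that unset, empty, or relative values are ignored come from the XDG Base Directory Specification. The function name `xdg_dir`, the example user, and the `poetry/virtualenvs` suffix (taken from the proposal earlier in this thread) are assumptions for illustration, not Poetry's actual behaviour.

```python
from pathlib import PurePosixPath
from typing import Mapping

# Fallbacks from the XDG Base Directory Specification, relative to $HOME.
XDG_DEFAULTS = {
    "XDG_DATA_HOME": ".local/share",
    "XDG_CACHE_HOME": ".cache",
}


def xdg_dir(variable: str, env: Mapping[str, str], home: PurePosixPath) -> PurePosixPath:
    """Resolve an XDG base directory, falling back to the spec default."""
    value = env.get(variable, "")
    # Unset, empty, and relative values all mean "use the default".
    if value and PurePosixPath(value).is_absolute():
        return PurePosixPath(value)
    return home / XDG_DEFAULTS[variable]


home = PurePosixPath("/home/alice")  # hypothetical user
# Proposed data location when nothing is set in the environment:
print(xdg_dir("XDG_DATA_HOME", {}, home) / "poetry" / "virtualenvs")
# An absolute override is honoured as-is:
print(xdg_dir("XDG_CACHE_HOME", {"XDG_CACHE_HOME": "/var/tmp/cache"}, home))
```

Passing the environment in as a mapping (for example `os.environ`) keeps the function easy to test without touching the real environment.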

YodaEmbedding commented 1 year ago

XDG_STATE_HOME (i.e. ~/.local/state) was recently introduced as an "in-between" option between XDG_CACHE_HOME and XDG_DATA_HOME.

The $XDG_STATE_HOME contains state data that should persist between (application) restarts, but that is not important or portable enough to the user that it should be stored in $XDG_DATA_HOME. It may contain:

  • actions history (logs, history, recently used files, …)
  • current state of the application that can be reused on a restart (view, layout, open files, undo history, …)

Theoretically, anything inside XDG_CACHE_HOME should be regeneratable with zero additional user interaction. Since poetry does not automatically run poetry install when it detects missing virtual environments, there is non-zero user interaction required to regenerate the cache:

rm -rf ~/.cache/pypoetry/virtualenvs
poetry run python main.py  # Does not work without additional user interaction!
poetry install             # Manual user interaction.
poetry run python main.py  # Now it works.

Thus, XDG_STATE_HOME is a more appropriate place.


TL;DR: ~/.local/state/pypoetry/virtualenvs.
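
To make the "zero user interaction" criterion concrete, here is a minimal sketch of what a cache-safe tool would have to do: notice that the environment is gone and rebuild it before running anything. This is not how Poetry behaves (as noted above, it does not reinstall automatically). `ensure_env` and `fake_rebuild` are hypothetical names, and `fake_rebuild` only creates an empty directory as a stand-in for the work that poetry install does.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_env(env_dir: Path, rebuild: Callable[[Path], None]) -> bool:
    """Rebuild `env_dir` if it is missing; return True if a rebuild happened."""
    if env_dir.is_dir():
        return False  # environment survived, nothing to do
    rebuild(env_dir)  # regenerate without asking the user
    return True


def fake_rebuild(env_dir: Path) -> None:
    """Stand-in for a real reinstall: just create the directory."""
    env_dir.mkdir(parents=True)


with tempfile.TemporaryDirectory() as tmp:
    env = Path(tmp) / "virtualenvs" / "demo"
    print(ensure_env(env, fake_rebuild))  # True: it was missing and got rebuilt
    print(ensure_env(env, fake_rebuild))  # False: it already exists
```

Only a tool that runs a check like this on every invocation could treat its virtualenvs as true cache data; without it, the data behaves like state.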