Add an opt-out for the “running as root” warning

hholst80 commented 3 years ago

What's the problem this feature will solve?

I want to be able to manually remove the warning pip spews out during package installation in root environment:

Running pip as the 'root' user can result in broken permissions and conflicting behaviour ..

Describe the solution you'd like

I want to be able to disable this warning through an environment variable like

env PIP_DISABLE_ROOT_WARNING=1 pip install flask

Alternative Solutions

No in-tool workaround is known to me.

Additional context

We are all adults here. I know what I am doing and I do not want to see a warning every time I run my build system. Let me disable the warning by setting an environment variable. I do not want my users to think there is anything wrong with my system just because the pip tool spews out indiscriminate warning messages.

Code of Conduct

[X] I agree to follow the PSF Code of Conduct.

pradyunsg commented 3 years ago

This was proposed in https://github.com/pypa/pip/issues/6409, was implemented in https://github.com/pypa/pip/pull/9394 and has been discussed in https://github.com/pypa/pip/issues/10028. There's likely a lot more discussions, but I ain't spending more of my time digging those up.

Quoting myself from https://github.com/pypa/pip/issues/10028#issuecomment-885868343:

I don't think there is a way to make it possible for experienced users to not see this warning while also making sure that it serves the purpose of getting inexperienced users to understand that they should not do this in general.

Do read the linked comment above, before responding.

If you have a response that's different from "I don't like warnings." -> "Gimme an escape hatch." (which will become the top-voted Stackoverflow answer, likely without sufficient context to help inexperienced users) -- color me very interested.

pradyunsg commented 3 years ago

Until something like PEP 668 is implemented and generally available to the point that I am comfortable with dropping this warning, I don't think it is a good idea to provide an escape hatch.

I do not want my users to think there is anything wrong with my system just because the pip tool spews out indiscriminate warning messages.

There is a risk though. If you run sudo pip install that modifies $package and your OS depends on $package, you've quite possibly broken your OS.

I'll note that I'm speaking for myself, and not the other pip maintainers.

pfmoore commented 3 years ago

FWIW, I agree. I think that the warning helps more people than it inconveniences. And anyway, the people who know enough to be sure that they are safe are also capable of suppressing the warning (pip install 2>&1 | grep -v "pip as the 'root' user", for example, although IMO anyone who couldn't have constructed that themselves probably doesn't fully understand the risks of running pip as root...).
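A wrapper along those lines could also be sketched in Python. This is only an illustration: the helper names `drop_root_warning` and `run_pip_quietly` are made up for the example, and the matched substring is taken from the warning text quoted in this thread, which may differ in other pip versions.

```python
import subprocess
import sys

# Substring taken from the warning text quoted in this thread; pip may
# reword the message in other versions, so treat it as an assumption.
ROOT_WARNING_MARKER = "pip as the 'root' user"


def drop_root_warning(text: str) -> str:
    """Return text with every line containing the root warning removed."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if ROOT_WARNING_MARKER not in line
    ]
    return "".join(kept)


def run_pip_quietly(*pip_args: str) -> int:
    """Run pip in a subprocess and echo its output minus the root warning."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", *pip_args],
        capture_output=True,
        text=True,
    )
    sys.stdout.write(result.stdout)
    sys.stderr.write(drop_root_warning(result.stderr))
    return result.returncode
```

Calling `run_pip_quietly("install", "flask")` would behave like the shell pipeline above, except that stdout and stderr stay separate and pip's exit code is preserved.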

hholst80 commented 3 years ago

Suggestion: If the log warning is made into a warning via the "warnings" module I can solve this within the existing framework of that. Or, if that is not feasible, check for /.dockerenv and if that exists do not spew out a warning because it is a hosted container environment and most likely the user is running a python environment based on pip, knows what they are doing, or a combination of both.

https://docs.python.org/3/library/warnings.html
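For reference, this is roughly how suppression would work if the message were emitted through the `warnings` module. pip does not do this; the `RootUserWarning` category and the `emit_root_warning` function are hypothetical names invented for the sketch.

```python
import warnings


class RootUserWarning(UserWarning):
    """Hypothetical category; pip defines no such class."""


def emit_root_warning() -> None:
    # Stand-in for the place where a tool would issue the warning.
    warnings.warn(
        "Running pip as the 'root' user can result in broken permissions",
        RootUserWarning,
    )


def count_warnings(suppress: bool) -> int:
    """Return how many warnings get recorded, with or without a filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if suppress:
            # Filters added later are checked first, so this one wins.
            warnings.simplefilter("ignore", category=RootUserWarning)
        emit_root_warning()
        return len(caught)
```

With the filter in place nothing is recorded, which is the kind of per-category opt-out the suggestion is asking for.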

pradyunsg commented 3 years ago

https://github.com/pypa/pip/blob/a07bfb33a0df3807ce2f71563b27993e97682e47/src/pip/_internal/cli/req_command.py#L179-L184

It's done through the logging module, although, we might change how we output things in the future. :)
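Since the message goes through `logging`, the closest in-process equivalent would be a `logging.Filter`. The sketch below only shows the mechanism on a stand-alone logger; the logger name `demo.pip` and the class names are assumptions, and nothing here is a supported way to hook into pip when it is run from the command line.

```python
import logging
from typing import List


class DropRootWarning(logging.Filter):
    """Reject records whose message mentions the root-user warning."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells logging to discard the record.
        return "pip as the 'root' user" not in record.getMessage()


class ListHandler(logging.Handler):
    """Collect messages in a list so the effect of the filter is visible."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def demo() -> List[str]:
    logger = logging.getLogger("demo.pip")  # hypothetical logger name
    logger.setLevel(logging.WARNING)
    logger.propagate = False
    handler = ListHandler()
    handler.addFilter(DropRootWarning())
    logger.addHandler(handler)
    try:
        logger.warning("Running pip as the 'root' user can result in ...")
        logger.warning("Some other warning")
        return handler.messages
    finally:
        logger.removeHandler(handler)
```

Only the second message reaches the handler; the first is dropped by the filter.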

potiuk commented 3 years ago

Yeah. Checking if you are in container environment would solve vast majority of problems of people who want this warning removed. I think "running root in container" is equally good reason to skip the warning as "running root in cygwin" or "running on windows". Happy to make PR if I know there is a consensus for that one.

potiuk commented 3 years ago

Or maybe even just changing the message to say "If you are in a container, it's usually OK to run pip as root". That would be more "factual".

pfmoore commented 3 years ago

Please, can we not have this debate again? Others have already proposed "disable the warning if you're in a container" and we've responded to that (no, the warning is still valid there - I'm quoting others as my personal experience with containers is limited, so don't bother trying to engage me in debate over this). Repeating arguments that you could have found by searching the tracker for previous discussion on this topic isn't likely to change anyone's mind here...

potiuk commented 3 years ago

Well. I do follow that discussion and I have not noticed that. Really sorry I should have checked more carefully.

But I have not seen anyone propose a better error description, one that might help people who wonder whether the warning is valid for them or not. I think you should be empathetic towards people who have their own users, and have to keep explaining to them "Yeah the warning is there, but this is a container so this is right". How about explicitly adding an explanation that in a container it is likely OK? Still a warning, but a more reasonable message actually reflecting reality. What's wrong with that?

pradyunsg commented 3 years ago

It is still possible to modify system-package-manager installed packages, using pip inside a container. That can still break things in weird ways.

PEP 668 will bring in the protections necessary, so if someone really wants to get rid of the warnings, it'd be more impactful to help that effort move forward. You can still try to convince us that the wording should be tweaked or the message should have additional conditionals, but don't be surprised if I'm responding on the PEP 668 discussion and not here. :)

As it stands, there's a risk to running pip as sudo regardless of whether you run it on your local terminal, in a container, or on a remote machine. Outside of mitigating that risk (part of which is done by PEP 668), all that we can do is warn users about it; and that's what this message is doing.

pfmoore commented 3 years ago

I think you should be empathetic towards people who have their own users

That is a valid point, and I'm sorry for not considering it. Do you supply a copy of pip with your application? If so, then you can wrap it to suppress the warning. If you don't, then I'm not sure how you can be so sure your users are using sudo pip safely? I guess if they are following instructions you provide on how to set up the container, you can know they are not doing anything unsafe, but then why not just add to your docs a note that pip issues a warning that doesn't apply for people who are following this particular set of instructions? After all, if they are not reading your docs to see that note, they probably aren't setting up their container the way you advise them to either!

I hope this helps.

potiuk commented 3 years ago

That is a valid point, and I'm sorry for not considering it. Do you supply a copy of pip with your application? If so, then you can wrap it to suppress the warning. If you don't, then I'm not sure how you can be so sure your users are using sudo pip safely.

Just to explain my case.

Yep I am sure it is ok. This is because we have our own Dockerfile https://github.com/apache/airflow/blob/main/Dockerfile which is very versatile and you can build your image using custom docker build . commands providing multiple arguments:

https://airflow.apache.org/docs/docker-stack/build.html#examples-of-image-customizing

For example you can build custom Airflow image like that:

docker build . \
    --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-buster" \
    --build-arg AIRFLOW_VERSION="2.0.2" \
    --build-arg ADDITIONAL_AIRFLOW_EXTRAS="mssql,hdfs" \
    --build-arg ADDITIONAL_PYTHON_DEPS="oauth2client" \
    --tag "my-pypi-extras-and-deps:0.0.1"

But currently while doing it you get quite a few root warnings. This is not a deal-breaker though. If there is an underlying PEP 668 and a clear way to solve it in the future, I am quite OK to wait and explain to the users that this is fine. But I find it difficult to accept "turning a blind eye" to such use cases.

If the warning is going to stay there forever and there is no way to solve it in the future, then I'd really appreciate a bit of empathy and understanding, and at the very least acknowledging and mentioning that there are cases that are valid, so that your users do not have to explain to their users "yeah ignore that - those guys are just over-protective and the warning really makes no sense, however I have no way to disable it in a reasonable way".

potiuk commented 3 years ago

And surely I could "grep-out" that message in my image.

But the message has already changed several times, so I would never be sure that the grep continues working. We continuously update to the latest version of PIP as soon as it is released, but there are (already were!) cases where the PIP resolver broke dependency resolution (hey, this is Airflow with 500 dependencies). So one of the options of the image customisation is also choosing the pip version: https://airflow.apache.org/docs/docker-stack/build-arg-ref.html#basic-arguments - just in case users have to use a previous version of PIP - and there, the message could be different.

Again - this is not a HUGE problem - it's an annoyance, and we can definitely live with it for a while, but I just wanted to explain that we are not just "moaning" - this is a real use case, a real problem, a real user annoyance, and the workarounds suggested (like grepping out the message) are a band-aid at most.

uranusjr commented 3 years ago

Until PEP 668 goes somewhere I don't think we have a choice. You think this is an annoyance, but distributions also hate people writing to their package store (yes, even in containers, because the distro doesn't really have knowledge it's in a container), and pip is stuck in the middle of that tension.

potiuk commented 3 years ago

Sure. No worries. I understand the "pressures". As long as there is a long-term plan how to tackle it, this is perfectly fine to continue this route.

Cougar commented 3 years ago

I hit the same warning today (also building a docker image for system use). I see two related problems here. First, it is RED and not yellow or any other less dominant color. Second, it is not good practice for anyone to get used to errors. One day you'll miss some important error because your brain just ignores it.

uranusjr commented 3 years ago

For what it's worth, we want users to not run pip as root, not get used to the error.

Cougar commented 3 years ago

You have to run it as root if your target is to install the same packages for all users in that particular image or server. Is there any (good) alternative for that? Let every user install the same packages and versions under their own venv and update all these venvs together every time? Or make one global venv for all users, which is not much different from installing as root in the first place.

uranusjr commented 3 years ago

To be clear, the error only appears when you run pip as root, directly on a Python installation. If the goal is to provision the installation across users, it'd be best to use a virtual environment instead. That is enough to suppress the message. And before we go there, yes, we do think it is still best practice to use virtual environments in a container.
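The conditions described in this thread (root user, not in a virtual environment, not on Windows or Cygwin) can be sketched as a small predicate. This is a simplified illustration rather than pip's actual code; the function and parameter names are chosen for the example.

```python
import os
import sys
from typing import Optional


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    # In an environment created by `python -m venv`, sys.prefix points at
    # the environment while sys.base_prefix points at the base install.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def should_warn_about_root(
    in_venv: bool, platform: str, uid: Optional[int]
) -> bool:
    """Decide whether a root-user warning applies for the given inputs."""
    if in_venv:
        return False  # installs stay inside the environment
    if platform in ("win32", "cygwin"):
        return False  # the thread notes these platforms are exempt
    if uid is None:
        return False  # platform has no POSIX user ids
    return uid == 0


def current_process_should_warn() -> bool:
    """Apply the predicate to the running interpreter."""
    uid = os.getuid() if hasattr(os, "getuid") else None
    return should_warn_about_root(in_virtualenv(), sys.platform, uid)
```

Under these assumptions, root inside a virtual environment does not trigger the warning, while root on a bare Linux installation does.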

Cougar commented 3 years ago

Do you mean a virtual environment like python -m venv .venv and source .venv/bin/activate?

The Alpine-based Python 3.10 Docker image is 45.4 MB. If you set up a virtual environment, you just duplicate 1/3 of the data (+15 MB).

What is the rationale behind that? I'm a very strong proponent of virtual environments and use them as much as possible, but I don't see the point here. It sounds like "if all you have is a hammer, everything looks like a nail" - like virtual environments, a hammer is also a very useful tool.

pfmoore commented 3 years ago

What is the rationale behind that?

Ultimately the rationale here is that we have had very strong representations from Linux distribution vendors saying that they do not want people using pip to install Python packages into the area owned by the system package manager. Pip is very much "piggy in the middle" here - we cannot win, as we have conflicting demands from two key parts of our user base.

PEP 668 is the long term solution here. In the short term, it seems pointless to change something in pip just so that the other half of our user base will be yelling at us 🙁

potiuk commented 3 years ago

I actually sympathise with all 3 parties here:

Similarly to @Cougar, I do not want to use virtualenv for Docker image building. Not to mention the "alpine" image growth, it makes little sense and complicates things like copying the installed --user Python installation between segments of a multi-segment image to make the image even smaller (this is what we do in Apache Airflow, for example: https://github.com/apache/airflow/blob/e5422f0233b993acfe7c881dfa72178e662f8e46/Dockerfile#L444). Unlike using the --user flag (or other flags to install stuff elsewhere), doing that with venv is brittle and not guaranteed to contain all required libraries.
I understand the problem of the distro people.
And I also understand the PIP people in the middle of that, and I perfectly understand PEP 668 is the right solution - we discussed it above and I am perfectly fine with it.

On the other hand we do not know when PEP 668 is going to land - and I also agree with @Cougar that false positives which cannot be easily disabled are a bad thing.

However - looking at the discussion above I think there is one thing that CAN be done that will satisfy everyone here. Sort of win-win-win situation

@pfmoore - you mention that the rationale is that Linux distro vendors are saying they do not want people using PIP to install Python packages into the area owned by the system package manager. Similarly, @uranusjr mentions that introducing venv disables the warning. However, for the image-building case introducing venv is NOT a good solution. On the other hand, using the --user flag or (similarly) using --target is a much better and more straightforward solution for Docker image building.

Also, coincidentally, both --user (and the --target flag, if the target is not one of the system directories) should not make the distro people angry either, because they do not touch the files the distro people are worried about.

So what I really think is a GREAT solution for everyone is that this warning (in RED) is NOT printed if the --user flag is used or the --target flag does not point to any of the distro "sensitive" directories. I think simply that using "DO NOT USE ROOT" as a message in this context is simply wrong. The message should be "DO NOT OVERRIDE SYSTEM PACKAGES". And both the --user and --target flags should be treated as "perfectly OK" when run as root.

Is there any drawback to this proposal? Maybe I have not thought about something, but it seems we have a very easy solution that satisfies everyone in the discussion and we do not have to wait until PEP 668 materializes.

pfmoore commented 3 years ago

I thought the requirement was "to install the same packages for all users" (see above). --user won't do that. Also I don't know what the root user's home directory is, so I can't say it's OK. --target sucks, because it doesn't support upgrading, and it has a load of weird edge cases. It's not designed for this situation, and we'd probably get the problems reported as bugs if we started recommending it.

And in any case, without PEP 668, we don't know what are "distro-sensitive" locations, so how would we confirm that?

The drawback is the same as always - it doesn't fix the issue, it just changes the group of people who complain at us.

potiuk commented 3 years ago

I thought the requirement was "to install the same packages for all users" (see above). --user won't do that.

Well. Actually this is precisely what the --user flag allows when it comes to container images (and we have been successfully doing that for more than a year and a half in Apache Airflow). The --user case is simply very close to the venv approach (recommended by PIP maintainers), but better.

It creates a separate, isolated environment where we have not only all packages installed but also all the '.so' and other dependencies installed in one 'folder' that is easy to copy and use. And we can easily make it "local" for any user running the image, which effectively allows us "to install the same packages for all users" (https://github.com/pypa/pip/issues/10556#issuecomment-945960306). This is precisely what we do in the Airflow image - and it's not an isolated case - we are simply following the OpenShift recommendations for images (https://docs.openshift.com/container-platform/3.11/creating_images/guidelines.html - look for "Support Arbitrary User IDs"). Our image sets the same "HOME" directory for EVERY user. This means that the ".local" directory is THE SAME for every user. And it means that we literally "install the same packages for all users".
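To illustrate why a shared home directory matters: on POSIX systems the per-user install location is derived from the user base directory, which defaults to `~/.local`. The sketch below uses `sysconfig` to show how the site-packages path follows the home directory. The `/home/airflow` path is only an example value and not necessarily what the Airflow image uses.

```python
import sysconfig


def user_site_for_home(home: str) -> str:
    """Return the POSIX per-user site-packages path for a home directory."""
    userbase = f"{home}/.local"
    # The "posix_user" scheme describes the layout of per-user installs
    # on POSIX; passing `vars` overrides the computed user base.
    return sysconfig.get_path(
        "purelib", scheme="posix_user", vars={"userbase": userbase}
    )


# Every user whose HOME is /home/airflow resolves to this one directory,
# whereas a user with a different HOME (such as root) gets another one.
shared_site = user_site_for_home("/home/airflow")
root_site = user_site_for_home("/root")
print(shared_site)
print(root_site)
```

The path depends only on the home directory and the Python version, not on the numeric user id, which is what makes the arbitrary-user-ID pattern work with `--user` installs.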

--target sucks, because it doesn't support upgrading,

I quite agree with that - that's why we use --user for that (and super-happy with how it works). I am perfectly OK to drop --target from my solution. Leaving only --user flag (i.e. do NOT print the warning when pip --user flag is used. full stop).

I believe (please correct me if I am wrong @pfmoore) that it will not "write to system packages" - so the properties of that solution are :

Distro people are happy as --user flag excludes writing to system packages
PIP People are happy. Simply using venv already has this property, so making --user as an exclusion should be ok as well
Docker-centric users are happy - they can use --user flag to install airflow in non-system place, at the same time making it available for re-use for all users, following the best guidelines out there

Did I miss something @pfmoore ?

pradyunsg commented 3 years ago

Running --user on most systems will install into ~/.local, which for the root user is /root/.local/. I'm pretty sure that's not what folks want when they say "shared across users". They want to put it in a global environment.
--user is NOT a virtual environment, and doesn't have one of the more important properties of a python -m venv .venv style environment -- isolation from the system. If you try to pip uninstall six from a virtual environment, pip won't try to uninstall six from the global environment (it'll say "Not uninstalling six at {path}, outside environment {venv_path}"). Same for pip install --upgrade --user.

I think simply that using "DO NOT USE ROOT" as a message in this context is simply wrong. The message should be "DO NOT OVERRIDE SYSTEM PACKAGES".

The exact warning is:

        "Running pip as the 'root' user can result in broken permissions and "
        "conflicting behaviour with the system package manager. "
        "It is recommended to use a virtual environment instead: "
        "https://pip.pypa.io/warnings/venv"

It's neither all-caps, nor a blanket message to do anything. It communicates that there's a risk, and recommends a way to mitigate that risk.

The problem at hand is things like sudo pip usage, as well as any usage that could modify system packages and interfere with the OS packages. This is at odds with users in Docker being root-by-default and some users not wanting to do anything to avoid modifying the system packages. Or that users who want to install into a global environment that's shared across users get this warning. Using USER with Docker is explicitly listed as a best practice for Docker environments.

Note that Docker/container environments are NOT the only usecase here. There's significant portion of users on other Linux-based platforms, who face the same issue, where messing up the system packages with pip can mean that they're unable to use their PC after a reboot. This message is currently nudging both user personas to a best-practice that can help reduce risks.

potiuk commented 3 years ago

Apologies for "all caps". It was more to emphasise the meaning the message conveys (making root the "root of all evil", with virtualenv as the only possible and recommended way of handling it), but I understand it could be read as me shouting. Lesson taken. I will avoid all caps.

Seeing how strongly the PIP maintainers oppose any proposal to improve their message and possibly even educate their users "better", I have kind of lost hope that it will get any chance of improvement. It's a bit sad on one hand, and I would have understood it if it were a huge investment and a big diversion from current policies and work, but I am not sure this is the case.

Therefore I treat this more as an educational discussion - where I (and others looking at it) might learn what the deep "root" reasons for the message are (I think it is not really clear from the message that it has been driven by distros) and how it relates to the in-docker experience (which I think most of the proposals to improve the message come from). Also I see this as a chance for PIP maintainers to learn some of the ways their software is used in legitimate (and useful) ways.

I always try to listen to my users at Apache Airflow, and even if I see that they are using it in a different way than I originally anticipated, I try to be open and at least make life easier for our users if it costs us very little. Improving error messages, making them clearer, and responding to the needs of our users who have some legitimate doubts is something I have been doing for many months now (which resulted in many improvements to our docs and printed messages). But there are of course cases where I hear, listen, and acknowledge that there are good reasons why our users want something that they will not get, so it's OK for me if it remains how it is for now (though I still think the educational part of the discussion is not exhausted yet, so I will add some more context and explanations). Maybe eventually it will lead to at least a better understanding of the problem at hand by all parties (and possibly it can even lead to a better PEP 668 implementation - who knows if PEP 668 will be equally good for the kind of containerised environments that are now prevalent in K8S-driven deployments).

Running --user on most systems will install into ~/.local, which for the root user is /root/.local/. I'm pretty sure that's not what folks want when they say "shared across users". They want to put it in a global environment.

1) Running --user on its own does not make the software available to many users. This is true. However, with containers and Kubernetes (and especially in the case of OpenShift, which pioneered that approach) there is a case where a single home directory can be shared by many users. This is where https://docs.openshift.com/container-platform/3.11/creating_images/guidelines.html (look for "Support Arbitrary User IDs") comes into play. In a K8S environment, one of the best patterns is to allow arbitrary users (belonging to group 0) to run inside the container. This has multiple advantages and is far superior to a plain USER directive in the Dockerfile (although it can happily co-exist with the USER directive). The USER directive is from pre-K8S times and it is very limiting, because by default in a "linux" environment it requires the user to be available on both host and container if you want to share data between the host and containers. This kind of breaks isolation between the two, and the approach promoted by OpenShift fixes that. After working for many years with Docker/containers/K8S, I found the OpenShift approach both simple and powerful.

Using USER with Docker is explicitly listed as a best practice for Docker environments.

2) True, the USER directive is recommended for running the container. And to be honest - this is precisely what we use in Airflow: https://github.com/apache/airflow/blob/main/Dockerfile#L461

However, this is a pretty old recommendation that has already been (partially) invalidated in a specific case - namely multi-stage builds (Docker recommendation here: https://docs.docker.com/develop/develop-images/multistage-build/), which were implemented well after the USER directive was introduced. Multi-stage builds are the practice we use at Apache Airflow as well, in order to significantly decrease the size of the image. We simply install all our PIP dependencies and libraries (using the --user flag) to a "/root/.local" directory in the "build" stage and then copy the entire directory (with all the resulting libraries and Python packages) to the "final" stage. This allows us to save at least 25% of the size of the final image (we do not need build-essentials and a lot of libraries in the "final" stage). It follows all the best practices of Docker image building. Those practices are also such that no "USER" directive is needed in the "build" stage. It's much better if everything is run as the root user here. There is no need to use sudo, you are installing everything as the root user, and you do not need to add extra steps to create a separate user, simply because this stage is only used to build the artifacts that will be copied to the final stage. No danger involved, very simple and straightforward if we use the "root" user for that.

The exact warning is:

    "Running pip as the 'root' user can result in broken permissions and "
    "conflicting behaviour with the system package manager. "
    "It is recommended to use a virtual environment instead: "
    "https://pip.pypa.io/warnings/venv"

I see a serious problem with that message. It's misleading and confusing.

It informs the users that "root" is the "root of all evil". On the other hand, the remediation does not even mention "run PIP as a different user". Instead it mentions "use a virtual environment", which kind of contradicts the problem statement. Is "using root" the problem? Or "not using virtualenv"? Or both? It's also not clear from the message that this breaks the policies of various distros.

If we agree (I have not seen any argument against it so far) that --user handles the problem with "broken permissions and conflicting behaviour", how about a slightly improved, more precise and more "factual" message:

     "Running pip as the 'root' user can result in broken permissions and "
     "conflicting behaviour with the system package manager, which is "
     "against the policies of many distributions of Linux. "
     "There are several ways it can be handled:"
     "  * use a different user than root to run pip"
     "  * use a virtual environment: https://pip.pypa.io/warnings/venv"
     "  * use the `--user` flag"

I think that - or similar - kind of message would be much more precise, propose several different solutions to the problem and explain also a bit more context on why the message is there in the first place.

pfmoore commented 3 years ago

While I appreciate the extensive reply, you're addressing the wrong audience IMO. You need to persuade the people (most notably the Linux distro maintainers) who support the current behaviour that it should be modified as you suggest, not the pip maintainers (who are looking for consensus, not a competing proposal). Those people were involved in developing PEP 668, so they understand the background well.

potiuk commented 3 years ago

I was under the impression (please correct me if I am wrong) that the distro people have not decided on the exact message and conditions when it is printed.

I believe the ask was (i would really like to understand that) 'warn when there is a risk of modifying system locations' and not 'print message when you use root'

Is that message something explicitly requested by distro people ? Or is it something that PIP maintainers decided about (i.e. condition and warning content).

What your answer suggests is that PIP maintainers have no power to control their messages to their users and no power to correct them if they are misleading, which I find pretty confusing and hard to believe.

But if that's the case and we need permission or opinion from the distro people - whom can we mark here so that they can have a say here (if the opinion of PIP maintainers is not enough)? I am happy to drag them into this discussion. I personally am in favour of bringing in other voices to the discussion - especially if it seems that those 'others' have a final say here.

pfmoore commented 3 years ago

Sigh. This is the last time I comment on this issue. That's not what I said or meant. I said we're implementing something that one part of our user base supports. Another part of the user base (so far, two people on this thread) have said they disagree. Unless the people who support the current wording weigh in to say they support a new wording, we're not going to change and risk just annoying a different part of our user base.

We're not experts in this matter (I'm definitely not, as my main platform is Windows) so we rely on the expertise of others. When people claiming to be experts disagree, what should we do? I say, stick with what we have (which includes a longer-term "proper fix"). We don't need more churn for our users.

But as I say, I'm done with this. It no longer feels like a discussion, and I'm either not making my point or I'm being deliberately misinterpreted, and in either case I'm not adding any value here.

notatallshaw commented 3 years ago

I can't speak for others but I was a bit confused by your response @pfmoore and had to re-read the message a couple of times and go back to the original issue to understand things, I definitely had some misunderstandings about the motives of the Pip maintainers for putting this message there and that's what caused the confusion.

My interpretation now is as follows:

The message that pip is giving comes from the guidance of Linux distro maintainers. To change it the pip maintainers would want consensus from that group that the new message works for them, as this is outside the expertise and knowledge of pip to dictate.

Looking at @potiuk 's messages I think the proposal is to add additional information on what might be possible solutions if you run under root.

I would note that in the original issue https://github.com/pypa/pip/issues/6409 that the --user flag (one of the possibilities that is being proposed to include in the error message) is indeed mentioned as a possible suggestion. So it may not actually be too hard to achieve consensus on some additional suggestions to this message, but I do not know how one would go about contacting and getting consensus from Linux distro maintainers.

potiuk commented 3 years ago

I also think @pfmoore you might have taken it a bit too personally. My intention was not to twist your words, but to make sure I understand them - so I paraphrased them in my own words and understanding. This is a very typical approach in any kind of discussion, where you want to make sure you understand the intention and interpretation of the other party.

My intention is simply to make sure I understand how to proceed further without further confusing my users by misleading message (because I think the current message is misleading). Even if we do not agree with adding '--user' as an explicit exception, i think the current message stating that 'using root is wrong' and 'using venv is solution to that' is confusing like crazy because the cause has (without going into details and reading the whole PEP) nothing to do with the solution.

I think we either should explicitly state all solutions or clearly state that the problem is not using venv in the first place. Confusing 'using root' and 'using venv' is i think the root cause of the whole discussion we are having here.

I just re-read all PEP 668 and the discussion that led to it. And I understand all context better. I understand (again - this is my paraphrasing and understanding of it) that the intention of PEP writers was to 'gently' guide users into using venv while acknowledging that there are cases where it is actually not always the 'best' solution. There are some exceptions explicitly stated in PEP - like Sphinx extensions that should be system-wide for example and special treatment of containers.

But by re-reading that I also realized that there is one problem with PEP 668 that actually undermines a basic assumption in the discussion above - namely that PEP 668 will solve the Docker container issue problem when implemented by distros. Which I believe is a false assumption. From the PEP:

Distros that produce official images for single-application containers (e.g., Docker container images) should remove the EXTERNALLY-MANAGED file, preferably in a way that makes it not come back if a user of that image installs package updates inside their image (think RUN apt-get dist-upgrade). On dpkg-based systems, using dpkg-divert --local to persistently rename the file would work. On other systems, there may need to be some configuration flag available to a post-install script to re-remove the EXTERNALLY-MANAGED file.

How I interpret that: even if PEP 668 is implemented by both Pip and distros (Debian in our case), the distro image that is the base for the container build will not have the EXTERNALLY-MANAGED marker. This means that in all cases where I am building an image based on a base distro image, I will still get the warning when running as root and not using venv. This is how I read that paragraph (also because it is explicitly stated in the same PEP that if the EXTERNALLY-MANAGED marker is missing, pip will behave as before the PEP implementation).
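For anyone who wants to check this on their own base image, here is a minimal sketch. It assumes (following my reading of PEP 668) that the marker is a file named EXTERNALLY-MANAGED in the interpreter's stdlib directory as reported by sysconfig. The helper names are mine and are not part of pip or of the PEP.

```python
import sysconfig
from pathlib import Path


def externally_managed_marker(stdlib_dir=None):
    """Return the path where the PEP 668 marker file would live.

    Assumption: the marker sits in the directory reported by
    sysconfig.get_path("stdlib"), as described in PEP 668.
    """
    if stdlib_dir is None:
        stdlib_dir = sysconfig.get_path("stdlib")
    return Path(stdlib_dir) / "EXTERNALLY-MANAGED"


def is_externally_managed(stdlib_dir=None):
    """True if the marker file exists for this (or the given) stdlib dir."""
    return externally_managed_marker(stdlib_dir).is_file()
```

Running `is_externally_managed()` with the system Python of a base image would show whether that image ships the marker or has removed it as the PEP recommends for container images.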

A few questions:

Is that the correct interpretation?
What will then be the suggested approach to solve it long term (i.e. how to use PIP without venv in a Docker container so that no warning is generated)?
Does it mean that Pip forces us to use venv if we want to get rid of the warning?
Or maybe using pip --user with a non-root user (and no venv) will also work?
Should the warning suggest both solutions as 'correct' if that's the case?

@pfmoore - please don't get me wrong. I am not trying to stir the waters and I am not trying to undermine anyone's position or 'twist someone's words'. I really want to understand what the limitations are, make sure that my users get error messages that are factual and consistent (and actionable), and have a long-term solution which I can apply - understanding the reasoning and context. I think PEP 668 is very well written; it takes a lot of things into consideration, including the '--user' flag and containers. But - as usual - there might be findings discovered after the PEP is written and some particular cases that need to be treated differently (or even some things that are not ultimately specified in the PEP).

While I understand your gravitation towards venv, I think even the PEP itself is all about guiding people in its direction, not forcing it. And while I perfectly understand why a warning is fine for the guiding, even the PEP itself acknowledges there are exceptions. So having a solution for such exceptions, especially since it seems that this won't be solved by the PEP, is I think a rather reasonable ask.

All I am asking for is to be treated seriously.

I am looking for a rational discussion on how to solve a real problem. I think I tried to be nice and polite and tried to put forth some context and facts that might not have been known or realized before. I try to get to the point where different parties are heard and possibly even their opinions and problems are considered and maybe even addressed if reasonable and easy to do. I put forward some proposals which I think are worth commenting on rather than abruptly stopping the discussion for - apparently - no reason.

That's all I am asking for.

pradyunsg commented 3 years ago

As a general note around GitHub etiquette, if someone explicitly states that they're stepping away from a discussion (which implies that they're unsubscribing), I think it is a bit rude to @-mention them in the same thread (which both subscribes them and triggers a notification) in a somewhat immediate follow up.

You wouldn't drag a person back into an in-person discussion, if they say they need to step away for a bit -- at least, I hope so. Don't do that in a digital space either. :)

potiuk commented 3 years ago

You wouldn't drag a person back into an in-person discussion, if they say they need to step away for a bit -- at least, I hope so. Don't do that in a digital space either. :)

Sorry - my bad, won't happen again. I also apologise for a few mistakes and typos - I wrote it on my phone while on holidays (I corrected them as I am back at my PC).

I think my proposal is not "against" the spirit of PEP 668 - I believe it improves the message to be consistent and (possibly, if others agree this is in line with PEP 668) adds a non-root --user option that can be used inside containers without raising a warning (and without using venv unnecessarily).

Following the advice however - I would really want to hear what the other creators of PEP 668 think about it, and I would love to get an answer to the question of whether PEP 668 actually solves the problem of having a warning in a container. @pradyunsg - I see you are one of the creators, I also mention @uranusjr (I do not want to call in others who did not take part in the discussion unnecessarily - I hope they see it and will respond if they have an opinion).

Just to reiterate where my understanding of the current state of the discussion is:

I think the error message is misleading - it mentions "running as root" where the solution is "using venv", which (without reading the whole PEP) makes little sense, as the solution seemingly has nothing to do with the problem. My proposal is to improve the message to be consistent and to mention other options for getting rid of the warning (depending on which ones are valid): for example, if running pip install with --user as non-root (especially in a container) removes the warning, it seems like a good idea to mention it, not only the venv one.
It seems that (I would love to hear comments), contrary to earlier assumptions in the thread, implementing PEP 668 is not a solution for in-container builds. The recommendation for distros is to remove the EXTERNALLY-MANAGED marker in base container images, which (as I understand it) will keep the warning (unless I use venv). In some cases using venv might lead to a huge increase in the size of the image (+30% in the case of an Alpine image - see https://github.com/pypa/pip/issues/10556#issuecomment-945973598), so possibly venv should not be the only solution to get rid of the warning - even if it is the preferred one.
I do not think I am violating the spirit of PEP 668 by proposing to improve the error message and to remove the warning when pip --user is used by a non-root user. I do not see those as "against" each other, but rather as "fulfilling the spirit" in the case of container builds.

Could others comment on that - do I understand it correctly? Is there another "jumping to conclusion" which I did without being involved in earlier discussions?

hholst80 commented 3 years ago

so possibly venv should not be the only solution to get rid of the warning - even if it is the preferred one.

Watch out for bias. It is definitely not the preferred solution in a container environment, and I would go so far as to say it is a direct anti-pattern to do so in a container environment.

potiuk commented 3 years ago

so possibly venv should not be the only solution to get rid of the warning - even if it is the preferred one.

Watch out for bias. It is definitely not the preferred solution in a container environment, and I would go so far as to say it is a direct anti-pattern to do so in a container environment.

I personally quite agree. Having read PEP 668 and the linked discussion, and having got comments here, I believe the problem is that the 'consensus that venv is the solution' did not take image building into account (not even running Python in a container, but image building, which is quite a different thing and a very important general use case for PIP).

I would really appreciate it if the PIP maintainers here could give us a hint on how to solve the conundrum (as I understand it now) of a false warning when PEP 668 explicitly recommends removing the marker - which results in the warning, forcing people who prepare images to use the anti-pattern you mentioned. Or maybe explain if I'm wrong and PEP 668 will actually give us the opportunity to remove the warning without having to use venv.

This has been mentioned before, and it was explained that we have to wait for PEP 668 (and I was quite OK with that), but it seems that now we have no clear solution for image building even after PEP 668 is implemented.

I'd love to hear a comment here (maybe I am just wrong in my assessment, which I would also love to know about).

uranusjr commented 3 years ago

I would go so far to say not using a virtual environment in a Docker container is an anti-pattern. The difference between a Python in and out of a virtual environment during image building is also minimal; the only difference is the command you use to invoke Python (and executables installed under that Python). PEP 668 does not get into this too much because this is a topic people have strong opinions on, and discussing the topic presents an unnecessary risk of derailing the PEP discussion, but if you read between the lines, installing things to the system Python (in a container or when building an image) is more like a thing tolerated by the mechanism since way too many people are doing it, rather than something the PEP authors (well, at least I) think is too brilliant for the PEP to break.

potiuk commented 3 years ago

If that's the position of the PIP maintainers that venv is the ONLY way, and forcing venv is mandatory, then I think we have no other way than to follow it, because it seems that the decision is already set in stone and no matter how many arguments we throw, they are not strong enough to change the mind of the PIP maintainers.

But I think if such a decision is made by PIP maintainers, this should be very clearly and straightforwardly stated in the error message.

I think the message should in this case be 'you are not using venv but this is wrong. Please use venv: link to the docs'. I understand (again, I would like to make sure that I understand it right) that it has nothing to do with using or not using the root user? Or am I wrong and the only proper way is both 'not using root' and 'using venv'? And all the other ways of using PIP are simply 'wrong'?

Could you please clarify whether I understand it correctly?

Also, I think PIP maintainers should understand the consequence of that - I understand then that venv is the only way I can get rid of the warning. This also means that people using Alpine will have to pay the price of having a much bigger image, because they have no choice. In the case of Debian/Ubuntu it is far less of a problem (and probably negligible). Also, I know for a fact (there are many articles about it and I had the same problems) that the Alpine image is quite bad for any serious Python installation (because of its musl library and some remaining incompatibilities between glibc and musl). So I would understand if this is a deliberate decision: 'yes, we know that venv will increase Alpine's size significantly, but since its support for Python is problematic anyway, this is a conscious choice and we realize the consequence when making this decision'. I think it would be great to make that clear in the PEP or other accompanying discussion (maybe we can simply link this discussion to the PEP as a follow-up, so that others can find it and understand that this decision was conscious and taken deliberately and not a mistake).

However I need your help as well to understand how to do what I did with --user (which I understand is also 'wrong' way of doing it). I would be happy to follow the venv route in Airflow image but I am not sure if I have the same guarantees.

When I use the '--user' flag, I am 100% sure (this is clearly described in the docs) that all the dependencies (including the .so runtime libraries, any local files etc.) are placed in the '.local' directory and copying it to another image/user is going to work. I have been doing it for more than two years now, and yeah - it works as expected. I am happy to change it to use venv, but now I need some answers:

Can I do the same - copy the .venv dir to another image and it will work (assuming that I make sure I activate the venv in the .rc file for all users)? Will it work for all the shared .so libraries that are built along the way for packages that need compilation, and will I only get the runtime version of those libraries (I do not want the dev libraries, as they unnecessarily - and by a lot - increase the size of the image)?
Can I use the 'root' user to create the venv, or should I create a new non-root user for it (I want to make sure that when I change it now, it will continue to work after PEP 668 is implemented)?
In the case of OpenShift-compatible images we need arbitrary users to run in the image - all of them should use the same venv. I will create the venv with the right umask so the group has write access for all users, but my question is: is it enough to use the same home dir for all the users? Will the .venv folder and running 'activate' in .rc files work for an arbitrary user? Any side effects?
The most important problem (I currently have no good solution to that): how can I make sure that the venv is activated when my user 'extends the image'? In Docker, every RUN command is run in a separate shell. So even if you run 'activate' in one RUN, the next RUN in the Dockerfile does not have the environment variables that 'activate' usually sets (basically, the source command does not work in a Docker RUN command the way you are used to in terminal sessions). Usually the users extend the image in this way (and this is what we have as examples in the Airflow documentation: https://airflow.apache.org/docs/docker-stack/build.html#adding-a-new-pypi-package):

Dockerfile:

FROM apache/airflow:2.2.0.dev0
RUN pip install --no-cache-dir lxml

Our users will simply do that because this is how everyone is used to installing packages. In this case the RUN command will not have the venv activated and I see no easy, future-safe way of doing it.

We cannot just instruct people to prepend activation for every RUN pip command they run - the experience is that a big part of the users will not read the documentation and will use the most obvious way. This is also the reason why we have the PIP_USER variable set, so that the 'plain' PIP command will automatically install packages with the '--user' flag. Before that we had plenty of issues where people did not add the '--user' flag which we had in the docs, and ended up with broken installations where they had the same packages installed in multiple locations.
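To make the '--user' part concrete, here is a small sketch of how one can check where user-scheme installs end up, and therefore which directory has to be copied between images. It only uses the standard library `site` module. The function name is mine, and the assumption is that pip's '--user' installs land in the user site-packages directory that `site` reports.

```python
import os
import subprocess
import sys


def user_install_dirs(userbase=None):
    """Ask a fresh interpreter for its user base and user site-packages.

    If `userbase` is given, it is passed as PYTHONUSERBASE, which
    overrides the default location (~/.local on Linux).
    """
    env = dict(os.environ)
    if userbase is not None:
        env["PYTHONUSERBASE"] = str(userbase)
    code = (
        "import site; "
        "print(site.getuserbase()); "
        "print(site.getusersitepackages())"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    base, site_packages = result.stdout.splitlines()[:2]
    return base, site_packages
```

Calling `user_install_dirs()` with no argument reports the defaults for the current user. Passing a directory shows how PYTHONUSERBASE relocates both paths.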

I would love some guidance on those questions before I follow the venv route.

potiuk commented 3 years ago

And just to add - the above comment is not 'ill meant' or angry.

If passing the message between the lines was deliberately chosen by the PEP writers because they wanted to avoid this kind of discussion and giving all the answers upfront (and to move forward with the PEP) - I can only sympathise with that. However, you will not run away from those discussions until there are some good solutions for people and projects like ours, where we have to face some real consequences of those decisions.

I am happy to be the guinea pig to help you move it one step further and make a showcase that yeah - the venv approach is also applicable to the Docker image building case. But I need help and answers - I am capable and experienced enough to discuss it, implement it and solve (together with you) any problems - but I simply need help, a clear understanding that I know what I am doing, and some kind of guarantee that it will not break in the near future.

Then we could even progress to the next stage (and maybe even the next PEP) where we - together - will not have to write the message 'between the lines' and 'implicitly', but maybe write it 'explicitly', following the Zen of Python. I am all for promoting venv as the only way as long as we know we can handle all cases this way.

pradyunsg commented 3 years ago

Alright, grokking through this discussion now. Let's see what happened here, as I was actively avoiding engaging with this during my fairly-heavy work week.

So...

Or maybe even just changing the message to say "If you are in container, it's usually OK to run pip as root". That would be more "factual".

Led to...

If that's the position of the PIP maintainers that venv is the ONLY way, and forcing venv is mandatory, then I think we have no other way than to follow it

I disagree with the first statement, and also the second. I think the latter is a gross exaggeration and taking a carefully measured position too far in one direction or the other.

Here's my personal thoughts on this thing, presented in the form of quotes from earlier in this thread, because I don't wanna rewrite these things:

There's significant portion of users on other Linux-based platforms, who face the same issue, where messing up the system packages with pip can mean that they're unable to use their PC after a reboot.

It is still possible to modify system-package-manager installed packages, using pip inside a container. That can still break things in weird ways.

As it stands, there's a risk to running pip as sudo regardless of whether you run it on your local terminal, in a container, or on a remote machine. Outside of mitigating that risk (part of which is done by PEP 668), all that we can do is warn users about it; and that's what this message is doing.

I don't think there is a way to make it possible for experienced users to not see this warning while also making sure that it serves the purpose of getting inexperienced users to understand that they should not do this in general.

Aside: I already frown a bit every time someone uses "Pip" rather than "pip" or pip, so... I definitely don't like "PIP". 😅

PEP 668 does not get into this too much because this is a topic people have strong opinions on, and discussing the topic presents an unnecessary risk of derailing the PEP discussion

Wait, I'm actively pushing to make explicit recommendations for including the PEP's protections in containers in the PEP. They won't be requirements, but they'll be a normative recommendation. See the discuss.python.org thread for what exactly I've said.

Or... are you talking about do-not-run-as-root recommendations in the PEP? If so, yea, that's unrelated to the PEP entirely. I don't think there's anything between the lines about that.

We cannot just instruct people to prepend activation for every RUN pip command they run

You don't have to. venv/bin/pip does the right things.

Beyond that, I'm finding it difficult to follow what @potiuk said in his recent posts, which seem more emotionally charged than earlier ones; so... I'm not going to respond to them. I would like to note that nothing super disruptive is changing anytime soon, so it's probably fine to come back to this discussion in like, a week or two / a month from now -- let's give people a bit of time to relax and tone down the conversation here.

potiuk commented 3 years ago

I think (hopefully) it is not too emotional any more - I am really at the stage of working out how I can follow the recommendation of always using venv. I am still not OK with the error message, but I would now like to focus on how I can actually put 'always use venv' into practice. I'd love to focus on the technicalities without diving into emotions.

Side comment - I use the correct spelling for pip now. Honestly, thank you for expressing your emotions connected with using a different spelling. Only because you expressed them did I have a chance to understand that and empathize with it. I would not have known otherwise.

We are all humans, not robots - we do have emotions and I think it's OK to tell others how we feel, so being emotional (in terms of expressing your emotions) is often much more important than hiding it, because hiding might lead to misunderstanding. That's why I also do not shy away from expressing my emotions usually.

make explicit recommendations for including the PEP's protections in containers in the PEP. They won't be requirements, but they'll be a normative recommendation. See the discuss.python.org thread for what exactly I've said.

Do I understand correctly that PEP 668 is going to change when it comes to containers? Can you please copy the right link? The link you copied is just a generic link to the 'discuss' site.

Or... are you talking about do-not-run-as-root recommendations in the PEP? If so, yea, that's unrelated to the PEP entirely. I don't think there's anything between the lines about that.

Yes, I would also love to understand that. It seems that 'root' and venv are different things, but the error message mixes the two, and it was mentioned earlier in the thread that PEP 668 is the solution that will solve it. I would really love to get rid of the error, and now I really need some guidance as I am a bit lost - should I continue using root? Or should I use venv? Or both? Can I copy .venv between users with the same guarantees for shared libraries that '--user' gives above? How do I activate venv in Docker? I think I need to get those answers to be able to act on them (they are nicely bullet-pointed above and I really need your help to understand the answers).

You don't have to. venv/bin/pip does the right things.

Ok. Let me rephrase what I understand from that sentence.

Does it mean that just prepending the right .venv/bin to the PATH, to make sure that the venv bin is first on the PATH, is the right solution for Docker? I have done this in the past: I simply used 'python' from the venv bin when I could not 'activate' the venv. This seems to be suggested by https://docs.python.org/3/library/venv.html#creating-virtual-environments, which mentions that you can run the venv Python interpreter directly, but still the only 'official' way of activating the venv is to source the 'activate' script, so it's not entirely certain that we can count on this for 'pip' commands (it does seem reasonable to assume that, but I just want to make sure I am not missing some edge cases when I make the decision to move the whole Airflow image to use the venv).

Now, from what I understand, you are telling me that just by having 'pip' from the 'venv/bin' path (so if I put the .venv first on the PATH) it will work and behave in exactly the same way as if I had activated the environment? Can you confirm that please? Do I understand it correctly? Is this behaviour implicit or documented somewhere?

Beyond that, I'm finding it difficult to follow what @potiuk said in his recent posts, which seem more emotionally charged than earlier ones; so... I'm not going to respond to them. I would like to note that nothing super disruptive is changing anytime soon, so it's probably fine to come back to this discussion in like, a week or two / a month from now -- let's give people a bit of time to relax and tone down the conversation here.

I am really into 'how can I do it now', with no more emotions or need to relax (I am actually writing this from holidays because I know it will take some time to clarify some of my questions, and I just want to get good technical answers before I come back so that I can work on it as soon as I am back - hence some spellings and shortcuts, as I am writing from mobile).

I would really appreciate it if someone from the pip team looked at the - plain technical - questions I asked and guided me there.

J.

pfmoore commented 3 years ago

Purely technical answers here. I'm not going to discuss further as I think a "cooling off" period is good. But I will respond with (what I perceive as) facts, in the hope that they will be useful to you.

Do I understand correctly that PEP 668 is going to change when it comes to containers

PEP 668 has not yet been accepted. It's not even been submitted for approval. The Discourse link is where you should go if you want to discuss what it says prior to approval. Once it's approved, it won't change further without a follow-up PEP.

I really need some guidance as I am a bit lost

That's advice, not technical questions, so I won't respond on that.

Does it mean that just prepending the right .venv/bin to the PATH, to make sure that the venv bin is first on the PATH, is the right solution for Docker?

That's a choice you can make, there's no technical, yes/no answer.

Now, from what I understand, you are telling me that just by having 'pip' from the 'venv/bin' path (so if I put the .venv first on the PATH) it will work and behave in exactly the same way as if I had activated the environment?

The executables in the virtual environment's "bin" directory do not need the environment to be activated to work. You can run them directly using their absolute path, or add the bin directory to $PATH (that's all activating does anyway).
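A quick way to see this for yourself, as a minimal sketch: create a throwaway virtual environment, run its interpreter by path without activating anything, and check which prefix it reports. The helper names are illustrative only. `with_pip=False` is used just to keep the sketch fast and offline, so it checks the environment's `python` rather than `pip`, but the same mechanism applies to the other executables in the environment.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir):
    """Path to the interpreter inside a virtual environment."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def prefix_seen_by(python):
    """Run `python` directly (no activation) and return its sys.prefix."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix)"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def demo():
    """Return True if the un-activated venv interpreter uses the venv."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "venv"
        venv.create(env_dir, with_pip=False)  # no pip: fast and offline
        prefix = prefix_seen_by(venv_python(env_dir))
        return Path(prefix).resolve() == env_dir.resolve()
```

If `demo()` returns True, the interpreter picked up the environment purely from where it lives on disk, with no `activate` step involved.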

Is this behaviour implicit or documented somewhere?

It's standard virtual environment behaviour, I've no idea if it's documented anywhere TBH, but it's how they have always worked, and isn't going to change.

I hope this is useful.

potiuk commented 3 years ago

Also, I started searching for and reading some of the previous issues - mixing venv with '--user' and related issues. A fascinating read: lots of information, and some issues closed in favour of others. I understand it's a complex subject, especially when mixing with --system and others.

And I sympathise with pip maintainers - I honestly understand why you are promoting venv, given the ways you can mix things and get confusing behaviours. I really would love to help in this direction - to make venv THE way. I am even happy to build some kind of recommendations for people who build images on how they can do it easily, handling even complex cases like Airflow, to promote it better so that at some point it really becomes the only supported way.

If you could help with getting the answers I need now, I think I can simply be helpful with that. Unlike you, I do not have the full context in my head and the knowledge from all the past (and numerous) discussions you had. Please do understand my ignorance (for now - I am building my knowledge), but I am now convinced venv is the only way and happy to help with that.

potiuk commented 3 years ago

Thanks @pfmoore - we are definitely moving forward so thanks for taking time to answer some of my questions!

PEP 668 has not yet been accepted. It's not even been submitted for approval. The Discourse link is where you should go if you want to discuss what it says prior to approval. Once it's approved, it won't change further without a follow-up PEP.

That's good news. I had the impression from earlier comments (pardon my ignorance here) that it was already agreed and being implemented. I think then it is even more important to clarify the issues of applying the .venv to the case I am mostly interested in now (i.e. building images with Python dependencies with venv as a first-class citizen). I am super happy to bring whatever I find back to the discussion there, and maybe I will even have a chance to influence the image/container part, as I bring some experience from converting the Airflow image to it.

That's advice, not technical questions, so I won't respond on that.

Let me just clarify then, because maybe that was not clear. The warning message is:

WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

When I follow https://pip.pypa.io/warnings/venv I get redirected to https://docs.python.org/3/tutorial/venv.html - which is a tutorial about virtualenv (which I know pretty well). I looked at it again and there is no mention of the root user. And I know for a fact that you can create a venv as root. So I just wanted to make sure what the recommendation is (this is where I am confused). Simply speaking, I would like to know whether the recommendation is:

1) I can run as the root user with venv (the first part of the message explicitly states root, but "use a virtual environment instead" does not imply using a different user, so I am not sure), or
2) I should use a different user than root and run it with venv

The reason why I want to know the answer is that I need to decide whether I should create a separate user for the venv or whether I can run venv as root. I want to make sure that current and future versions of pip will not print the warning if I use root and venv (or at least I would like to know what the current intention is, even if it might change in the future).

So does the recommendation recommend me 1) or 2)? I am a bit lost.

Does it mean that just prepending the right .venv/bin to the PATH, to make sure that the venv bin is first on the PATH, is the right solution for Docker?

That's a choice you can make, there's no technical, yes/no answer.

Fair point. I think the next questions will allow me to derive the answer myself.

Now from what I understand you tell me that just by having 'pip' from the 'venv/bin' path (so if I put the .venv first on the PATH) it will work and behave in exactly the same way as if I activated the environment?

The executables in the virtual environment's "bin" directory do not need the environment to be activated to work. You can run them directly using their absolute path, or add the bin directory to $PATH (that's all activating does anyway). It's standard virtual environment behaviour, I've no idea if it's documented anywhere TBH, but it's how they have always worked, and isn't going to change.
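A minimal sketch of the behaviour described above, assuming a POSIX-style or Windows layout for the environment and using a throwaway directory. It creates a virtual environment with the standard library `venv` module, runs the environment's interpreter by its path without activating anything, and checks that the interpreter reports the environment as its `sys.prefix`. The helper names (`venv_python`, `prefix_seen_by`, `runs_in_venv_without_activation`) are made up for this illustration.

```python
import os
import subprocess
import tempfile
import venv


def venv_python(env_dir):
    """Path of the interpreter inside a venv (POSIX 'bin', Windows 'Scripts')."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def prefix_seen_by(python_exe):
    """Ask an interpreter which environment it believes it is running in."""
    result = subprocess.run(
        [python_exe, "-c", "import sys; print(sys.prefix)"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip()


def runs_in_venv_without_activation():
    """Create a venv, call its python directly, compare sys.prefix to the venv."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = os.path.join(tmp, "demo-venv")
        # with_pip=False keeps the sketch fast and avoids needing ensurepip.
        venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
        reported = prefix_seen_by(venv_python(env_dir))
        return os.path.realpath(reported) == os.path.realpath(env_dir)


if __name__ == "__main__":
    print(runs_in_venv_without_activation())
```

This only shows that the interpreter picks up its environment from its own location. It says nothing about whether the resulting directory can be moved or copied elsewhere.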

Yep. This part (about executables not needing activation) is, I think, quite explicit in the venv docs (and, as I wrote, it's reasonable to assume they work this way). But my questions go a bit deeper, because pip is not an ordinary binary. It does so much more than just run an interpreter. It can download source code and build shared libraries, copy those libraries and accompanying artifact files, and generally "modify" the "installation environment". This is far more than a usual binary or even the "python" interpreter itself does. My question is not just about "running" the pip command. It is about the resulting artifacts and whether I can rely on the ".venv" directory being cloneable (the same way .local is when you use the --user flag).

As I explained in my case, so far I was using --user in one image and copying the resulting .local directory to another image to get significant savings in size (600 MB instead of 850 MB). My question is: if I use pip from the venv on the PATH without activating the venv, will I be able to copy the resulting .venv directory to another image and get all the needed libraries and dependencies transferred smoothly? Do I need to worry about anything (should the .venv be in exactly the same location, or should I modify something there when I change the user)? I was doing it with --user and the .local dir, and there I had a guarantee that it would work. I have > 500 dependencies in Airflow, and the last thing I want after I release the image to my users is for some obscure package, which I had no chance to test, to behave differently when I copy the .venv.

Can I count on that behaviour?

I hope this is useful.

Definitely moving in the right direction - but I think some of my questions need some clarification so that I am convinced that I can follow the .venv advice.

potiuk commented 3 years ago

(and apologies for capital PIP @pradyunsg - I already corrected it. Unfortunately auto-correct changed it to capitals :( )

pfmoore commented 3 years ago

So does the recommendation recommend me 1) or 2)

This is where we start to get frustrated with each other, so I'll say one thing and then stop. I suggest that you do too - @pradyunsg is right that this issue needs people to take a breather.

You seem to want very specific and precise advice. But the whole point of the warning is not to use pip as root if you aren't confident that you understand the implications and can judge for yourself. IMO, that means that if you need to ask for explicit rulings, you shouldn't run pip as root at all. To be clear - that's my opinion, not a recommendation, nor a "statement by the pip maintainers", or anything else. And it's my interpretation of the sense of the warning, not a claim that the warning says precisely that. You can have your own opinion, certainly. We'll simply have to agree to differ in that case.

(I didn't write the warning text, so I'm allowed to have my own views on what it might mean without them being "official" in any way).

apologies for capital PIP

FWIW, there's no official spelling requirement for pip. Personally, I treat it as a normal word and capitalise at the start of sentences. I know this annoys @pradyunsg (and I'm sorry for that) but "must always be in lowercase even at the start of sentences" names annoy me, so I guess we can't both be happy 🙂

potiuk commented 3 years ago

Yes. I do expect precision indeed. This is why I chose the '--user' flag initially, because (from very precise and explicit docs) I learned that I could do what I wanted and I had no surprises.

So when I got a warning that I should change it (and I am glad to follow it now), I want to understand what the consequences are and ask about the meaning of what I do not understand.

I try to paraphrase it and explain in my own words how I understand it. Moreover, I need to know the answers to serve my users, as they will come to me with the same questions.

Is it really too much to ask to help with answering my questions coming from ambiguous messages?

I really do my best to precisely explain the use case, and what I ask for is help to understand what is not clear. I do it multiple times a day when my users in Airflow point out ambiguities, and my answer is usually 'great, can you please make a PR to clarify', or 'yeah, I will fix it, thanks for pointing it out', or 'please take a look at the docs - it is already explained here' (usually because either the user or one of the committers fixed it the last time someone got confused).

What else can I do? Do you really expect me (as a user of pip) to follow all the discussions, understand all the nuances, and take the risk of making my own judgements here, where I simply ask if this and that will work this way or the other? How can I make sure that others who come after me with the same problem will not start the same discussion again? If you want to prevent similar discussions in the future, the best thing you can do is to clarify what the meaning is in the docs. Otherwise other people will again come here, open issues and continue annoying you.

I was full of hope (and we speak about emotions again) that with my experience with Docker and the complexity of Airflow I could eventually help to clarify the recommendation, and maybe even help with clarifying the PEP and bring some working examples of how the recommendation can be put into practice.

I do not want to be begging for help.

But I encourage you and others to help me answer my questions so that I can eventually help you.

It's that simple. Empathy and understanding your users' needs are really important. And your users can help you make your product better if only they get a bit of help. This is what I have learned over the last three years in open source.

Just that.

potiuk commented 3 years ago

For everyone's information here: I believe my worry and questions about the .venv behaving differently than .local were justified. Unlike the .local dir (with the --user flag), you cannot just move it between images to a different user (and especially to different arbitrary users) as easily as we could with the .local dir.

I am working on migrating the Airflow images to venv, and the first thing I stumbled upon: it seems that if you want to install via venv and make it available to all users, then when you create the .venv in one image you need to make sure that the virtualenv is copied to exactly the same path.

If I copy a venv created in /root to the airflow user, I get:

/entrypoint: /home/airflow/.venv/bin/airflow: /root/.venv/bin/python3: bad interpreter: Permission denied

Apparently, the venv (unlike a --user installation) stores the location of the Python interpreter (and likely other binaries) in the .venv itself. This is solvable of course (we can make sure we always use the same path for the venv rather than assume it is in the home directory), but I think it's one of the things to clarify when we recommend that image-building solutions move to venv. It is pretty standard practice to use multi-stage builds when building Python images, so users might be surprised by this.
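The "bad interpreter" error above comes from the `#!` line of the installed entry-point script, which holds an absolute path to the interpreter of the environment it was installed into. A rough sketch of how a copied environment could be checked for such leftovers follows. It assumes a POSIX `bin` layout and only looks at first lines whose interpreter name starts with `python`; `read_shebang` and `stale_shebangs` are hypothetical helper names, not part of pip or venv.

```python
import os


def read_shebang(path):
    """Return the interpreter path from a script's '#!' line, or None."""
    with open(path, "rb") as handle:
        first_line = handle.readline(512)
    if not first_line.startswith(b"#!"):
        return None
    parts = first_line[2:].decode("utf-8", "replace").split()
    return parts[0] if parts else None


def stale_shebangs(venv_dir):
    """List (script, interpreter) pairs whose Python interpreter is outside venv_dir."""
    bin_dir = os.path.join(venv_dir, "bin")
    venv_root = os.path.abspath(venv_dir) + os.sep
    stale = []
    for name in sorted(os.listdir(bin_dir)):
        script = os.path.join(bin_dir, name)
        if not os.path.isfile(script):
            continue
        interpreter = read_shebang(script)
        if interpreter is None:
            continue
        # Skip wrappers such as '#!/bin/sh' or '#!/usr/bin/env'; they do not
        # name a Python interpreter directly on the first line.
        if not os.path.basename(interpreter).startswith("python"):
            continue
        if not os.path.abspath(interpreter).startswith(venv_root):
            stale.append((name, interpreter))
    return stale
```

For the case reported above, calling `stale_shebangs("/home/airflow/.venv")` on the copied directory would be expected to list the `airflow` script together with `/root/.venv/bin/python3`. Keeping the environment at the same absolute path in every image avoids the problem without rewriting any files.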

Luckily we have a pretty comprehensive set of tests that we can run on our images to see if they are still working as expected - but if I find other things I will let you know. I also plan to join the PEP 668 discussion - @pradyunsg, could you please post the link to the discussion you mentioned before about containers and PEP 668, where you advocated for the same kind of protections in containers? The link posted previously was a generic link to all discussions.

One other thing to note - it seems that venv keeps the same code/shared libraries as .local - the size of the generated .venv folder is comparable with the .local one:

du -h --summarize ~/.local
648M    /home/airflow/.local

du -h --summarize ~/.venv
684M    /home/airflow/.venv

potiuk commented 3 years ago

PR created: https://github.com/apache/airflow/pull/19189

In the absence of clear guidance, I decided to use root to create the venv and share it between users in a /.venv virtual env (not in the HOME directory of any of the users) to be able to copy the venv between images. I am running all CI tests in the CI image, which should give a better answer on whether everything looks good after switching to .venv, but preliminary tests of the production image show that it passes all the tests (after I fixed a few problems).

I've learned a thing or two about how to approach "building optimal size images while using venv", so I am looking forward to sharing that in the PEP discussion, and maybe even providing very precise guidance to anyone who wants to follow "use venv in images", and arguing why this is good if someone wants to discuss it. I will chime in on the PEP discussions when I get the CI green and merge the changes, so that I can talk using a practical example and learnings.

Test results:

```
Checking command supports
Feature: Checking the image without a command. It should return non-zero exit code. OK
Feature: Checking 'airflow' command It should return non-zero exit code. OK
Feature: Checking 'airflow version' command It should return zero exit code. OK
Feature: Checking 'python --version' command It should return zero exit code. OK
Feature: Checking 'bash --version' command It should return zero exit code. OK

Verify prod image: ghcr.io/apache/airflow/main/prod/python3.6:latest
Checking if Providers are installed
Installed providers:
package_name | description | version
==========================================+=================================================================================================+========
apache-airflow-providers-airbyte | Airbyte https://airbyte.io/ | 2.1.1
apache-airflow-providers-alibaba | Alibaba Cloud integration (including Alibaba Cloud https://www.alibabacloud.com//) | 1.0.0
apache-airflow-providers-amazon | Amazon integration (including Amazon Web Services (AWS) https://aws.amazon.com/) | 2.3.0
apache-airflow-providers-apache-beam | Apache Beam https://beam.apache.org/ | 3.0.1
apache-airflow-providers-apache-cassandra | Apache Cassandra http://cassandra.apache.org/ | 2.1.0
apache-airflow-providers-apache-drill | Apache Drill https://drill.apache.org/ | 1.0.1
apache-airflow-providers-apache-druid | Apache Druid https://druid.apache.org/ | 2.0.2
apache-airflow-providers-apache-hdfs | Hadoop Distributed File System (HDFS) https://hadoop.apache.org/docs/r1.2.1/hdfsdesign.html | 2.1.1
  | and WebHDFS https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html |
apache-airflow-providers-apache-hive | Apache Hive https://hive.apache.org/ | 2.0.2
apache-airflow-providers-apache-kylin | Apache Kylin https://kylin.apache.org/ | 2.0.1
apache-airflow-providers-apache-livy | Apache Livy https://livy.apache.org/ | 2.1.0
apache-airflow-providers-apache-pig | Apache Pig https://pig.apache.org/ | 2.0.1
apache-airflow-providers-apache-pinot | Apache Pinot https://pinot.apache.org/ | 2.0.1
apache-airflow-providers-apache-spark | Apache Spark https://spark.apache.org/ | 2.0.1
apache-airflow-providers-apache-sqoop | Apache Sqoop https://sqoop.apache.org/ | 2.0.2
apache-airflow-providers-asana | Asana https://app.asana.com/ | 1.1.0
apache-airflow-providers-celery | Celery http://www.celeryproject.org/ | 2.1.0
apache-airflow-providers-cloudant | IBM Cloudant https://www.ibm.com/cloud/cloudant | 2.0.1
apache-airflow-providers-cncf-kubernetes | Kubernetes https://kubernetes.io/ | 2.0.3
apache-airflow-providers-databricks | Databricks https://databricks.com/ | 2.0.2
apache-airflow-providers-datadog | Datadog https://www.datadoghq.com/ | 2.0.1
apache-airflow-providers-dingding | Dingding https://oapi.dingtalk.com/ | 2.0.1
apache-airflow-providers-discord | Discord https://discordapp.com/ | 2.0.1
apache-airflow-providers-docker | Docker https://docs.docker.com/install/ | 2.2.0
apache-airflow-providers-elasticsearch | Elasticsearch https://www.elastic.co/elasticsearch | 2.0.3
apache-airflow-providers-exasol | Exasol https://docs.exasol.com/home.htm | 2.0.1
apache-airflow-providers-facebook | Facebook Ads http://business.facebook.com/ | 2.0.1
apache-airflow-providers-ftp | File Transfer Protocol (FTP) https://tools.ietf.org/html/rfc114 | 2.0.1
apache-airflow-providers-google | Google services including: | 6.0.0
  | |
  | - Google Ads https://ads.google.com/ |
  | - Google Cloud (GCP) https://cloud.google.com/ |
  | - Google Firebase https://firebase.google.com/ |
  | - Google LevelDB https://github.com/google/leveldb/ |
  | - Google Marketing Platform https://marketingplatform.google.com/ |
  | - Google Workspace https://workspace.google.pl/ (formerly Google Suite) |
apache-airflow-providers-grpc | gRPC https://grpc.io/ | 2.0.1
apache-airflow-providers-hashicorp | Hashicorp including Hashicorp Vault https://www.vaultproject.io/ | 2.1.1
apache-airflow-providers-http | Hypertext Transfer Protocol (HTTP) https://www.w3.org/Protocols/ | 2.0.1
apache-airflow-providers-imap | Internet Message Access Protocol (IMAP) https://tools.ietf.org/html/rfc3501 | 2.0.1
apache-airflow-providers-influxdb | InfluxDB https://www.influxdata.com/ | 1.0.0
apache-airflow-providers-jdbc | Java Database Connectivity (JDBC) https://docs.oracle.com/javase/8/docs/technotes/guides/jdbc/ | 2.0.1
apache-airflow-providers-jenkins | Jenkins https://jenkins.io/ | 2.0.2
apache-airflow-providers-jira | Atlassian Jira https://www.atlassian.com/ | 2.0.1
apache-airflow-providers-microsoft-azure | Microsoft Azure https://azure.microsoft.com/ | 3.2.0
apache-airflow-providers-microsoft-mssql | Microsoft SQL Server (MSSQL) https://www.microsoft.com/en-us/sql-server/sql-server-downloads | 2.0.1
apache-airflow-providers-microsoft-psrp | PowerShell Remoting Protocol (PSRP) | 1.0.1
  | https://docs.microsoft.com/en-us/openspecs/windowsprotocols/ms-psrp/ |
apache-airflow-providers-microsoft-winrm | Windows Remote Management (WinRM) https://docs.microsoft.com/en-us/windows/win32/winrm/portal | 2.0.1
apache-airflow-providers-mongo | MongoDB https://www.mongodb.com/what-is-mongodb | 2.1.0
apache-airflow-providers-mysql | MySQL https://www.mysql.com/products/ | 2.1.1
apache-airflow-providers-neo4j | Neo4j https://neo4j.com/ | 2.0.2
apache-airflow-providers-odbc | ODBC https://github.com/mkleehammer/pyodbc/wiki | 2.0.1
apache-airflow-providers-openfaas | OpenFaaS https://www.openfaas.com/ | 2.0.0
apache-airflow-providers-opsgenie | Opsgenie https://www.opsgenie.com/ | 2.0.1
apache-airflow-providers-oracle | Oracle https://www.oracle.com/en/database/ | 2.0.1
apache-airflow-providers-pagerduty | Pagerduty https://www.pagerduty.com/ | 2.0.1
apache-airflow-providers-papermill | Papermill https://github.com/nteract/papermill | 2.1.0
apache-airflow-providers-plexus | Plexus https://plexus.corescientific.com/ | 2.0.1
apache-airflow-providers-postgres | PostgreSQL https://www.postgresql.org/ | 2.3.0
apache-airflow-providers-presto | Presto https://prestodb.github.io/ | 2.0.1
apache-airflow-providers-qubole | Qubole https://www.qubole.com/ | 2.0.1
apache-airflow-providers-redis | Redis https://redis.io/ | 2.0.1
apache-airflow-providers-salesforce | Salesforce https://www.salesforce.com/ | 3.2.0
apache-airflow-providers-samba | Samba https://www.samba.org/ | 3.0.0
apache-airflow-providers-segment | Segment https://segment.com/ | 2.0.1
apache-airflow-providers-sendgrid | Sendgrid https://sendgrid.com/ | 2.0.1
apache-airflow-providers-sftp | SSH File Transfer Protocol (SFTP) https://tools.ietf.org/wg/secsh/draft-ietf-secsh-filexfer/ | 2.1.1
apache-airflow-providers-singularity | Singularity https://sylabs.io/guides/latest/user-guide/ | 2.0.1
apache-airflow-providers-slack | Slack https://slack.com/ | 4.1.0
apache-airflow-providers-snowflake | Snowflake https://www.snowflake.com/ | 2.2.0
apache-airflow-providers-sqlite | SQLite https://www.sqlite.org/ | 2.0.1
apache-airflow-providers-ssh | Secure Shell (SSH) https://tools.ietf.org/html/rfc4251 | 2.2.0
apache-airflow-providers-tableau | Tableau https://www.tableau.com/ | 2.1.1
apache-airflow-providers-telegram | Telegram https://telegram.org/ | 2.0.1
apache-airflow-providers-trino | Trino https://trino.io/ | 2.0.1
apache-airflow-providers-vertica | Vertica https://www.vertica.com/ | 2.0.1
apache-airflow-providers-yandex | Yandex including Yandex.Cloud https://cloud.yandex.com/ | 2.1.0
apache-airflow-providers-zendesk | Zendesk https://www.zendesk.com/ | 2.0.1

Verifying if provider amazon installed: OK
Verifying if provider celery installed: OK
Verifying if provider cncf.kubernetes installed: OK
Verifying if provider docker installed: OK
Verifying if provider elasticsearch installed: OK
Verifying if provider ftp installed: OK
Verifying if provider grpc installed: OK
Verifying if provider hashicorp installed: OK
Verifying if provider http installed: OK
Verifying if provider imap installed: OK
Verifying if provider google installed: OK
Verifying if provider microsoft.azure installed: OK
Verifying if provider mysql installed: OK
Verifying if provider postgres installed: OK
Verifying if provider redis installed: OK
Verifying if provider sendgrid installed: OK
Verifying if provider sqlite installed: OK
Verifying if provider sftp installed: OK
Verifying if provider slack installed: OK
Verifying if provider sqlite installed: OK
Verifying if provider ssh installed: OK
OK. All expected providers installed!

Verify prod image features: ghcr.io/apache/airflow/main/prod/python3.6:latest
Feature: Import: async OK
Feature: Import: amazon OK
Feature: Import: celery OK
Feature: Import: cncf.kubernetes OK
Feature: Import: docker OK
Feature: Import: dask OK
Feature: Import: elasticsearch OK
Feature: Import: grpc OK
Feature: Import: hashicorp OK
Feature: Import: ldap OK
Feature: Import google: OpenSSL OK
Feature: Import google: google.ads OK
Feature: Import google: googleapiclient OK
Feature: Import google: google.auth OK
Feature: Import google: google_auth_httplib2 OK
Feature: Import google: google.cloud.automl OK
Feature: Import google: google.cloud.bigquery_datatransfer OK
Feature: Import google: google.cloud.bigtable OK
Feature: Import google: google.cloud.container OK
Feature: Import google: google.cloud.datacatalog OK
Feature: Import google: google.cloud.dataproc OK
Feature: Import google: google.cloud.dlp OK
Feature: Import google: google.cloud.kms OK
Feature: Import google: google.cloud.language OK
Feature: Import google: google.cloud.logging OK
Feature: Import google: google.cloud.memcache OK
Feature: Import google: google.cloud.monitoring OK
Feature: Import google: google.cloud.oslogin OK
Feature: Import google: google.cloud.pubsub OK
Feature: Import google: google.cloud.redis OK
Feature: Import google: google.cloud.secretmanager OK
Feature: Import google: google.cloud.spanner OK
Feature: Import google: google.cloud.speech OK
Feature: Import google: google.cloud.storage OK
Feature: Import google: google.cloud.tasks OK
Feature: Import google: google.cloud.texttospeech OK
Feature: Import google: google.cloud.translate OK
Feature: Import google: google.cloud.videointelligence OK
Feature: Import google: google.cloud.vision OK
Feature: Import azure: azure.batch OK
Feature: Import azure: azure.cosmos OK
Feature: Import azure: azure.datalake.store OK
Feature: Import azure: azure.identity OK
Feature: Import azure: azure.keyvault OK
Feature: Import azure: azure.kusto.data OK
Feature: Import azure: azure.mgmt.containerinstance OK
Feature: Import azure: azure.mgmt.datalake.store OK
Feature: Import azure: azure.mgmt.resource OK
Feature: Import azure: azure.storage OK
Feature: Import: mysql OK
Feature: Import: postgres OK
Feature: Import: redis OK
Feature: Import: sendgrid OK
Feature: Import: sftp/ssh OK
Feature: Import: slack OK
Feature: Import: statsd OK
Feature: Import: virtualenv OK
Feature: Import: pyodbc OK

Checking if Airflow dependencies are non-conflicting in ghcr.io/apache/airflow/main/prod/python3.6:latest image.
No broken requirements found.
OK. The ghcr.io/apache/airflow/main/prod/python3.6:latest image dependencies are consistent.

Checking if the image can be run as root.
Checking airflow as root OK
Checking root container with custom PYTHONPATH OK
rm: cannot remove '/tmp/tmp.hY2YNrBcYB/__pycache__/awesome.cpython-36.pyc': Permission denied

Verify prod image has dist folder (compiled www assets): ghcr.io/apache/airflow/main/prod/python3.6:latest
Feature: Dist folder OK
OK. The ghcr.io/apache/airflow/main/prod/python3.6:latest features are all OK.
```

potiuk commented 3 years ago

Another problem (again, as I anticipated), which might be good to document for anyone who hits similar problems:

Unfortunately, the venv solution is different from --user when it comes to running Python scripts via sudo in the image.

The error here https://github.com/apache/airflow/runs/3985552602?check_suite_focus=true#step:7:52 is caused by this.

Reason: Unfortunately sudo does not preserve the PATH (even if it is run with -E), which means that the trick of setting the PATH does not work for sudo commands. Effectively, you run the "bare" Python environment when you run a sudo command (unlike --user, which works across sudo). I am looking for a reasonable way to always activate the venv even when using sudo.
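To illustrate why the PATH trick stops working, here is a small sketch that only simulates the lookup, assuming a POSIX system. It builds two throwaway directories that each contain a placeholder `python` file and uses `shutil.which` to show which one is picked for a PATH with the venv first (the normal user case) and for a PATH without the venv (what a command sees when sudo substitutes its own PATH, for example through the sudoers `secure_path` setting). Directory names and helper names are invented for the example, and no real sudo is involved.

```python
import os
import shutil
import tempfile


def make_fake_python(bin_dir):
    """Create an executable placeholder named 'python' in bin_dir."""
    os.makedirs(bin_dir, exist_ok=True)
    path = os.path.join(bin_dir, "python")
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, 0o755)
    return path


def resolve(command, path_dirs):
    """Return the executable that a PATH made of path_dirs would pick."""
    return shutil.which(command, path=os.pathsep.join(path_dirs))


def demo():
    """Compare the lookup with the venv on PATH and with a reset PATH."""
    with tempfile.TemporaryDirectory() as tmp:
        venv_python = make_fake_python(os.path.join(tmp, "venv", "bin"))
        system_python = make_fake_python(os.path.join(tmp, "usr", "bin"))
        venv_bin = os.path.dirname(venv_python)
        system_bin = os.path.dirname(system_python)
        # Normal shell: the venv bin directory was prepended to PATH.
        user_choice = resolve("python", [venv_bin, system_bin])
        # Simulated sudo: PATH was replaced and no longer contains the venv.
        sudo_choice = resolve("python", [system_bin])
        return user_choice == venv_python, sudo_choice == system_python
```

Both comparisons hold in this simulation: the first lookup finds the venv interpreter, the second falls back to the "bare" one, which matches the behaviour described above.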

Any suggestions appreciated.

potiuk commented 3 years ago

Ok. Looks like the right combination is:

adding /etc/profile.d/setup_venv.sh that activates the venv (this will work nicely for sudo -i and basically for all interactive logins)
modifying secure_path in /etc/sudoers to add the venv path

Then the venv should work the same as --user in all cases (at least in our case)
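The sudoers part of this combination could look roughly like the sketch below. It is a string transformation only, assuming the sudoers file contains a line of the form `Defaults secure_path="..."` and that the venv lives in `/.venv`. The function name `with_venv_first` is hypothetical, and real changes to /etc/sudoers should still be validated with `visudo`.

```python
import re

# Matches a sudoers line such as: Defaults secure_path="/usr/sbin:/usr/bin"
SECURE_PATH_RE = re.compile(
    r'^(\s*Defaults\s+secure_path\s*=\s*)"?([^"\n]*)"?\s*$'
)


def with_venv_first(sudoers_line, venv_bin):
    """Return the line with venv_bin placed first in secure_path.

    Lines that are not a secure_path setting are returned unchanged, and an
    existing venv_bin entry is moved to the front instead of being duplicated.
    """
    match = SECURE_PATH_RE.match(sudoers_line)
    if match is None:
        return sudoers_line
    prefix, value = match.groups()
    dirs = [d for d in value.split(":") if d and d != venv_bin]
    return '{}"{}"'.format(prefix, ":".join([venv_bin] + dirs))


if __name__ == "__main__":
    line = 'Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"'
    print(with_venv_first(line, "/.venv/bin"))
```

Putting the venv directory first mirrors what prepending it to PATH does for a normal shell, so commands run through sudo resolve `python` and `pip` to the same environment.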

Note also that the other test failures in Airflow (for example this https://github.com/apache/airflow/runs/3985552779?check_suite_focus=true#step:6:7149 ) are also caused by switching to venv. We are using the Python virtualenv operator, which creates a new virtualenv dynamically if you want to add new dependencies dynamically. The problem there is (likely) caused by losing the original venv packages when we dynamically create a new one. That was also preserved by the --user flag and we could rely on it. I am not yet sure how to solve that one but I will look at it next.

Again - any suggestions appreciated.
