python / cpython

The Python programming language
https://www.python.org/

Deprecate and remove code execution in pth files #78125

Open warsaw opened 6 years ago

warsaw commented 6 years ago
BPO 33944
Nosy @mhammond, @warsaw, @brettcannon, @terryjreedy, @jaraco, @ncoghlan, @pitrou, @ericvsmith, @tiran, @nedbat, @aroberge, @methane, @ericsnowcurrently, @takluyver, @zooba, @matrixise, @vedgar, @native-api, @yan12125, @asottile, @ethanhs, @csabella, @miss-islington, @chrisjbillington, @qix-
PRs
  • python/cpython#10131
  • python/cpython#12107
  • python/cpython#12110
  • python/cpython#15942
Dependencies
  • bpo-14803: Add feature to allow code execution prior to main invocation

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['3.8', 'type-feature', 'library'] title = 'Deprecate and remove code execution in pth files' updated_at = user = 'https://github.com/warsaw' ``` bugs.python.org fields: ```python activity = actor = 'lkollar' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'barry' dependencies = ['14803'] files = [] hgrepos = [] issue_num = 33944 keywords = ['patch'] message_count = 120.0 messages = ['320246', '320249', '320253', '320266', '320277', '320279', '320283', '320284', '320286', '320287', '320292', '320293', '320342', '320386', '320393', '320724', '320754', '320850', '320997', '321005', '321026', '321125', '321134', '321340', '328488', '328564', '329607', '329764', '329802', '330115', '333235', '333536', '333567', '333568', '333569', '333572', '333591', '333592', '333613', '333637', '333638', '333639', '333640', '333642', '333644', '333645', '333698', '333699', '333705', '333706', '333716', '333997', '334199', '335774', '335926', '336351', '336662', '336705', '336709', '336710', '336711', '336714', '336716', '336721', '336722', '336725', '336726', '336809', '336853', '336856', '336860', '336863', '336875', '336882', '336939', '336944', '336961', '336970', '336983', '336984', '336992', '337064', '337351', '337353', '337354', '337365', '337368', '337370', '337396', '337399', '337406', '337408', '337409', '337410', '337414', '337417', '337418', '337421', '337422', '337424', '337426', '337427', '337430', '337434', '337437', '337438', '337439', '337446', '337920', '337954', '350625', '351861', '351872', '358909', '358915', '358953', '368712', '368732', '371334', '384148'] nosy_count = 31.0 nosy_names = ['mhammond', 'barry', 'brett.cannon', 'terry.reedy', 'jaraco', 'ncoghlan', 'pitrou', 'eric.smith', 'christian.heimes', 'nedbat', 'aroberge', 'ionelmc', 'methane', 'SilentGhost', '__Vano', 'eric.snow', 'takluyver', 
'steve.dower', 'matrixise', 'veky', 'Ivan.Pozdeev', 'yan12125', 'Anthony Sottile', 'Michel Desmoulin', 'ethan smith', 'cheryl.sabella', 'lkollar', 'miss-islington', 'Chris Billington', 'Peter L3', 'qix-'] pr_nums = ['10131', '12107', '12110', '15942'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue33944' versions = ['Python 3.8'] ```

    warsaw commented 6 years ago

    pth files are evil. They are very difficult to debug because they're processed too early. They usually contain globs of inscrutable code. Exceptions in pth files can get swallowed in some cases. They are loaded in indeterminate order.

    They are also unnecessary to support namespace packages in Python 3 (ignoring straddling code).

    Let's start the process for removing them.

    1. Deprecate pth files in Python 3.8 and turn them off with the -3 option.

    2. Kill off pth file support once Python 2 is EOL'd.

    tiran commented 6 years ago

    +1

    ericvsmith commented 6 years ago

    Also +1.

    cd34197f-d4a4-4e30-9fbe-454f267f097e commented 6 years ago

    I'm generally in favour of getting rid of .pth files. But I did accept a PR adding support for them in Flit to act as a substitute for symlinks on Windows, to achieve something like a 'development install'. I'm not sure what the alternative is if they go away.

    brettcannon commented 6 years ago

    Windows has symlinks now I believe, you just have to turn them on.

    And I would say there is no need for alternative. If a package needs to do something funky they can do it in their __init__.py file. Otherwise if I don't import a package it shouldn't get to do anything crazy through a .pth file.

    cd34197f-d4a4-4e30-9fbe-454f267f097e commented 6 years ago

    I don't want to use the execution features of .pth files, just their original functionality of adding extra directories to sys.path. I'd be very happy to see the arbitrary code execution 'feature' of .pth files go away.

    Windows supports symlinks, but the last I heard was that creating them requires some obscure permission bit. It seems to be awkward enough that Windows users aren't happy with the "just use symlinks" approach, which was what I was originally trying.

    ericvsmith commented 6 years ago

    My understanding about symlinks on Windows is that they require a permission ("Create symbolic links"), that normal users by default do not have. I'm not sure if this has changed recently.

    ae9affa2-721f-411e-8014-69189ad9b6f2 commented 6 years ago

    I am in favor of .pth files no longer being able to execute arbitrary code; however, I do think having them add to the path cannot be killed in two releases. Here is why:

    1. Windows support for symlinks is still not automatic. In the creators update of Windows 10 (released March 2017), CreateSymbolicLink added a dwflag SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE. This requires the user to be in developer mode to work. CPython currently doesn't use this flag. (I will open an issue to add that in a moment). I worry that giving people little time to update will be troublesome.

    2. All editable installs everywhere (AFAIK) and setuptools eggs (still somewhat common) use easy-install.pth to list where they are. I think breaking editable installs is a bad idea, as there is no clear solution for this. Also setuptools has a fair amount of work to do before it can replace egg installs.

    So I think removing adding to the path will require much more thought and break a lot more code than removing arbitrary code execution.

    brettcannon commented 6 years ago

    My only answer to Ethan is "don't use eggs". :)

    warsaw commented 6 years ago

    There are lots of problems with pth files, although arbitrary code execution is probably the most egregious. They are also notoriously difficult to debug, and happen before any control is given to user code. They certainly are unnecessary for namespace packages, which I think they currently get used for often in Py 2/3 straddling code.

    Maybe it will be okay to just fall back to sys.path extension, but I'd like to have a better understanding of exactly what the use cases are (in a pure Python 3 world), and we have to address the other problems about discovery and debuggability.

    ncoghlan commented 6 years ago

    Strong -1 without a functional replacement that provides comparable LD_PRELOAD capabilities (it also needs a full PEP that analyses all of the ways that setuptools and other packaging utilities use these files, such as for the implementation of "develop" mode, and the processing of ".lnk" mode).

    This change also needs to account for the Windows-only "._pth" files that override the path completely.

    The main discussion list for such a PEP should be distutils-sig, *not* python-ideas or import-sig (since distutils-sig is where we're more likely to find folks that are actually relying on the feature, and hence have a clearer idea of what will need to change to maintain a comparable level of ecosystem level capability).

    https://bugs.python.org/issue14803 is also related, as pth file processing should at least be delayed to run later than it does currently, and because "run code at startup" is one of the capabilities that would need replacing.

    ncoghlan commented 6 years ago

    Concrete use case for the original path extension capability: "pew add", which chains virtual environments together (allowing shared environments with a common default dependency set, and then additional per-application dependencies)

    ncoghlan commented 6 years ago

    Brett pointed out that my initial reaction above came across as quite blunt and demanding, so attempting to phrase that more clearly as a user experience consideration:

    It may be tempting to view this as purely a clean-up of the import system implementation, removing a quirky and error prone construct for the sake of improved maintainability of both the import system itself, and the maintainability of end user installations.

    My request (wearing my "BDFL-delegate for packaging interoperability standards" hat) is that proponents of the change resist the temptation to view the problem that way :)

    Path files are used extensively across the Python packaging ecosystem to implement additional environment management features beyond those provided natively by interpreter implementations, and while we've added native equivalents for some of them (namespace packages, virtual environments), we're far from having added support for all of them (dynamic package version selection, virtual environment chaining, editable package installs that still publish correct PEP-376 package metadata, etc).

    This means that any changes in this area pose significant backwards compatibility risks, and need to be approached carefully, and cautiously, with a strong emphasis on surveying real world code and seeing how the feature is currently being used.

    Or, alternatively, the idea can be broken up into smaller, lower impact changes that still help to address the import system and end user environment maintainability issues, but don't involve breaking backwards compatibility.

    (For an example of the latter: if "python -m site --list-pth-files" printed a list of all of the pth files and "python -m site --dump-pth-files" listed both the files and their contents, then environment debuggability would improve significantly without any compatibility impacts whatsoever)

    pitrou commented 6 years ago

    I would also add that editable installs should not break in the process. They are important.

    warsaw commented 6 years ago

    On Jun 23, 2018, at 18:56, Nick Coghlan <report@bugs.python.org> wrote:

    > My request (wearing my "BDFL-delegate for packaging interoperability standards" hat) is that proponents of the change resist the temptation to view the problem that way :)
    >
    > Path files are used extensively across the Python packaging ecosystem to implement additional environment management features beyond those provided natively by interpreter implementations, and while we've added native equivalents for some of them (namespace packages, virtual environments), we're far from having added support for all of them (dynamic package version selection, virtual environment chaining, editable package installs that still publish correct PEP-376 package metadata, etc).

    Still, I firmly believe they’re a wart being abused for purposes they weren’t really intended for. It’s a trick of implementation that lines beginning with import are exec’d. That being said…

    > Or, alternatively, the idea can be broken up into smaller, lower impact changes that still help to address the import system and end user environment maintainability issues, but don't involve breaking backwards compatibility.

    +1 on working on *much* better debuggability and discoverability for .pth files first, and then consider their eventual deprecation, replacement, and/or removal.

    eb193824-a002-4e53-a73b-be75738ef3f2 commented 6 years ago

    I *think* we need to ask maintainers of packages who use .pth -- at least, Mark Hammond (pywin32) -- to find out the impact and if everything can be done with other means.

    AFAICS it at least allows pywin32 to have many top-level modules without cluttering `site-packages'.

    pywin32 e.g. also copies some files to %windir%\system32 for some reason. And last time I checked, distutils had no functionality that involved symlinks, regardless of the OS.

    ncoghlan commented 6 years ago

    I think we also need to clearly separate two distinct aspects of .pth files:

    1. "import <module>; <arbitrary code execution goes here>" lines <--- Kill it with fire
    2. "<add this directory to sys.path>" lines <--- This is fine and good and perfectly sensible

    It's point 2 that powers things like "pew add", and I don't see any particularly compelling reason to get rid of it.

    The "arbitrary code invocation for every single Python execution using that environment" aspect, on the other hand, is mostly a PITA, and used as a workaround for other features being missing (e.g. the PYTHONRUNFIRST proposal in https://bugs.python.org/issue14803).
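The two kinds of line distinguished above can be seen side by side in a throwaway directory. The following is a minimal sketch, not part of the original discussion: the directory name `extra_pkgs`, the file name `demo.pth`, and the environment variable `PTH_DEMO_RAN` are made up for illustration. It relies only on `site.addsitedir()`, which scans a directory for `.pth` files in the same way the interpreter does for site-packages at startup.

```python
import os
import site
import sys
import tempfile


def demo_pth_processing():
    """Create a throwaway site directory and let site.py process its .pth file."""
    sitedir = tempfile.mkdtemp()
    extra = os.path.join(sitedir, "extra_pkgs")  # hypothetical directory name
    os.mkdir(extra)

    with open(os.path.join(sitedir, "demo.pth"), "w") as f:
        # Kind 2: a plain line is treated as a directory (relative to the
        # site directory) and appended to sys.path if it exists.
        f.write("extra_pkgs\n")
        # Kind 1: a line starting with "import" is handed to exec().
        f.write("import os; os.environ['PTH_DEMO_RAN'] = '1'\n")

    # addsitedir() adds sitedir itself to sys.path and processes its *.pth files.
    site.addsitedir(sitedir)
    return sitedir, extra


sitedir, extra = demo_pth_processing()
print("path line took effect:", extra in sys.path)
print("import line was executed:", os.environ.get("PTH_DEMO_RAN") == "1")
```

Nothing in the second line of `demo.pth` is restricted to path setup, which is the part of the feature the comments above propose removing.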

    mhammond commented 6 years ago

    pywin32, up until recently, just listed 3 directories in its .pth file - these were for directories which pre-dated packages and were never converted. Eg, "import win32api" actually loads win32api.pyd from the "site-packages/win32" directory.

    Earlier this year, via https://github.com/mhammond/pywin32/issues/1151, I also added the line:

    import os;os.environ["PATH"]+=(';'+os.path.join(sitedir,"pywin32_system32"))

    which is to support pywin32 being installed from wheels - this is due to pywin32 shipping with various shared DLLs which implement many pywin32 types - eg, pywintypesXX.dll is used by (almost) every single .pyd shipped with pywin32, and distutils doesn't offer any way of copying files as part of a post-install script or any other way of ensuring these .dll files are on the PATH or otherwise next to pythonXX.dll/.exe

    I'm happy to replace both of these with alternatives when they exist.

    warsaw commented 6 years ago

    I think we'll clearly need a PEP for this clean up. I'd like to see a separate "preload" feature as well, especially one that is deterministic and happens before site.py. Not sure if that should be one PEP or two.

    ericsnowcurrently commented 6 years ago

    @barry, make sure you take a look at https://bugs.python.org/issue14803.

    ncoghlan commented 6 years ago

    To avoid confusing the discussions, two PEPs is likely a better option:

    1. Designing and implementing a dedicated preload mechanism
    2. Adjusting the way pth file handling works, including deprecating and removing the "pth arbitrary file execution" trick (depends on the first one as the forward compatible migration path for legitimate code preloading use cases)

    terryjreedy commented 6 years ago

    This issue, as stated, looks like a severe regression to me.

    In each of my python installs, Lib/site-packages has a file called 'python.pth' containing 'F:/Python'. This is not a glob of inscrutable code. It is not even Python code. Just a path. Is this issue about something else also called a 'pth file'?

    The latter, F:/Python, is a package development directory on my supplementary hard drive. When I first install a new version of Python (early alpha), I copy this tiny file. Voila! The packages within /Python are 'installed' for the new version without making copies. Editing a file edits it for all 'installs'. Deleting the directory for an old and no longer needed version does not delete any of my files.

    Import in files within F:/Python/pack act as if pack were installed in the site package for the version of python running the file. I can easily run anything in Command Prompt with 'py -x.y -m pack.file'. I can easily rerun with a different version by hitting up arrow and changing x.y. Command Prompt's current working directory does not matter.

    I think this is one of Python's most under-appreciated features. I am rather sure there is no way to so easily get the same effect. Abuse of a great feature is not a good reason to delete it completely.

    eb193824-a002-4e53-a73b-be75738ef3f2 commented 6 years ago

    > They are very difficult to debug because they're processed too early.

    .pth's are processed by site.py, so no more difficult than site/sitecustomize. You can e.g. run `site.addpackage(<dir>,<file>,None)' to debug the logic.

    > They usually contain globs of inscrutable code.

    An ability to contain code is there for a reason: to allow a module do something more intelligent than adding hardcoded paths if needed (e.g. pywin32 adds a subdir with .dll dependencies to PATH).

    A chunk of code is limited to a single line -- a conscious limitation to deter misuse 'cuz search path setup code is presumed to be small.

    If someone needs something other than path setup, they should do it upon the module's import instead. If they insist on misusing the feature, Python's design does what it's supposed to do in such cases: "make right things easy, make wrong things hard".

    If there's a valid reason to allow larger code chunks, we can introduce a different syntax -- e.g. something like bash here-documents.

    > Exceptions in pth files can get swallowed in some cases.

    If this happens, it's a bug. A line from .pth is executed with "exec line", any exceptions should propagate up the stack as usual.

    > They are loaded in indeterminate order.

    Present a use case justifying a specific order. I can see a probable use case: a package needs to do something using its dependencies, so any .pth for the dependencies should run before the one for the package. But I can't see why that package can't do this upon its import instead (saves unnecessary work if the user won't be using that package in that session, too). The only valid case I can see is if the package is using some 3rd-party import system (e.g. a .7z archive or some module repository) that needs to be loaded first for its search path to make sense.

    warsaw commented 6 years ago

    On Jul 5, 2018, at 14:23, Ivan Pozdeev <report@bugs.python.org> wrote:

    > Ivan Pozdeev <ivan_pozdeev@mail.ru> added the comment:

    > They are very difficult to debug because they're processed too early.

    > .pth's are processed by site.py, so no more difficult than site/sitecustomize. You can e.g. run `site.addpackage(<dir>,<file>,None)' to debug the logic.

    Not really. By the time you have access to a REPL to run that, site.py has already run, so you already have an unclean environment. Running with -S really isn’t feasible either since that’s often impossible (e.g. in a zip app like shiv or pex), or that leaves you with a broken environment so you can’t get to a usable REPL. What you often have to do is actually modify Python to put a breakpoint in site.py to see what’s actually happening. Yuck.

    > They usually contain globs of inscrutable code.

    > An ability to contain code is there for a reason: to allow a module do something more intelligent than adding hardcoded paths if needed (e.g. pywin32 adds a subdir with .dll dependencies to PATH).

    > A chunk of code is limited to a single line -- a conscious limitation to deter misuse 'cuz search path setup code is presumed to be small.

    Trust me, once you can execute arbitrary code in .pth files, you’re lost. And packages *do* execute arbitrary code that is very difficult to debug. And yes, those complex lines are both inscrutable and non-standard.

    > If someone needs something other than path setup, they should do it upon the module's import instead.

    Except they often don’t.

    > If they insist on misusing the feature, Python's design does what it's supposed to do in such cases: "make right things easy, make wrong things hard”.

    The problem comes when some random module you are including in your application does something weird in their .pth files that breaks assumptions *other* libraries or code is making. It’s not as uncommon as it might seem.

    > If there's a valid reason to allow larger code chunks, we can introduce a different syntax -- e.g. something like bash here-documents.

    The size of the code chunks isn’t the only issue. Running arbitrary code in a .pth file has all kinds of negative consequences. It’s basically code that happens at import time, with all the problems that happen with that anti-pattern.

    > Exceptions in pth files can get swallowed in some cases.

    > If this happens, it's a bug. A line from .pth is executed with "exec line", any exceptions should propagate up the stack as usual.

    > They are loaded in indeterminate order.

    > Present a use case justifying a specific order.

    Interdependent namespace packages. If they get loaded in the wrong order, they can mess up __path__ settings, causing other namespace package portions to be un-importable. Yes, this does happen!

    379dc349-3a10-424f-b9d2-a0104f092359 commented 5 years ago

    There are a number of packages that can "self-import" into any Python process depending on the presence of an environment variable, by installing a pth file that contains something like import os; __import__("thepkg") if os.environ.get("THEENVVAR") else None. Examples include colorization of logging output (https://coloredlogs.readthedocs.io/en/latest/api.html#environment-variables) or installation of a trace function (https://pypi.org/project/hunter/#environment-variable-activation).
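To make the "self-import" pattern above concrete, here is a small sketch that executes such a line the way `site.py` would (with `exec`) and checks whether the import happened. The standard-library module `colorsys` stands in for the hypothetical `thepkg`, and `PTH_DEMO_ENABLE` stands in for `THEENVVAR`; both substitutions are assumptions made purely so the example is runnable.

```python
import os
import sys

# The .pth line quoted above, with stand-in names (see the note before this block).
PTH_LINE = (
    'import os; __import__("colorsys") '
    'if os.environ.get("PTH_DEMO_ENABLE") else None'
)


def run_hook(enabled):
    """Exec the .pth-style line and report whether the import was triggered."""
    sys.modules.pop("colorsys", None)  # start from a clean slate
    if enabled:
        os.environ["PTH_DEMO_ENABLE"] = "1"
    else:
        os.environ.pop("PTH_DEMO_ENABLE", None)
    exec(PTH_LINE)
    return "colorsys" in sys.modules


print("variable unset, module imported:", run_hook(False))
print("variable set, module imported:", run_hook(True))
os.environ.pop("PTH_DEMO_ENABLE", None)  # leave the environment as we found it
```

Because the check runs in every interpreter started from that environment, the cost is paid even when the variable is unset, which is one of the startup-time concerns raised elsewhere in this thread.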

    If the pth mechanism goes away, a preload system should definitely be present to provide a replacement; it should again support multiple packages each installing their own hook.

    eb193824-a002-4e53-a73b-be75738ef3f2 commented 5 years ago

    The primary motivation behind the suggestion seems to be the fact that the feature is abused.

    However, the documentation has no info whatsoever on what is the intended use -- thus what constitutes abuse. Without that, the accusations are kind of baseless -- how can we blame package authors for having to figure it out for themselves?

    I've made a PR with the corresponding note. Since the discussion has revealed a number of valid use cases for the feature for which there are no adequate alternatives currently, I hope it will diminish the discontent and be grounds to incite package authors to remove unnecessary logic from there.

    eb193824-a002-4e53-a73b-be75738ef3f2 commented 5 years ago

    @barry

    > Interdependent namespace packages. If they get loaded in the wrong order, they can mess up __path__ settings

    Actually, when writing the PR, I had a revelation how this could be implemented. Via an import hook that would work like a union FS!

    In its .pth file, each such package will import the hook's module (which will cause the hook to be installed on the first import) and "register" its namespaces and/or dependencies with it. The hook will then calculate the required load order and enforce it upon import of any of the registered namespaces.

    warsaw commented 5 years ago

    On Nov 10, 2018, at 04:50, Ivan Pozdeev <report@bugs.python.org> wrote:

    > In its .pth file, each such package will import the hook's module (which will cause the hook to be installed on the first import) and "register" its namespaces and/or dependencies with it. The hook will then calculate the required load order and enforce it upon import of any of the registered namespaces.

    I’m a little concerned about this approach because it means random third party modules can affect the global environment for your application, without knowing it. Since the hook installation happens at import time, and just depending on a library that has such a .pth file will install it, the end application will not have control over its global state. It’s not possible to know whether this is a serious problem, but in the past, global state changes are problematic when applications do not have control over it.

    cbf13ede-eda8-4246-abee-98732ce73413 commented 5 years ago

    > I’m a little concerned about this approach because it means random third party modules can affect the global environment for your application, without knowing it. Since the hook installation happens at import time, and just depending on a library that has such a .pth file will install it, the end application will not have control over its global state.

    But "affecting the global environment for your application" is exactly what is intended here. You want multiple packages to all load their code into the same namespaces (aka module objects), thus of course potentially affecting/overriding each other's functionality. That's what you get when you have plugins -- a badly-written/incompatible plugin can and will break your app.

    It doesn't have to "just depend on a library that has such a .pth file", it's up to the import hook's implementation. I just gave as example the simplest solution that requires zero effort on the main package maintainer's part.

    E.g. you can only allow adding a new submodule by default, or require the "parent" package to "allow" insertions into itself, or move registration into the parent's configuration file (so the user needs to enable the plugin manually), or provide some more granular code injection techniques like e.g. event handler lists that certain plugins' functions will be added into. All that matters here is that the hook is going to automagically assemble the resulting namespaces from parts upon import.

    Finally, Python applications don't have full control over their global state anyway. Any module can monkey-patch or override any other module via a variety of means. So, this risk is not something new or unexpected.

    jaraco commented 5 years ago

    Regarding other uses of .pth files, the project future-fstrings relies on .pth files to enable its at-startup behavior.

    I'm also +1 to remove .pth files, but I also believe it's not viable today due to development installs of pkg_resource-style namespace packages.

    I haven't read the full history of this issue, but plan to get caught up on it soon.

    e9052a66-9d25-42b8-b701-20e674176c81 commented 5 years ago

    I develop analysis software for physics research, in which the user analyses their data using Python that they write themselves (my application functions as a kind of scheduler for when the analysis scripts should run and with what input). This software has a concept of 'the user's modules', which the user can import from anywhere. When the application is installed, it installs a .pth file to add this 'userlib' folder to the Python path. This way the user can maintain importable modules that they re-use in their analysis without having to put them on PyPI or anything like that (which would be impractical since they are often being hacked on and don't have anything resembling a release cycle). It is important that these modules aren't just available from within the environment my application provides, as that is a bit too rigid - the user should be able to use the normal Python REPL or IPython or whatever to develop and test their code when the 'scheduler' is not in control of running it.

    I'm not sure what I would do instead if .pth files went away. Modifying PYTHONPATH is messy since it applies to all python versions, whereas .pth files are nicely specific only to the one Python installation. sitecustomize.py is messy because if it already exists I need to programmatically modify it to add or remove my changes (and contend with the fact that other packages may be doing the same), whereas a .pth file is nicely separate.

    I didn't even know about the arbitrary code execution capabilities of .pth files and don't really care, but keeping the ability to add directories to the Python path would be nice, as the alternatives for doing this are unappealing (and for my application, putting the code the user is hacking on daily deep inside a Conda environment folder hierarchy is unappealing too).

    ncoghlan commented 5 years ago

    To make a potentially viable concrete proposal here, I think a reasonable first step would be to change the ".pth" file processing code in site.py to emit PendingDeprecationWarning for the 'if line.startswith(("import ", "import\t")):' branch.
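The proposal above could look roughly like the sketch below. This is a simplified stand-in for the per-line logic, not the actual code in `site.py`; the function name `process_pth_line`, its parameters, and the warning text are all assumptions made for illustration.

```python
import os
import sys
import warnings


def process_pth_line(line, sitedir, pth_name="<pth>"):
    """Handle one .pth line: warn on executable lines, otherwise extend sys.path."""
    line = line.rstrip()
    if not line or line.startswith("#"):
        return
    if line.startswith(("import ", "import\t")):
        # Proposed addition: flag the code-execution branch before running it.
        warnings.warn(
            f"{pth_name}: executable lines in .pth files may be deprecated",
            PendingDeprecationWarning,
            stacklevel=2,
        )
        exec(line)
        return
    # Plain lines are directories, resolved relative to the site directory.
    path = os.path.abspath(os.path.join(sitedir, line))
    if os.path.exists(path) and path not in sys.path:
        sys.path.append(path)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # PendingDeprecationWarning is hidden by default
    process_pth_line("import os", sitedir=".", pth_name="demo.pth")

print([str(w.message) for w in caught])
```

Since `PendingDeprecationWarning` is ignored by the default warning filters, users would only see it when running with something like `-W default`, which fits its suggested role as an opt-in debugging aid.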

    In addition to helping to determine the scope of the compatibility break being discussed here, such a warning would also be usable as a debugging tool.

    I'd also suggest updating "python -m site" to list any pth files that it finds, and categorise them as simple sys.path additions (which are generally fine), and arbitrary code (which can be problematic).
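As a rough illustration of that listing idea, the sketch below scans site directories for `.pth` files and separates plain path lines from executable ones, using the same `import` prefix test quoted above. The function names `classify_pth` and `report_pth_files` are hypothetical and no such option exists in `python -m site` as far as this thread states.

```python
import glob
import os
import site


def classify_pth(path):
    """Split a .pth file's lines into (path_lines, code_lines)."""
    path_lines, code_lines = [], []
    with open(path, encoding="utf-8", errors="replace") as f:
        for raw in f:
            line = raw.rstrip()
            if not line or line.startswith("#"):
                continue  # blank lines and comments are skipped
            if line.startswith(("import ", "import\t")):
                code_lines.append(line)
            else:
                path_lines.append(line)
    return path_lines, code_lines


def report_pth_files(sitedirs=None):
    """Map each .pth file found in the given directories to its classified lines."""
    if sitedirs is None:
        # Default to the interpreter's own site-packages directories.
        sitedirs = site.getsitepackages() + [site.getusersitepackages()]
    report = {}
    for directory in sitedirs:
        pattern = os.path.join(glob.escape(directory), "*.pth")
        for pth in sorted(glob.glob(pattern)):
            report[pth] = classify_pth(pth)
    return report
```

Calling `report_pth_files()` with no arguments inspects the running interpreter's environment; passing an explicit list of directories makes it usable on an environment that cannot be started cleanly.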

    warsaw commented 5 years ago

    > To make a potentially viable concrete proposal here, I think a reasonable first step would be to change the ".pth" file processing code in site.py to emit PendingDeprecationWarning for the 'if line.startswith(("import ", "import\t")):' branch.

    PendingDeprecationWarning because you don’t think we can remove this functionality in 3.9?

    > In addition to helping to determine the scope of the compatibility break being discussed here, such a warning would also be usable as a debugging tool.

    > I'd also suggest updating "python -m site" to list any pth files that it finds, and categorise them as simple sys.path additions (which are generally fine), and arbitrary code (which can be problematic).

    Great idea, +1

    ncoghlan commented 5 years ago

    I'm suggesting PendingDeprecationWarning because we can't *actually* deprecate anything until we provide a more transparent alternative that offers comparable functionality, and I haven't seen a credible proposal for a replacement yet.

    So using PDW would truthfully indicate "We don't like this feature, and want to get rid of it as causing more problems than it solves, but also acknowledge that it is currently handling legitimate use cases that need to be addressed before we can remove it".

    https://coverage.readthedocs.io/en/coverage-4.4.2/subprocess.html is one example I'm aware of that describes a legitimate use case for being able to run arbitrary code at software startup.

    e9052a66-9d25-42b8-b701-20e674176c81 commented 5 years ago

    coverage.py's documentation mentions:

    > The sitecustomize.py technique is cleaner, but may involve modifying an existing sitecustomize.py, since there can be only one. If there is no sitecustomize.py already, you can create it in any directory on the Python path.

    > The .pth technique seems like a hack, but works, and is documented behavior. On the plus side, you can create the file with any name you like so you don’t have to coordinate with other .pth files. On the minus side, you have to create the file in a system-defined directory, so you may need privileges to write it.

    This brings to mind the transition of many programs from using a single config file or startup script to using a directory of config/startup files parsed/executed in alphabetical order. Would a sitecustomize.d/ directory (with files within it executed in alphabetical order) as a replacement for executable code in .pth files be an improvement on the status quo?
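A minimal sketch of what such a directory runner might do is shown below. The `sitecustomize.d/` mechanism is only a suggestion in this thread and does not exist in CPython; the function name `run_startup_dir` is hypothetical, and the sketch simply runs each `.py` file in sorted order with `runpy.run_path()`.

```python
import os
import runpy


def run_startup_dir(directory):
    """Run every *.py file in `directory` in alphabetical order.

    Returns the list of file names that were executed, in order.
    """
    executed = []
    if not os.path.isdir(directory):
        return executed
    for name in sorted(os.listdir(directory)):
        if name.endswith(".py"):
            # Each file runs in its own fresh namespace.
            runpy.run_path(os.path.join(directory, name))
            executed.append(name)
    return executed
```

Ordering here depends purely on file names, so two packages that need a particular relative order would have to coordinate their prefixes.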

    cbf13ede-eda8-4246-abee-98732ce73413 commented 5 years ago

    > This brings to mind the transition of many programs from using a single config file or startup script to using a directory of config/startup files parsed/executed in alphabetical order. Would a sitecustomize.d/ directory (with files within it executed in alphabetical order) as a replacement for executable code in .pth files be an improvement on the status quo?

    No, because the required execution order is governed by package interdependencies rather than names. SysVInit went around this by hand-picking number prefixes to files in rcN.d/ but this proved unmaintainable in the long run.

    vstinner commented 5 years ago

    I really hate .pth files because they slow down Python startup time for *all* applications, whereas .pth files are usually specific to a very few applications using one or two specific modules.

    They can also modify the behavior of Python for all applications, with no way to opt-out.

    I would prefer to have an opt-in option, disabled by default.

    I'm in favor of deprecating the feature in Python 3.8 and removing it in Python 3.9.

    Python 3 already supports namespace packages, which cover the most common use case of .pth files, no?

    Another use case is to run code if a specific command line option is used or if an environment variable is set. For example, my faulthandler backport uses a .pth file to enable faulthandler if PYTHONFAULTHANDLER environment variable is set. I dislike this .pth file (I didn't write it ;-)). I'm fine with dropping this feature as a whole.

    We can add a pending deprecation warning in Python 3.7 right now.

    pitrou commented 5 years ago

    As I said: editable installs (pip install -e) are an important use case of .pth files.

    I don't see how namespace packages have anything to do with this, sorry.

    ncoghlan commented 5 years ago

    Namespace packages in general didn't rely on pth files - only the setuptools/pkg_resources implementation of them did.

    I'll also reiterate that I am *completely* opposed to deprecating the "append entries to sys.path" usage model, as there is absolutely nothing wrong with that (if distros are ending up with an overly cluttered system that's making the standard path too long, then review the individual packages creating the clutter, don't remove the interpreter feature).

    That "append to sys.path" aspect of the feature is all that's needed to make editable installs and virtual environment chaining work.

    That means the aspect I'm in agreement with deprecating is the "arbitrary code execution on startup" case, but even for that, I don't think we should deprecate it until we have a comparable replacement that's more self-evidently a way of allowing arbitrary code execution, and also more obviously has the potential to make every interpreter startup in that Python installation slower.
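    For readers unfamiliar with the two behaviours being separated here, the following is a small sketch using the real site.addsitedir function on a throwaway directory. The file name demo.pth and the environment variable PTH_DEMO_RAN are invented for the illustration, and the behaviour shown is that of the site module as discussed in this thread.

```python
import os
import site
import sys
import tempfile

# Throwaway directory standing in for a site-packages directory.
sitedir = os.path.realpath(tempfile.mkdtemp())
extra = os.path.join(sitedir, "extra")
os.mkdir(extra)

# demo.pth and PTH_DEMO_RAN are names made up for this example.
with open(os.path.join(sitedir, "demo.pth"), "w", encoding="utf-8") as f:
    f.write("# comment lines are skipped\n")
    # A plain line is a path entry: appended to sys.path if the path exists.
    f.write("extra\n")
    # A line starting with "import" is executed as code at startup.
    f.write("import os; os.environ['PTH_DEMO_RAN'] = '1'\n")

# Process the directory the way site.py does for each site-packages dir.
site.addsitedir(sitedir)

path_extended = extra in sys.path
code_executed = os.environ.get("PTH_DEMO_RAN") == "1"
print("path entry added:", path_extended)
print("import line executed:", code_executed)
```

    The first behaviour is the "append entries to sys.path" model; the second is the "arbitrary code execution on startup" case that the issue proposes to deprecate.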

    I'm not really concerned about execution order issues between interdependent sitecustomize hooks, as there's already no ordering guarantee with .pth files, and if folks do need more control over the interdependencies for some reason then they can just rely on the regular import system rather than something sitecustomize specific.

    So I think Chris Billington's proposed replacement is actually a reasonable idea:

    1. In site.addsitedir, check for a __sitecustomize__ subdirectory after checking for .pth files
    2. If any Python files are found in that directory, execute them
    3. If "python -X importtime" has been specified, report the execution time of each of those files (this would allow both easy identification of any hooks that are being executed, as well as which ones are taking up a lot of time)

    There could then be a "-Z" option that offered a more limited form of "-S": it would allow site.py itself to run, but disable the processing of sitecustomize.py and __sitecustomize__ entries.
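    The three steps above could be sketched roughly as follows. This is only an illustration of the idea: run_sitecustomize_hooks is a hypothetical helper, not an existing function in site.py; the sorted execution order and the report format are assumptions made for the example, since the proposal specifies neither.

```python
import os
import runpy
import sys
import time


def run_sitecustomize_hooks(sitedir, report_times=False):
    """Hypothetical helper: run each *.py file in sitedir/__sitecustomize__.

    Returns a list of (file_name, seconds) pairs, one per executed hook.
    """
    hook_dir = os.path.join(sitedir, "__sitecustomize__")
    if not os.path.isdir(hook_dir):
        return []  # step 1: no hook directory, nothing to do

    timings = []
    # Sorted only to make the example deterministic (an assumption).
    for name in sorted(os.listdir(hook_dir)):
        if not name.endswith(".py"):
            continue
        path = os.path.join(hook_dir, name)
        start = time.perf_counter()
        runpy.run_path(path)  # step 2: execute the hook file
        elapsed = time.perf_counter() - start
        timings.append((name, elapsed))
        if report_times:
            # step 3: per-file timing report, written to stderr
            print(f"hook time: {elapsed * 1e6:8.0f} us | {path}", file=sys.stderr)
    return timings
```

    In the proposal this logic would run from site.addsitedir after the .pth files are processed, and the suggested "-Z" option would skip it entirely.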

    jaraco commented 5 years ago

    I like Nick's proposal. It has, I believe, the features that satisfy the use cases of which I'm currently aware... with one edge case you may not have considered: support for multiple __sitecustomize__ locations.

    Consider, for example, the case where __sitecustomize__ is in some system space unwritable by the user, but the package being installed is being installed in --user space.

    Or consider the case where permissions aren't at play, but where you have a package installed in a different part of the PYTHONPATH. For example, pip-run installs a sitecustomize module in a temporary directory that it adds to sys.path. Ignoring for a moment the reason why it does this, I'd like to focus on the general need - that multiple paths on PYTHONPATH might expect __sitecustomize__ support. You wouldn't want to have all of the __sitecustomize__ hooks in one directory, because then they'll be decoupled from components that may or may not be in PYTHONPATH.

    For these reasons, I think you'd want __sitecustomize__ to be supported in multiple locations on PYTHONPATH, honoring all of the files in all such directories, somewhat similar to how namespace packages are supported.

    warsaw commented 5 years ago

    On Jan 14, 2019, at 04:02, STINNER Victor <report@bugs.python.org> wrote:

    I really hate .pth files because they slow down Python startup time for *all* applications, whereas .pth files are usually specific to a very few applications using one or two specific modules.

    They can also modify the behavior of Python for all applications, with no way to opt-out.

    I would prefer to have an opt-in option, disabled by default.

    I completely agree. The other problem is that .pth-caused problems are very difficult to diagnose and debug. Essentially you have to hack site.py to break into the loading machinery. I have to believe that we can come up with a better mechanism that doesn’t suffer from these problems.
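    As a partial aid for the diagnosis problem, a script can at least list which .pth lines would be executed, without editing site.py. The sketch below is an assumption-laden illustration: find_pth_import_lines is a made-up helper name, and it only mirrors the rule that site.py executes lines beginning with "import" followed by a space or tab.

```python
import os


def find_pth_import_lines(directories):
    """Return (path, line_number, text) for each executable line in *.pth files."""
    hits = []
    for directory in directories:
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            continue  # directory is missing or unreadable
        for name in names:
            if not name.endswith(".pth"):
                continue
            path = os.path.join(directory, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    lines = f.read().splitlines()
            except OSError:
                continue  # file vanished or is unreadable
            for number, line in enumerate(lines, start=1):
                # Lines starting with "import " or "import\t" are executed;
                # other non-comment lines are treated as path entries.
                if line.startswith(("import ", "import\t")):
                    hits.append((path, number, line))
    return hits
```

    One way to use it is to pass the directories reported by site.getsitepackages() together with site.getusersitepackages(), then review each reported line by hand.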

    Do we have a single place to capture a list of .pth use cases?

    warsaw commented 5 years ago

    On Jan 14, 2019, at 04:14, Antoine Pitrou <report@bugs.python.org> wrote:

    As I said: editable installs (pip install -e) are an important use case of .pth files.

    Is that true outside of virtual environments? I care less about .pth files inside venvs, since those are typically isolated to a single development environment, and don’t affect Python applications or libraries globally.

    warsaw commented 5 years ago

    On Jan 14, 2019, at 07:17, Nick Coghlan <report@bugs.python.org> wrote:

    I'll also reiterate that I am *completely* opposed to deprecating the "append entries to sys.path" usage model, as there is absolutely nothing wrong with that (if distros are ending up with an overly cluttered system that's making the standard path too long, then review the individual packages creating the clutter, don't remove the interpreter feature).

    Yes, there is, as Victor and others point out. They do magical things that are difficult to debug and diagnose, and have global effects on the entire Python operating environment.

    I’d be less opposed to a mechanism that is isolated to just those Python applications that need them. I’d like to know about use cases outside of Python applications that can’t be done any other way.

    That "append to sys.path" aspect of the feature is all that's needed to make editable installs and virtual environment chaining work.

    That means the aspect I'm in agreement with deprecating is the "arbitrary code execution on startup" case, but even for that, I don't think we should deprecate it until we have a comparable replacement that's more self-evidently a way of allowing arbitrary code execution, and also more obviously has the potential to make every interpreter startup in that Python installation slower.

    I think we’re all in agreement about deprecating arbitrary code execution, so maybe this issue can concentrate on that, while we figure out what, if anything, to do about the path extension use case.

    I don’t care about slow start up of the interactive interpreter, but I do strongly care about the start up times for Python applications in general. That’s why an opt-in mechanism is important.

    1. In site.addsitedir, check for a __sitecustomize__ subdirectory after checking for .pth files
    2. If any Python files are found in that directory, execute them
    3. If "python -X importtime" has been specified, report the execution time of each of those files (this would allow both easy identification of any hooks that are being executed, as well as which ones are taking up a lot of time)

    There could then be a "-Z" option that offered a more limited form of "-S": it would allow site.py itself to run, but disable the processing of sitecustomize.py and __sitecustomize__ entries.

    Is that a global __sitecustomize__ directory you’re talking about, or something specific to a Python application (or library)?

    pitrou commented 5 years ago

    Is that true outside of virtual environments?

    Not in my experience. But I'm not sure special-casing virtual environments will make the situation easier to understand ;-)

    vstinner commented 5 years ago

    I don't think that you will like it, but I feel that a PEP will be needed here to list use cases and explain what replaces .pth files for each use case. Maybe no replacement for some use cases is fine. The PEP doesn't have to be long.

    I also expect that it's going to be a large backward incompatible change. A PEP can summarize the rationale, schedule deprecation, etc.

    Any volunteer around? Barry, Nick, someone else?

    warsaw commented 5 years ago

    On Jan 14, 2019, at 17:30, STINNER Victor <report@bugs.python.org> wrote:

    I don't think that you will like it, but I feel that a PEP will be needed here to list use cases and explain what replaces .pth files for each use case. Maybe no replacement for some use cases is fine. The PEP doesn't have to be long.

    I also expect that it's going to be a large backward incompatible change. A PEP can summarize the rationale, schedule deprecation, etc.

    +1

    Any volunteer around? Barry, Nick, someone else?

    I will volunteer to co-author. I would definitely like at least Nick and/or Jason to help.

    ncoghlan commented 5 years ago

    site.addsitedir is called for every site-packages directory (whether global, within a venv, or at the user level), so my proposal above covers appending multiple segments.

    Linux distros' approach to handling this is terrible because they dump all their system packages into a single global site-packages, leading to the ever-growing sys.path problem that Barry is concerned about.

    However, that's entirely the fault of distro packaging policies, and can be remedied in a far superior way by switching distros to a model where they create a venv per application, and then use .pth files to link in the system packages that they actually want visible to that application.

    "Some users don't want to use virtual environments appropriately" is an incredibly poor reason for breaking a perfectly valid feature.

    ncoghlan commented 5 years ago

    Note that any PEP I contributed to writing would need to be restricted to eliminating arbitrary code execution, as I don't think there's anything wrong with the path extension feature.

    jaraco commented 5 years ago

    site.addsitedir is called for every site-packages directory (whether global, within a venv, or at the user level), so my proposal above covers appending multiple segments.

    Good point. I think you're assuming that only site dirs are appropriate for packages that require arbitrary code execution. I think I'd like to break that assumption and allow any location where packages can be installed (PYTHONPATH) to install hooks. Consider this use-case:

    draft $ mkdir pkgs
    draft $ python3.5 -m pip download -d pkgs future_fstrings
    Collecting future_fstrings
      Using cached https://files.pythonhosted.org/packages/36/25/070c2dc1fe1e51901df5875c495d6efbbf945a93a2ca40f47e5225302fb8/future_fstrings-0.4.5-py2.py3-none-any.whl
      Saved ./pkgs/future_fstrings-0.4.5-py2.py3-none-any.whl
    Collecting tokenize-rt; python_version < "3.6" (from future_fstrings)
      Using cached https://files.pythonhosted.org/packages/76/82/0e6a9dda45dd76be22d74211443e199a330ac7e428b8dbbc5d116651be03/tokenize_rt-2.1.0-py2.py3-none-any.whl
      Saved ./pkgs/tokenize_rt-2.1.0-py2.py3-none-any.whl
    Successfully downloaded future-fstrings tokenize-rt
    draft $ cat > hello-fstrings.py
    # coding: future_fstrings
    print(f'hello world')
    draft $ PYTHONPATH=pkgs/future_fstrings-0.4.5-py2.py3-none-any.whl:pkgs/tokenize_rt-2.1.0-py2.py3-none-any.whl python3.5 hello-fstrings.py
    xonsh: subprocess mode: command not found: PYTHONPATH=pkgs/future_fstrings-0.4.5-py2.py3-none-any.whl:pkgs/tokenize_rt-2.1.0-py2.py3-none-any.whl
    draft $ env PYTHONPATH=pkgs/future_fstrings-0.4.5-py2.py3-none-any.whl:pkgs/tokenize_rt-2.1.0-py2.py3-none-any.whl python3.5 hello-fstrings.py
      File "hello-fstrings.py", line 1
    SyntaxError: encoding problem: future_fstrings

    If future-fstrings is properly installed, its runtime hook is called and the script can run:

    draft $ python3.5 -m pip-run -q future-fstrings -- hello-fstrings.py
    hello world

    I'd like for a package like future-fstrings to be able to supply a hook that can be executed on startup that can be honored even if the package isn't installed in one of the site paths.

    Let's make a PEP.

    I'd be delighted to help with the PEP.

    vstinner commented 5 years ago

    SyntaxError: encoding problem: future_fstrings

    IMHO that's the expected behavior. I would prefer to have to explicitly install this special encoding *before* loading a script using it.