spyder-ide / spyder

Official repository for Spyder - The Scientific Python Development Environment
https://www.spyder-ide.org
MIT License
8.14k stars 1.58k forks source link

Suggestion: insert paths from PYTHONPATH manager before system's PYTHONPATH #17066

Open oscargus opened 2 years ago

oscargus commented 2 years ago

Problem Description

It would be convenient if the paths added to the PYTHONPATH were added before (or could be made to be added before) the system PYTHONPATH.

Use case: I want to run a package from source (for development reasons) that I already have installed through Conda or pip. Right now, on Windows, I have to add it to the system PYTHONPATH, which, sort of, removes the benefit of the Spyder PYTHONPATH manager.

I noted #16429 and maybe that solves the problem (by importing and then moving the source to top), but maybe there can be an alternative solution as well, like a check box selecting if the context of the PYTHONPATH manager should be prepended or appended? I ping @mrclary directly here, to confirm that it can be done when/if #16429 is merged and here opinions on a possible implementation as part of that.

dalthviz commented 2 years ago

Hi @oscargus thank you for the feedback! What do you think @spyder-ide/core-developers ?

mrclary commented 2 years ago

@oscargus, I agree that whatever the user controls in Spyder's PYTHONPATH Manager should take precedence, i.e. should be placed in front of any other paths (site-packages, etc.). This gives the user the most control.

I will have to double-check whether #16429 enforces this or not. I'm pretty sure that this is the case for command completions in the Editor. I'm uncertain about the handling of sys.path in the IPython console, though. I'll report back.

mrclary commented 2 years ago

The Python Language Server does not place PYTHONPATH Manager paths at the front of sys.path for Jedi completions in the Editor and #16429 cannot change this. The offending line is pylsp.workspace.Document:266 in jedi_script. This

sys_path = self.sys_path(environment_path, env_vars=env_vars) + extra_paths

should be changed to this

sys_path = extra_paths + self.sys_path(environment_path, env_vars=env_vars)

I can submit a merge request to palantir/python-language-server python-lsp/python-lsp-server but there is no guarantee that it will be accepted.

dalthviz commented 2 years ago

Just in case @mrclary I think the change then should be done over python-lsp/python-lsp-server (the org and repo we migrated from the palantir one and that we also support), right @ccordoba12 @andfoy ?

mrclary commented 2 years ago

The IPython console does not place PYTHONPATH Manager paths at the front of sys.path and #16429 cannot change this. The offending line is spyder_kernels.customize.spydercustomize:730 in set_spyder_pythonpath. This

sys.path.extend(pathlist)

should be changed to this

[sys.path.insert(0, p) for p in pathlist[::-1]]

I can submit a merge request to spyder/spyder-kernels, subject to approval by the Spyder team.

mrclary commented 2 years ago

Just in case @mrclary I think the change then should be done over python-lsp/python-lsp-server (the org and repo we migrated from the palantir one and that we also support), right @ccordoba12 @andfoy ?

@dalthviz, you are correct.

mrclary commented 2 years ago

Changing spydercustomize.py does not appear to behave as expected. Some further investigation into spyder-kernels will be required. Nevertheless, if the requested feature is accepted, I'll invest the time to get it working properly.

mrclary commented 2 years ago

Changing spydercustomize.py does not appear to behave as expected. Some further investigation into spyder-kernels will be required. Nevertheless, if the requested feature is accepted, I'll invest the time to get it working properly.

Nevermind, it works as expected.

ccordoba12 commented 2 years ago

@mrclary, what if the user has a module like string.py in one of the directories added to Spyder's PYTHONPATH manager? With your PRs, that would make their code and Jedi fail in bizarre ways because those directories would take precedence over standard library modules.

So, instead of doing that unconditionally, I think we should add an option in our PYTHONPATH manager (off by default) to ask users if they want the added paths to be in front of everything else (as @oscargus suggested). That way, users that really know what they are doing won't be surprised by this, and the rest would be safe.

mrclary commented 2 years ago

@ccordoba12, I don't think this should be a problem after #16429 is merged. Currently there are some idiosyncrasies in the handling of these paths for command completion, which may be why Jedi may fail strangely. With #16429 Jedi's execution environment is isolated from the paths provided as search paths. Thus, per your example, Jedi's runtime environment should never see the user's module string.py, which would have caused it to break.

Now, if we want to add a feature to PYTHONPATH Manager that allows the user to choose where to place the paths within sys.path for IPython consoles and command completion, I think it can be incorporated into #16429, with some additional changes to spyder-kernels and python-lsp-server.

mrclary commented 2 years ago

@ccordoba12, even without #16429 I could not reproduce any Jedi failures with string.py in the completion search path. Is there perhaps some other conditions required? pypath-priority

mrclary commented 2 years ago

Disabling the path in PYTHONPATH Manager restores the completion target to the standard library module.

oscargus commented 2 years ago

Thanks @mrclary for the exhaustive investigation!

I can clearly see @ccordoba12's worry about users having files with standard names in the path. It may lead to an increased number of "bug" reports and although it works if you know what you are doing there is a certain degree of risk to it. (I've experienced bug report for SymPy where users just have some other random file in their path which breaks SymPy and although it states clearly in the error message which file was accessed, report are still posted.)

I can also imagine having the system path as a separate item in the manager and then be able to locate it exactly, so some can go before and some after. However, it may be a safer bet to have a check-box, possibly with a warning pop-up, if the paths in the manager should be added before the system path.

mrclary commented 2 years ago

Thanks @mrclary for the exhaustive investigation!

Thank you for the kind words.

I can clearly see @ccordoba12's worry about users having files with standard names in the path. It may lead to an increased number of "bug" reports and although it works if you know what you are doing there is a certain degree of risk to it. (I've experienced bug report for SymPy where users just have some other random file in their path which breaks SymPy and although it states clearly in the error message which file was accessed, report are still posted.)

We certainly do not want to cause problems for our users and unnecessarily drive more bug reports. I was responding to @ccordoba12's comments targeting Jedi, specifically. However, you are correct that we should be concerned about the order of paths in sys.path in the IPython console environment. While for the language server and Jedi the execution environment can be separate from the completion search paths, any manipulation of sys.path in the IPython console environment will have direct implications for code that users execute therein, such as your example of SymPy.

I can also imagine having the system path as a separate item in the manager and then be able to locate it exactly, so some can go before and some after.

If you mean the base sys.path to remain as a contiguous block, and PYTHONPATH Manager paths can individually be placed before or after the base sys.path, then I think this may be possible. If you mean manipulating the paths in the base sys.path, then I am very incredulous that this can be done with any reliability, since these are determined by the site package of individual environments and beyond the scope of Spyder to control.

However, it may be a safer bet to have a check-box, possibly with a warning pop-up, if the paths in the manager should be added before the system path.

I would recommend label text that provides the warning over a pop-up window, similar to other instructional information that we have elsewhere, e.g. in the Preferences widget.

So, here are the two possible approaches that I would recommend.

  1. Have a single checkbox in the PYTHONPATH Manager that, when selected, inserts all the paths in order in the front of the base sys.path. The default would be deselected, in which case all the paths are appended to the back of the base sys.path (current behavior). The checkbox label could include a warning about placing paths in front of the base sys.path.
  2. Have a "dummy" PYTHONPATH Manager entry that is always greyed out but visible that represents the base sys.path, with a default position at the top of the list (replicating current behavior). The user can then arrange any path before or after the base sys.path. The PYTHONPATH Manager window could include text in the main frame warning the user about placing paths above the base sys.path.

Either of these options could be included in the #17077, python-lsp/python-lsp-server#139, and spyder-ide/spyder-kernels#352 PR chain. These would complement, but not depend on, #16429.

What do @spyder-ide/core-developers think?

mrclary commented 2 years ago

I was responding to @ccordoba12's comments targeting Jedi, specifically.

Well, actually, I forgot about Spyder's option to use Jedi for command completion in IPython consoles. So, in this case, Jedi, like any other module, could fail when executed in the IPython console environment if users have a module name conflict with a standard module and the user's module is found before the standard module.

oscargus commented 2 years ago

If you mean the base sys.path to remain as a contiguous block

Yes, that is what I meant. Although the alternative may be "nice", I cannot really see any practical use case for it, and I agree that it may indeed be quite challenging...

mrclary commented 2 years ago

Okay, upon further consideration, I think that the paths listed in PYTHONPATH Manager should be placed at the front of sys.path because that is what Python does with the contents of the PYTHONPATH environment variable.

If a user has the PYTHONPATH environment variable defined, when starting Python from the command line sys.path becomes [''] + <PYTHONPATH paths> + <standard library paths>. However, Spyder places the user defined paths at the end of sys.path (after the standard library paths), which is contrary to standard Python behavior; in particular, [''] + <standard library paths> + <extra_paths>.

Additionally, I think that the "current working directory" entry '' should always be first to be consistent with standard Python behavior. #17077, python-lsp/python-lsp-server#139, and spyder-ide/spyder-kernels#352 currently result in <extra_paths> + [''] + <standard library paths> and should be updated to [''] + <extra_paths> + <standard library paths>.

@ccordoba12 expressed concern about this in a previous comment; however, I was unable to reproduce the issue. I believe we have a regression test for this as well (don't we?), so I don't think we should see any surge in bug reports.

Currently, Spyder has idiosyncratic behavior with respect to the users PYTHONPATH environment variable and Spyder's PYTHONPATH Manager. #16429 resolves these idiosyncrasies and homogenizes the behavior across platforms and launch mechanisms so that <extra_paths> will always be the contents of PYTHONPATH Manager, regardless of platform and launch mechanism.

@spyder-ide/core-developers, I invite your response.

impact27 commented 2 years ago

My opinion is that most people will add something to the python path so python can find it and they can use a script from wherever. Replacing installed package is probably a more niche use case. It might be better to add after by default and add an option to add before?

mrclary commented 2 years ago

My opinion is that most people will add something to the python path so python can find it and they can use a script from wherever. Replacing installed package is probably a more niche use case. It might be better to add after by default and add an option to add before?

@impact27, regardless of the motivations of the OP, I think that user-supplied paths should go in front since that is the standard Python behavior, i.e. running a script with Python from the command line should have the same path structure as running the script in Spyder's IPython console. Can you comment on this point?

impact27 commented 2 years ago

That makes sense, but in the other hand, it is more easy to set up a path and forget in Spyder than when running from the command line.

ccordoba12 commented 2 years ago

Replacing installed package is probably a more niche use case. It might be better to add after by default and add an option to add before?

I agree with @impact27. Spyder makes quite easy to add new paths to PYTHONPATH, so the chance for users to shoot themselves in their foot is much higher. What I mean by that is that Spyder would allow them to create code that (most likely) would fail to work outside of it (see below).

@ccordoba12 expressed concern about this in a previous comment; however, I was unable to reproduce the issue. I believe we have a regression test for this as well (don't we?), so I don't think we should see any surge in bug reports.

Even if the PyLSP and spyder-kernels work by putting our additions to PYTHONPATH in front of sys.path, that would allow users to (very easily) shadow standard library modules (like in the example shown by @mrclary in https://github.com/spyder-ide/spyder/issues/17066#issuecomment-999906142).

Since most of those module names are rather simple (random.py, logging.py, etc), they are usually picked up by users for their own projects. So this change would introduce a terrible confusion: why my code works perfectly fine inside Spyder but fails horribly outside of it?

I think that user-supplied paths should go in front since that is the standard Python behavior

I'm -1 on this for the reasons I mentioned above. However, I also understand that it's confusing that Spyder says it has a PYTHONPATH manager and nevertheless doesn't handle it in the same way as Python does. That's why I propose to rename that functionality to something like Add other paths to import code (or something like that) because it's what we are really doing at the moment.

ccordoba12 commented 2 years ago

Additionally, I think that the "current working directory" entry '' should always be first to be consistent with standard Python behavior

I disagree with this too because (again) it would allow people to shadow standard library modules by creating their own ones in their cwd, which usually breaks other libraries. This was a very common problem and a source of a lot of confusion before we "fixed" it a couple of years ago.

I guess people are also confused when Python fails to import their modules and instead imports the corresponding ones in the standard library. But (i) their code doesn't break with incomprehensible error messages, so they don't need to open bugs or StackOverflow questions about it; and (ii) I guess they eventually learn there are other modules that take precedence over theirs, so they are forced to rename them (which at the end is a good thing).

I understand that for power users it's frustrating that Spyder has these small differences with regular Python. But, at least for me, the Python defaults are a terrible UX for newbies. And now that Python became the de facto language for science and engineering, we have to design things for them, not for power users.

mrclary commented 2 years ago

Since most of those module names are rather simple (random.py, logging.py, etc), they are usually picked up by users for their own projects. So this change would introduce a terrible confusion: why my code works perfectly fine inside Spyder but fails horribly outside of it?

Perhaps you mis-typed, but the confusion "why my code works perfectly fine inside Spyder but fails horribly outside of it?" is the current state of affairs and would not be introduced by the changes that I propose. The changes I propose may elicit "why my code used to work perfectly fine but now fails horribly inside Spyder, and it also fails horribly outside of it?".

Nevertheless, it seems that the prevailing opinion is to keep the user-defined paths after the standard library paths. Are we then in agreement to provide a mechanism to allow users to place it before the standard library paths, according to my previous comment option 1?

CAM-Gerlach commented 2 years ago

I don't see the problem with at least an opt-in option to enable the standard PYTHONPATH behavior, perhaps with a suitable warning. While the standard Python behavior may be confusing and frustrating to beginner users in certain situations, the non-standard Spyder behavior with no way to override it is likely to be the same for power users in others, at least without any way to override it. If users actively click the UI element (with a warning, if necessary) to enable this, after opening the PYTHONPATH manager to begin with (which already I don't think a beginner would likely do; I don't think I ever have) then we can reasonably assume they know what they are doing.

impact27 commented 2 years ago

Well I don't think a warning is necessary if the option is opt-in. I would be open to have the default python behaviour as the only option, but I am unclear on when the issue would manifest. I think I would lean on the "having an option" side. The question is, what is more likely: a user confused because he created a "threading.py" file or a user confused because the file he created doesn't replace the library?

CAM-Gerlach commented 2 years ago

I don't think it really is either for the reasons you mentioned; I included that as an option if needed to assuage @ccordoba12 's concerns about it.

At least in my experience running into this issue many times in the past on Spyder's issue tracker (at least until the changes to mostly fix this for Spyder itself), as well as a number of places elsewhere, I'd say the former is substantially more likely; it is that common (at least in my experience) for users to really need to do this, and those that do (and have good reason for it) as a rule tend to have substantially more experience and know how to handle the situation.

mrclary commented 2 years ago

I don't like too many pop-up windows, but informing users of potential hazards is important. Therefore my vote is for explanatory text next to the option checkbox.

ccordoba12 commented 2 years ago

but the confusion "why my code works perfectly fine inside Spyder but fails horribly outside of it?" is the current state of affairs and would not be introduced by the changes that I propose, The changes I propose may elicit "why my code used to work perfectly fine but now fails horribly inside Spyder, and it also fails horribly outside of it?".

If you're talking about setting PYTHONPATH outside Spyder and then running code inside it, then yes, currently things fail horribly. That's why (on a second thought) I think the best would be to add an externally set PYTHONPATH in front of sys.path in Spyder. I'm sure that's going to cause confusion for people that did that and forgot about it, but it'd work the same in and out of Spyder. So they'd have to figure out the failure by themselves.

What I really meant by this is allowing people by default to shadow standard library modules with our Python path manager, which simply wouldn't work outside Spyder. That's why the paths added through it should remain at the end of sys.path and we should change its name to better reflect that.

Are we then in agreement to provide a mechanism to allow users to place it before the standard library paths

Yes, I agree with that.

Therefore my vote is for explanatory text next to the option checkbox.

I agree with @mrclary's suggestion. I think clearly explaining what the checkbox does in the dialog itself would be more informative.

CAM-Gerlach commented 2 years ago

@ccordoba12 I agree with the other two, but I'm a little confused and unsure what you're proposing with the first part. I'm not sure if you're saying that Spyder should handle externally-set PYTHONPATH differently (does its changes to ensure Spyder and kernel startup don't fail due to user shadowing installed libraries also affect user code?), or suggesting a name change to the Python path manager, or something else? Could you maybe explain that a little differently?

mrclary commented 2 years ago

My vote is to keep the PYTHONPATH in the name. I agree that "Manager" can be misleading as it does not alter or modify the system PYTHONPATH environment variable, but it does intercept it and fills the same role as PYTHONPATH for the purposes of downstream processes (i.e. command completion and IPython console).

mrclary commented 2 years ago

If you're talking about setting PYTHONPATH outside Spyder and the running code inside it, then yes, currently things fail horribly. That's why (on a second thought) I think the best would be to add an externally set PYTHONPATH in front of sys.path in Spyder. I'm sure that's going to cause confusion for people that did that and forgot about it, but it'd work the same in and out of Spyder. So they'd have to figure out the failure by themselves.

What I really meant by this is allowing people by default to shadow standard library modules with our Python path manager, which simply wouldn't work outside Spyder. That's why the paths added through it should remain at the end of sys.path and we should change its name to better reflect that.

I'm confused because the first paragraph seems like you've changed your mind and you want PYTHONPATH Manager at the front of sys.path, but the second paragraph seems like you want to keep it at the end of sys.path.

Just to clarify the potential issues presented by the different scenarios:

PPM = PYTHONPATH Manager PP = system environment variable PYTHONPATH (usually set in e.g. ~/.bash_profile or similar) Assume that a user has defined a package with the same name as a standard library package or function, e.g. random.py, the path to which the user may put in PPM and/or PP in the below scenarios.

  1. path in PPM, but not in PP
    • code run in Spyder sees standard library random.py
      • "Why isn't my random.py package being picked up?" because PPM is placed at the end of sys.path.
      • standard libraries that depend on the standard random.py WILL NOT break.
    • code run in Terminal sees standard library random.py
      • "Why isn't my random.py package being picked up?" because PP is not set.
      • standard libraries that depend on the standard random.py WILL NOT break.
  2. path in PPM and in PP
    • code run in Spyder (same as 1)
    • code run in Terminal sees user random.py
      • "Why does everything break?" because...
      • standard libraries that depend on the standard random.py WILL break because PP is at front of sys.path.
ccordoba12 commented 2 years ago
  1. path in PPM, but not in PP

That's the scenario I have in mind: things will not break in Spyder because PPM puts paths at the end of sys.path. And they also won't break outside of it because Spyder is not really in charge of handling PP for other Python applications/libraries.

And precisely due to the last reason, I think we should rename PPM to something else. The problem is the name PPM is confusing for power users because they consider it a proper PYTHONPATH manager, so they expect paths to be added at the beginning of sys.path, and to do that in and out of Spyder.

  1. path in PPM and in PP
    • code run in Spyder (same as 1)

Even if the extra paths are the same in both PPM and PP, we need to decide if they should be added at the beginning or end of sys.path. I'd say at the beginning for consistency with the way PP works and a third scenario you missed in your comment (see below).

Of course, that would imply that the situation would be reversed and code would break in Spyder (as well as in the Terminal).


There's a third scenario (which is the one I was referring above in my first paragraph):

  1. Path in PP and not in PPM.
    • Code in Spyder and Terminal sees random.py and breaks.
mrclary commented 2 years ago

There's a third scenario (which is the one I was referring above in my first paragraph):

  1. Path in PP and not in PPM.

    • Code in Spyder and Terminal sees random.py and breaks.

Currently, Spyder does not pass PP to IPython Console, so in this scenario, code run from Spyder will see the standard library random.py and not break. However, I think what you're referring to is when Spyder is launched from command line (e.g. conda installation) in which case Spyder's internal code may break if a user has shadowed a standard library in PP that Spyder depends on. If this is the case, then we may want to consider protecting Spyder from this by scrubbing Spyder's internal sys.path upon launch.

In any case, perhaps we can discuss this more easily in a group zoom meeting?

ccordoba12 commented 2 years ago

Currently, Spyder does not pass PP to IPython Console, so in this scenario, code run from Spyder will see the standard library random.py and not break.

I thought we were already doing it! Sorry, my bad for the extra confusion.

However, I think what you're referring to is when Spyder is launched from command line (e.g. conda installation) in which case Spyder's internal code may break if a user has shadowed a standard library in PP that Spyder depends on. If this is the case, then we may want to consider protecting Spyder from this by scrubbing Spyder's internal sys.path upon launch.

Right, but I don't know how easy that could be and if it's worth the effort (no one has reported a problem due to that as far as I can recall).

In any case, perhaps we can discuss this more easily in a group zoom meeting?

Good point! Actually, that's one of our new year surprises: our plan is to start doing regular developer meetings since February (we're sorting some things out at the moment about that).

mrclary commented 2 years ago

Currently, Spyder does not pass PP to IPython Console, so in this scenario, code run from Spyder will see the standard library random.py and not break.

I thought we were already doing it! Sorry, my bad for the extra confusion.

Nope, sorry, you are correct, I confused myself. In the command line launch scenario, PP is passed to IPython console. It is not passed in the macOS application launch scenario. So, yes, your scenario 3 above is a good point.

I'll put in a shameless plug here for #16429. With that PR, PP will never be passed directly to IPython console under any launch scenario; only what is in PPM will be passed to IPython console, but the user has the option to import PP into PPM, so he will always know and control what gets passed to IPython console (nothing hidden).

In any case, perhaps we can discuss this more easily in a group zoom meeting?

Good point! Actually, that's one of our new year surprises: our plan is to start doing regular developer meetings since February (we're sorting some things out at the moment about that).

Excellent!

mrclary commented 2 years ago

@spyder-ide/core-developers, I think I understand better now what is going on. First, the current situation:

So what does all this mean?

In short, I think we are safe to move these paths to the front of sys.path in the IPython Console. This will be in better compliance with standard (expected) Python and IPython behavior and SPY_PYTHONPATH already prevents the console from crashing in the scenarios that @ccordoba12 and others are concerned about.

I think #17512 and spyder-ide/spyder-kernels#378 should resolve this issue as well as resolve the issue where the paths in sys.path would be inconsistent if the user disables and re-enables paths.

ccordoba12 commented 9 months ago

@mrclary, could you take care of this one for Spyder 6? I have an idea on how to provide a really simple way to add paths from our PPM to the front of sys.path.