python-rope / rope

a python refactoring library
GNU Lesser General Public License v3.0
1.96k stars 164 forks source link

Remove site-packages from packages search tree #723

Closed tkrabel closed 11 months ago

tkrabel commented 1 year ago

Description

Remove site-packages from the python package search tree to avoid duplicate entries in the autoimport database. This also speeds up generating the modules cache by a factor of ~2.

Fixes #722

Checklist (delete if not relevant):

bagel897 commented 1 year ago

I should have written logic to handle this already, it may just be missing the regex rules

bagel897 commented 1 year ago

Here's what I found: https://github.com/python-rope/rope/commit/2182bfe06296a03b953f7cb245452d583cc57875 (utils.py) I may have assumed all virtual environments began with .. This is convention, not guaranteed. Thanks for catching that. You may want to change this logic instead - IE add a check that name is not site-packages in the existing checking logic. This way its more consistent with the other filter mechanisms Did we merge the database versioning update? I would purge and re-create all dbs after this change regardless of the approach

bagel897 commented 1 year ago

Honestly if I was to rewrite this again, I'd abstract the db, parsing and discovery (separate layers for package and module/file discovery) layers out of the main class - it's a bit of a monstrosity. They honestly can be made single threaded. There's also some poor logic in one of the cache generation functions I fixed in a PR I never merged.

tkrabel commented 1 year ago

There's also some poor logic in one of the cache generation functions I fixed in a PR I never merged.

So why not merge it then? :)

bagel897 commented 1 year ago

There's also some poor logic in one of the cache generation functions I fixed in a PR I never merged.

So why not merge it then? :)

I never finished the work, it was part of a much broader change that broke API compatibility. I don't see the problem in the current rope code, but my idea was to use the same function to generate the cache and have both cache generation functions call it instead of implementing the logic twice. Here's the PR: https://github.com/python-rope/rope/pull/485/ But it's not necessary, just an idea.

bagel897 commented 1 year ago

Does this handle cases where venv is in the project root? I am running into that on windows with the autodetect branch. Maybe this is a venv vs virtualenv thing? I think our implementation should be recursive (IE no Lib.site-packages.xyz modules) but I'm not sure how to implement that efficiently

tkrabel-db commented 1 year ago

Does this handle cases where venv is in the project root? I am running into that on windows with the autodetect branch. Maybe this is a venv vs virtualenv thing? I think our implementation should be recursive (IE no Lib.site-packages.xyz modules) but I'm not sure how to implement that efficiently

I think this PR doesn't change the way we treat virtual environments. The PR only doesn't treat site-packages like a python package, which makes sense to me.

That said: @lieryan I saw you approved the PR. Can you merge this?

@bagel897 if you have any concerns related virtual environments, can we please address this in a separate issue/PR?

bagel897 commented 1 year ago

Autoimport scans for packages in the python path image In this example, we can see both ../.venv/lib64/python3.12/site-packages and also the cwd '' If we detect site-packages, that means a directory called site-packages in the search path - which means a failure to detect venvs. We shouldn't be searching the venv directory from the project root. The current method is to exclude directories prefixed with a ., which is insufficient.

Here's another failure example: https://github.com/python-rope/rope/actions/runs/6739504275/job/18321101941?pr=516 image Since it searches the venv, it indexes its contents twice - under the Lib package. We need to exclude all instances of site-packages anywhere in the package or module path. Maybe the solution is to exclude it both at the package and module level?

tkrabel-db commented 1 year ago

@bagel897 @lieryan can we discuss this in a follow up issue and merge this PR? IIUC, my Pr provides an incremental improvement, while not causing any regressions, so I'd not want to delay this improvement unnecessarily.

lieryan commented 1 year ago

Hi @tkrabel-db, thanks for creating this PR. I'm currently travelling on vacation, so my availability to look into rope issues and review PRs will be limited until December.

So don't worry, but I hadn't forgotten about this 🙂, but I won't be able to action the PR or make a new release until December.

tkrabel commented 1 year ago

@bagel897 coming back to your comment:

We need to exclude all instances of site-packages anywhere in the package or module path

I tested my code creating a virtual environment using venv and I am fairly confident that my PR does exactly that. It may help looking at my problem statement again.

There, you can see we have duplications of the form (and only of the form)

Name(name=<import_name>, modname=<mod_name>, package=<package>, ...)
Name(name=<import_name>, modname="site-packages."<mod_name>, package='site-packages', ...)

Let's use an example to facilitate the conversation.

Name(name='BaseName', modname='jedi.api.classes', package='jedi', source=<Source.SITE_PACKAGE: 4>, name_type=<NameType.Class: 7>)
Name(name='BaseName', modname='site-packages.jedi.api.classes', package='site-packages', source=<Source.SITE_PACKAGE: 4>, name_type=<NameType.Class: 7>)

Here, you see that we attribute the BaseName object to jedi in the first entry - the correct match. In the second entry, however, we attribute the same object to a non-existing package we call site-packages.

The change I propose (with your help of pointing me to the correct util; thank you!) captures all cases in which we attribute any object/module to be of root (i.e. the "package") site-packages. This means it also captures the failure case you have eluded to earlier, namely ('from Lib.site-packages.pytoolconfig.sources import pytool', 'pytool').

Again, I tested my code and found that it also captures the case you presented.

If you think my PR still has some gaps, pls show them in a reproducible example based on this PR. And thank you for engaging in this PR! It's truly helpful having the original author around :)

lieryan commented 11 months ago

Thanks for fixing this issue @tkrabel