microsoft / python-language-server

Microsoft Language Server for Python
Apache License 2.0
910 stars 130 forks source link

symlink loops recursively followed #2006

Open joelhock opened 4 years ago

joelhock commented 4 years ago

Expected behaviour

recursive symbolic loops to not be followed until stat() fails with ELOOP (Too many levels of symbolic links)

Actual behaviour

given this workspace:

$ ls -Rl
.:
total 8.0K
drwxr-xr-x 2 joel atl 4.0K 20200424 22:39:58 bar/
drwxr-xr-x 2 joel atl 4.0K 20200424 22:38:37 foo/
-rw-r--r-- 1 joel atl    0 20200424 20:21:41 test.py

./bar:
total 0
lrwxrwxrwx 1 joel atl 6 20200424 22:37:47 foo -> ../foo/
-rw-r--r-- 1 joel atl 0 20200424 22:39:58 other.notpy
-rw-r--r-- 1 joel atl 0 20200424 22:39:54 other.py

./foo:
total 0
lrwxrwxrwx 1 joel atl 6 20200424 22:37:52 bar -> ../bar/
-rw-r--r-- 1 joel atl 0 20200424 22:38:37 it.notpy
-rw-r--r-- 1 joel atl 0 20200424 22:38:32 it.py

strace of the LSP has lots of recursive transversals, ending with:

14028 lstat("/usr/scratch/joel/testsymlink/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/
foo/bar/foo/bar/foo/bar/foo/bar/other.py", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
14028 geteuid()                         = 1012
14028 stat("/usr/scratch/joel/testsymlink/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo", 0x7fd98fffcb00) = -1 ELOOP (Too many levels of symbolic links)

Logs

in this trivial example, it completes fine since ELOOP failure is quickly reached. in our more complex setup with recursive symbolic links on NFS, it causes startup to take forever.

Notes

This is likely similar to the closed https://github.com/microsoft/python-language-server/issues/181 , which people have recently started commenting on again.

Also, (this observation might warrant a separate issue) this recursive loop in our actual workspace is avoided in vscode itself via enumerating the recursive symlinks in files.watcherExclude, which stops decent once a /** is matched, but it seems the python LSP does a full recursive decent of the entire source tree, even though files found during the descent might eventually be discarded via the exclude.

byyo commented 4 years ago

+1 to this. Our repo has recursive symlinks created by golang dependencies, which makes Microsoft Python Language Server spin in a tight loop. We're unable to switch from Jedi to the MS Language Server for now.

jakebailey commented 4 years ago

It's strange that .NET's builtin file enumeration does not handle this (though perhaps not strange given that the idea of symlinks in Windows is sorta "new", and stuff like this leaks out to other platforms).

Certainly something to look into, but it's hard for me to think of any situation where recursive symlinks would be useful and not trip up many different programs... If you can add a recursive symlink, then is it expected for a program to be able to do import foo.bar.foo.bar.foo.bar.foo.bar...... and it be valid? Certainly scares me.

avoided in vscode itself via enumerating the recursive symlinks in files.watcherExclude

This setting is strictly for the VS Code UI, and is not used by the language server.

Our repo has recursive symlinks created by golang dependencies

This is entirely unrelated to this issue, but it surprises me greatly that a Go project has directory symlinks at all (let alone recursive ones), given the Go toolchain explicitly does not support them and bails out early if it finds any while building...

joelhock commented 4 years ago

@jakebailey Sorry I wasn't clear about how I think the vscode excludes could come into play here. What I was referring to was that vscode's python extension uses this logic:

    protected getExcludedFiles(): string[] {
        const list: string[] = ['**/Lib/**', '**/site-packages/**'];
        this.getVsCodeExcludeSection('search.exclude', list);
        this.getVsCodeExcludeSection('files.exclude', list);
        this.getVsCodeExcludeSection('files.watcherExclude', list);
        this.getPythonExcludeSection(list);
        return list;
    }

to determine what to pass in the LSP's initializationOptions.excludeFiles (if I'm reading the source code correctly). If the language server stopped its tree traversal when a /** was matched, then that might pose one solution to this issue (if one of the symlinks in the loop were excluded).

For a little background, the issue is manifesting itself in our system because we can have multiple in-tree builds that have many c++ and python outputs, including some convenience symlinks that end up forming a loop with each other, so the large output trees are traversed many times.

jakebailey commented 4 years ago

That's an initialization option for us, yes, but its only use is for file indexing for symbols, i.e. the document symbol call (the outline) and the workspace symbols, so that files that VS Code ignores are also not searched via those symbol calls.

But this entry is not used for the actual analysis. If we were to actually ignore any site-package file, for example, we would never be able to analyze any library files. If we ignored Lib, the standard library would not work on Windows.

And even then, we ignore the ignores in some cases, i.e. when we really do want to index site-packages to build info for auto imports. The last time I touched that code (last week) I had actually planned to remove it, but didn't to make the PR smaller...

jakebailey commented 4 years ago

Any fix to not recursing via symlinks will be an entirely different fix, unrelated to that listing (which is arbitrary, and is more historical to the pre-rewrite language server a few years ago more than anything; the old-old language server would use globbing to enqueue every file and run until the queue empties, which is not the case now).

jjyyxx commented 4 years ago

I met this problem recently. I'm using Bazel to build a c++ & python project, which contains circular symlinks. I had to manually remove some symlinks to break the loop. Luckily, that didn't break my code.

I would like to know is there a more elegant workaround, such as listing them in files.exclude? Or is there a schedule for fixing this bug?

p-se commented 4 years ago

It's strange that .NET's builtin file enumeration does not handle this (though perhaps not strange given that the idea of symlinks in Windows is sorta "new", and stuff like this leaks out to other platforms).

Certainly something to look into, but it's hard for me to think of any situation where recursive symlinks would be useful and not trip up many different programs... If you can add a recursive symlink, then is it expected for a program to be able to do import foo.bar.foo.bar.foo.bar.foo.bar...... and it be valid? Certainly scares me.

Symlinks don't trip up PyCharm. Jedi, a framework for Python auto-completion is also not tripped by it and can be used in VS Code instead, without any issues related to symlinks.

avoided in vscode itself via enumerating the recursive symlinks in files.watcherExclude

This setting is strictly for the VS Code UI, and is not used by the language server.

Unfortunately that is true. It means that I don't have any way in VS Code to work around that problem, except for actually removing the symlinks.

Our repo has recursive symlinks created by golang dependencies

This is entirely unrelated to this issue, but it surprises me greatly that a Go project has directory symlinks at all (let alone recursive ones), given the Go toolchain explicitly does not support them and bails out early if it finds any while building...

I understand that it might not be trivial to fix that for all projects, but it would help me a great deal if I were able to manually specify paths which the language server won't follow for its analysis. As a result, I wouldn't have to remove the symlinks and ignore those changes in git, until I rebase or checkout the next branch. Fortunately, it doesn't break the project for me (yet).