microsoft / pylance-release

Documentation and issues for Pylance

Support lazy loading individual virtualenvs in a multi-root workspace #6009

Closed pcasdf closed 3 days ago

pcasdf commented 2 weeks ago

My company uses a monorepo and multi-root VS Code workspace with over 40 Python projects. On initial startup of VS Code, the Python extension takes a ton of CPU and memory because it starts indexing every virtualenv. I really wish there were a config that allows us to choose to only begin indexing a virtualenv when a file within that project / sub-root is opened.

heejaechang commented 2 weeks ago

good idea. something I will do soon.

rchiodo commented 3 days ago

This issue has been fixed in prerelease version 2024.6.102, which we've just released. You can find the changelog here: CHANGELOG.md

pcasdf commented 3 days ago

🙏 Y'all rock! Thank you so much!

rchiodo commented 3 days ago

I should explain how to use it. Well actually @heejaechang would be better at doing that. He implemented it :)

pcasdf commented 3 days ago

Was just about to ask, since it doesn't seem to be the default -- I don't see the setting under python.* either.

heejaechang commented 3 days ago

Hi pcasdf. there is no setting. it is a new default behavior. previously we postponed indexing third party libraries until a file from that workspace was opened, but we didn't do that for user files. now we postpone indexing of every file (third party or user file) until a file is opened for the workspace.
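
As a rough illustration of the behavior described above, here is a toy model of deferred, per-workspace indexing. It is only a sketch and is not Pylance's implementation: the `LazyIndexer` class, its method names, and the `/repo/...` paths are all hypothetical.

```python
from pathlib import PurePosixPath


class LazyIndexer:
    """Toy model: a workspace root is indexed only after a file under it is opened."""

    def __init__(self, roots):
        # Hypothetical workspace roots, one per project.
        self.roots = [PurePosixPath(root) for root in roots]
        self.indexed = set()

    def on_file_opened(self, file_path):
        """Return the root that starts indexing because of this open, or None."""
        path = PurePosixPath(file_path)
        for root in self.roots:
            if root in path.parents and root not in self.indexed:
                # A real language server would schedule indexing work here.
                self.indexed.add(root)
                return str(root)
        return None


indexer = LazyIndexer(["/repo/a", "/repo/b", "/repo/c"])
first_open = indexer.on_file_opened("/repo/a/main.py")
second_open = indexer.on_file_opened("/repo/a/other.py")
```

In this model, opening `main.py` under `/repo/a` marks only that root as indexed; `/repo/b` and `/repo/c` stay untouched until a file under them is opened.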

pcasdf commented 3 days ago

@heejaechang is it intended that given this structure:

root
- project a
- - main.py
- project b
- project c

if I open main.py in project a, I should see indexing begin for project b and c as well? That was the behavior that I described originally, which seems to still occur.

To clarify, my hope is that I can open a file in project a, and only then will project a begin to index, and project b and c should not try to index until I open a file under their tree. Either way, the change that you made is still a huge improvement for us, because not all of our devs even touch Python files, but we share a single workspace among everyone. So this new default behavior is appreciated 🙏

heejaechang commented 3 days ago

are you using multi root workspace as in vscode's multi root workspace support? or do you mean something different by it?

if you have multi root workspace where root is

project a
    main.py
project b
project c

opening main.py won't cause project b or c to start indexing.

if you have a single workspace folder as root, with project a/b/c as plain subfolders under it, and are calling that a multi-root workspace, the current change won't do anything.
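
To make the distinction concrete, the sketch below generates the contents of a VS Code `.code-workspace` file in which each project is its own root, as opposed to one `root` folder containing all three. The project names come from this thread; the helper name `build_workspace` is made up for illustration.

```python
import json


def build_workspace(project_dirs):
    """Return the contents of a .code-workspace file with one root per project."""
    return {
        # Each entry under "folders" is a separate workspace root.
        "folders": [{"path": path} for path in project_dirs],
        "settings": {},
    }


# Multi-root: three roots, one per project.
multi_root = build_workspace(["project a", "project b", "project c"])

# Single root: the projects are just subfolders of "root".
single_root = build_workspace(["root"])

print(json.dumps(multi_root, indent=2))
```

Only the first layout gives each project its own root, which is the case the deferred indexing change applies to.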

pcasdf commented 3 days ago

We do use multi-root project with structure similar to:

root
project a
project b
project c

But I also removed root, leaving it as:

project a
project b
project c

And I see output similar to:

2024-06-25 22:48:39.277 [info] [Info  - 10:48:39 PM] (202134) Starting service instance "a"
2024-06-25 22:48:39.277 [info] [Info  - 10:48:39 PM] (202134) Starting service instance "b"
2024-06-25 22:48:39.286 [info] [Info  - 10:48:39 PM] (202134) Starting service instance "c"
2024-06-25 22:48:40.554 [info] [Info  - 10:48:40 PM] (202134) Background analysis(24) root directory: file:///home/discord/.vscode-server/extensions/ms-python.vscode-pylance-2024.6.102%2Bdiscord.3/dist
2024-06-25 22:48:40.555 [info] [Info  - 10:48:40 PM] (202134) Background analysis(24) started
2024-06-25 22:48:40.567 [info] [Info  - 10:48:40 PM] (202134) Background analysis(1) root directory: file:///home/discord/.vscode-server/extensions/ms-python.vscode-pylance-2024.6.102%2Bdiscord.3/dist
2024-06-25 22:48:40.571 [info] [Info  - 10:48:40 PM] (202134) Background analysis(1) started
2024-06-25 22:48:40.575 [info] [Info  - 10:48:40 PM] (202134) Background analysis(3) root directory: file:///home/discord/.vscode-server/extensions/ms-python.vscode-pylance-2024.6.102%2Bdiscord.3/dist
2024-06-25 22:48:40.575 [info] [Info  - 10:48:40 PM] (202134) Background analysis(23) root directory: file:///home/discord/.vscode-server/extensions/ms-python.vscode-pylance-2024.6.102%2Bdiscord.3/dist
2024-06-25 22:48:40.577 [info] [Info  - 10:48:40 PM] (202134) Background analysis(23) started
2024-06-25 22:48:40.577 [info] [Info  - 10:48:40 PM] (202134) Background analysis(3) started
2024-06-25 22:48:40.579 [info] [Info  - 10:48:40 PM] (202134) Background analysis(16) root directory: file:///home/discord/.vscode-server/extensions/ms-python.vscode-pylance-2024.6.102%2Bdiscord.3/dist
2024-06-25 22:48:40.580 [info] [Info  - 10:48:40 PM] (202134) Background analysis(20) root directory: file:///home/discord/.vscode-server/extensions/ms-python.vscode-pylance-2024.6.102%2Bdiscord.3/dist
2024-06-25 22:48:42.932 [info] [Info  - 10:48:42 PM] (202134) Setting pythonPath for service "a": "/home/discord/.virtualenvs/a/bin/python"
2024-06-25 22:48:42.932 [info] [Info  - 10:48:42 PM] (202134) Setting environmentName for service "a": "3.7.17 (a venv)"
2024-06-25 22:48:42.932 [info] [Info  - 10:48:42 PM] (202134) No include entries specified; assuming /home/discord/discord/a
2024-06-25 22:48:42.932 [info] [Info  - 10:48:42 PM] (202134) Auto-excluding **/node_modules
2024-06-25 22:48:42.932 [info] [Info  - 10:48:42 PM] (202134) Auto-excluding **/__pycache__
2024-06-25 22:48:42.932 [info] [Info  - 10:48:42 PM] (202134) Auto-excluding **/.*
2024-06-25 22:48:42.932 [info] [Info  - 10:48:42 PM] (202134) Assuming Python version 3.7.17.final.0
2024-06-25 22:48:42.932 [info] [Info  - 10:48:42 PM] (202134) Found 441 source files
2024-06-25 22:48:44.091 [info] [Info  - 10:48:44 PM] (202134) Setting pythonPath for service "b": "/home/discord/.virtualenvs/b/bin/python"
2024-06-25 22:48:44.093 [info] [Info  - 10:48:44 PM] (202134) Setting environmentName for service "b": "3.11.9 (b venv)"
2024-06-25 22:48:44.093 [info] [Info  - 10:48:44 PM] (202134) No include entries specified; assuming /home/discord/discord/b
2024-06-25 22:48:44.094 [info] [Info  - 10:48:44 PM] (202134) Auto-excluding **/node_modules
2024-06-25 22:48:44.094 [info] [Info  - 10:48:44 PM] (202134) Auto-excluding **/__pycache__
2024-06-25 22:48:44.094 [info] [Info  - 10:48:44 PM] (202134) Auto-excluding **/.*
2024-06-25 22:48:44.094 [info] [Warn  - 10:48:44 PM] (202134) stubPath file:///home/discord/discord/b/stubs is not a valid directory.
2024-06-25 22:48:44.094 [info] [Info  - 10:48:44 PM] (202134) Assuming Python version 3.11.9.final.0
2024-06-25 22:48:44.094 [info] [Info  - 10:48:44 PM] (202134) Found 6443 source files
2024-06-25 22:48:44.620 [info] [Info  - 10:48:44 PM] (202134) Setting pythonPath for service "c": "/home/discord/.virtualenvs/c/bin/python"
2024-06-25 22:48:44.620 [info] [Info  - 10:48:44 PM] (202134) Setting environmentName for service "c": "3.11.9 (c-env venv)"
2024-06-25 22:48:44.621 [info] [Info  - 10:48:44 PM] (202134) No include entries specified; assuming /home/discord/discord/discord_clyde
2024-06-25 22:48:44.621 [info] [Info  - 10:48:44 PM] (202134) Auto-excluding **/node_modules
2024-06-25 22:48:44.621 [info] [Info  - 10:48:44 PM] (202134) Auto-excluding **/__pycache__
2024-06-25 22:48:44.621 [info] [Info  - 10:48:44 PM] (202134) Auto-excluding **/.*
2024-06-25 22:48:44.621 [info] [Info  - 10:48:44 PM] (202134) Assuming Python version 3.11.9.final.0
2024-06-25 22:48:44.621 [info] [Info  - 10:48:44 PM] (202134) Found 876 source files

Maybe I misunderstand how the indexing works, but this step usually seems to take a lot of CPU.

I do see now a separate step when I open a file under a separate project:

2024-06-25 22:51:15.762 [info] [Info  - 10:51:15 PM] (202134) [IDX(53)] Long operation: index execution environment file:///home/discord/repo/a (3975ms)
2024-06-25 22:51:15.798 [info] [Info  - 10:51:15 PM] (202134) [IDX(53)] Long operation: index packages file:///home/discord/repo/a (4058ms)
2024-06-25 22:51:15.798 [info] [Info  - 10:51:15 PM] (202134) indexed(53) 258 files over 1 exec env
2024-06-25 22:51:15.887 [info] [Info  - 10:51:15 PM] (202134) Indexing finished(53).
2024-06-25 22:51:17.082 [info] [Info  - 10:51:17 PM] (202134) [BG(5)] Long operation: checking: file:///home/discord/repo/a/a/a/user.py (6362ms)
2024-06-25 22:51:17.083 [info] [Info  - 10:51:17 PM] (202134) [BG(5)] Long operation: analyzing: file:///home/discord/repo/a/a/a/user.py (6881ms)

So is this where the actual indexing occurs?

pcasdf commented 3 days ago

I think I understand now that that's where the indexing occurs. Sorry for my confusion!

heejaechang commented 3 days ago

ya, the second one is when indexing started. so if you are saying the first one takes a lot of CPU, it could be something else taking the time. also, you can set python.analysis.indexing: false to see whether the CPU issue still persists.
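
For anyone trying to tell the two phases apart in their own output, here is a small sketch that pulls the "Long operation" entries out of log text and groups them by tag. The sample lines are copied from the output quoted earlier in this thread. Reading `IDX` as indexing and `BG` as background analysis is an assumption based on this discussion, and the function name is hypothetical.

```python
import re

# Sample lines copied from the log output quoted in this thread.
LOG = """\
2024-06-25 22:48:39.277 [info] [Info  - 10:48:39 PM] (202134) Starting service instance "a"
2024-06-25 22:48:42.932 [info] [Info  - 10:48:42 PM] (202134) Found 441 source files
2024-06-25 22:51:15.762 [info] [Info  - 10:51:15 PM] (202134) [IDX(53)] Long operation: index execution environment file:///home/discord/repo/a (3975ms)
2024-06-25 22:51:15.798 [info] [Info  - 10:51:15 PM] (202134) [IDX(53)] Long operation: index packages file:///home/discord/repo/a (4058ms)
2024-06-25 22:51:17.082 [info] [Info  - 10:51:17 PM] (202134) [BG(5)] Long operation: checking: file:///home/discord/repo/a/a/a/user.py (6362ms)
"""

# Matches e.g. "[IDX(53)] Long operation: index packages file:///... (4058ms)".
LONG_OP = re.compile(r"\[(IDX|BG)\(\d+\)\] Long operation: (.+) \((\d+)ms\)$")


def long_operations(log_text):
    """Group 'Long operation' durations (in ms) by their tag (IDX or BG)."""
    by_tag = {}
    for line in log_text.splitlines():
        match = LONG_OP.search(line)
        if match:
            tag, _description, ms = match.groups()
            by_tag.setdefault(tag, []).append(int(ms))
    return by_tag


durations = long_operations(LOG)
```

Lines such as `Starting service instance` and `Found 441 source files` carry no such tag, so they do not show up in the result.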

heejaechang commented 3 days ago

> I think I understand now that that's where the indexing occurs. Sorry for my confusion!

don't worry about it. anyway, if there is another perf (CPU) issue outside of indexing, can you provide us some logs so we can take a look at what is going on?

https://github.com/microsoft/pylance-release/wiki/Collecting-data-for-an-investigation.#collecting-cpuprofiles

basically, the steps you need to do are

  1. start vscode and open the multi root workspace as you usually do
  2. wait until vscode goes idle (make sure pylance is loaded. you can create an untitled python file to make that happen)
  3. open the workspace settings json file
  4. invoke the pylance start profiling command
  5. change something in the settings file, such as adding python.analysis.indexing: false, and save
  6. wait until vscode goes idle
  7. invoke the pylance stop profiling command
  8. provide us the *.cpuprofile files created by pylance (a message box should tell you where those files are)
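
Step 5 can also be scripted. The sketch below sets `python.analysis.indexing` in a JSON settings file; the setting name is the one given in this thread, while the helper name and the temporary demo file are hypothetical. It assumes the file is plain JSON: VS Code settings files may contain comments, which Python's `json` module rejects.

```python
import json
import tempfile
from pathlib import Path

SETTING = "python.analysis.indexing"  # setting name as given in this thread


def set_indexing(settings_file, enabled):
    """Set python.analysis.indexing in a JSON settings file; return the new settings."""
    path = Path(settings_file)
    # Assumes plain JSON without comments; start from an empty mapping if the file is missing.
    settings = json.loads(path.read_text()) if path.exists() else {}
    settings[SETTING] = enabled
    path.write_text(json.dumps(settings, indent=4) + "\n")
    return settings


# Demo against a throwaway file rather than a real workspace.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "settings.json"
    result = set_indexing(demo_file, False)
    written = json.loads(demo_file.read_text())
```

Saving the change from inside VS Code, as the steps describe, is still the simplest route; the script is only useful if the toggle needs to be repeated across many profiling runs.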

it would help us a lot to find out where CPUs are used for the part you mentioned.

pcasdf commented 3 days ago

I see now that indexing appears to complete fairly quickly, even in a large 7k Python file sub-project, if VS Code has already loaded all extensions and Pylance has fully loaded and reached the idle state. But when opening a file after the Python extension is just beginning to load, it appears that the background analysis tasks may be blocking before allowing indexing to begin. This is only a problem for us since we have so many projects, or "service instances" as they're named in the output. When we start VS Code and open our first Python file of that session, we usually have to wait about 1-2 minutes before Intellisense begins to work. Afterwards, opening files in other projects is fairly fast, even for those that hadn't been indexed yet. This isn't a terrible experience, but I would love to see a speed up in that initial time from start up to working Intellisense.

Here are two profiles. The first one was after disabling indexing, and the second is after enabling indexing. pyright-cpuprofile.tar.gz pyright-cpuprofile-2.tar.gz

Edit: After actually timing it, it appears to only take ~30 seconds from initial load to completing indexing of the opened file. Sorry, I think I might have been wasting your time 😓 thank you for all the help!

heejaechang commented 2 days ago

no worry. thank you for providing the data!

heejaechang commented 2 days ago

found the issue. dupe of https://github.com/microsoft/pylance-release/issues/6046