python-lsp / python-lsp-server

Fork of the python-language-server project, maintained by the Spyder IDE team and the community
MIT License

AutoImport can break when being called by multiple threads #474

Closed tkrabel closed 6 months ago

tkrabel commented 8 months ago

What is the problem?

In https://github.com/python-rope/rope/issues/713, I explain that an AutoImport object's APIs can only be called from the thread that created it. This causes issues when pylsp uses multithreading, as it does when launched with the --ws argument.
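For illustration, here is a minimal reproduction of the underlying sqlite3 restriction. This is not pylsp or rope code; AutoImport simply holds such a connection internally, so calling it from another thread trips the same check.

```python
import sqlite3
import threading

# A connection created on one thread cannot be used from another thread
# by default (check_same_thread=True).
conn = sqlite3.connect(":memory:")

def close_from_other_thread():
    try:
        conn.close()
    except sqlite3.ProgrammingError as exc:
        # "SQLite objects created in a thread can only be used in that same thread."
        print(f"ProgrammingError: {exc}")

t = threading.Thread(target=close_from_other_thread)
t.start()
t.join()
```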

As of now, we haven't seen this issue reported, but I can replicate it in a test environment when calling workspace.close() (as happens on e.g. m_shutdown). My current solution of using one AutoImport object per rope_autoimport feature (https://github.com/python-lsp/python-lsp-server/pull/471) mitigates the issue but doesn't solve it.

What are potential solution options?

I considered four options:

  1. Expose check_same_thread on AutoImport and set AutoImport(check_same_thread=False): sqlite3 allows creating connections with check_same_thread=False (the default is True), which disables the thread check. This would avoid the sqlite thread check errors, and sqlite itself appears to be thread-safe, but using a single connection from multiple threads is strongly discouraged. Also, check_same_thread is currently not exposed by AutoImport, and there is pushback against exposing it (https://github.com/python-rope/rope/pull/714#issuecomment-1780483050), so we might have to patch AutoImport, which I'd rather avoid.
  2. Use one AutoImport object per thread: This seems valid at first, but it leaves questions like "how do we close all database connections on m_shutdown?" unanswered. I don't see how to solve that from within a thread pool without passing the executor to every request, which would couple the language server to a concurrency detail. But maybe this is not an issue? I don't know.
  3. Always use the same worker thread for AutoImport API calls: To make operations thread-safe while still using a thread pool, we can proxy all autoimport API calls through a separate executor with a single worker thread (a minimal sketch of this pattern follows the list). This makes autoimport calls a bottleneck, but since they are very fast, it shouldn't be much of a problem.
  4. Upstream change: create one connection per thread using a thread local: https://github.com/python-rope/rope/pull/720. We can let rope absorb the multithreading complexity while avoiding artificial bottlenecks like the one in (3). This would require bumping our rope requirement, though, which I am OK with.
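For option (3), here is a minimal sketch of what the single-worker proxy could look like, using concurrent.futures. The AutoImportProxy name, the autoimport_factory parameter, and the forwarded method names are illustrative only, not actual pylsp or rope APIs.

```python
from concurrent.futures import ThreadPoolExecutor


class AutoImportProxy:
    """Hypothetical wrapper that funnels every AutoImport call through a
    single worker thread, so the underlying sqlite connection is only ever
    touched by the thread that created it."""

    def __init__(self, autoimport_factory):
        self._executor = ThreadPoolExecutor(max_workers=1)
        # Create the AutoImport object on the worker thread itself.
        self._autoimport = self._executor.submit(autoimport_factory).result()

    def _call(self, method_name, *args, **kwargs):
        # Run the real method on the worker thread and block for the result.
        method = getattr(self._autoimport, method_name)
        return self._executor.submit(method, *args, **kwargs).result()

    def search(self, name):
        return self._call("search", name)

    def close(self):
        # Close the database on the worker thread, then shut the pool down.
        self._call("close")
        self._executor.shutdown(wait=True)
```

Every request-handling thread would call the proxy instead of AutoImport directly, which serializes access without exposing check_same_thread.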

What is my recommendation?

If there are no objections, I would go with option (4). Option (1) may non-deterministically corrupt the autoimport database, which would be extremely hard to debug, and (2) may leave dangling database connections. The performance cost of (3) is negligible, I think, but I prefer solving this upstream.

ccordoba12 commented 8 months ago

If there are no objections, I would go with option (4).

Sounds good to me.