Open Pierre-Sassoulas opened 1 year ago
A few notes:
ruff
uses RustPython, and RustPython's AST doesn't yet parse match statements. However you can instead use libcst
's parser, which is heavier but supports match statements. (Although, maintenance of libcst has slowed down recently and I've found a few bugs in it myself.)The major problem I see is that if you move away from Python's ast
module, you gain a massive burden of maintaining whatever AST library that you use, be it RustPython or something else, everytime a user finds a bug or Python has a syntax changing update.
I think the main issue with contributing to astroid
is the state of the codebase. It's just very unclear how everything works.
Any attempt to fix that is met with "breaking changes" concerns. I would love to spend some time just breaking full API compatibility within astroid
and make sure all typing of the nodes
is correct. That would already help so much, but to do so we would need to make a conscious decision to break backwards compatibility with a 3.0
cut off.
I think it's a given that breaking the API will need to be done:
But what I want to discuss here is:
I don't think rust
will help much personally.
At the moment I think it will only limit the number of potential contributors (venn diagram of people interested in helping pylint
and those that know rust
is small I think).
I do think it would be interesting to do some of the endless recursion in rust
at some point when we have better interfaces for it in astroid
that can easily switch between rust
and python
.
One way that I see rust working out is by moving the intensive logic into rust. That worked for libcst very well in my experience.
Personally, I think moving the inference engine to rust is something to think about. i.e. things like astroid_cache
, InferenceContext
etc. are stored in rust, and every .infer()
call will be running code in rust-land, just returning the result.
By doing so, not only are you moving the complex, highly recursive logic to a type safe, memory safe language that's easier to maintain, you're also probably speeding up inference many times over and solving recursion limit bugs like https://github.com/PyCQA/pylint/issues/7898
Proper separation of code and being able to rewrite inference to a cleaner codebase is added bonus.
Totally agree, but until we have a modern codebase I don't think we have a very good idea of what that area would be.
Clearly it is "inference" but in general an inference call isn't that problematic. Many of the performance issues within pylint
come from start up and the constant inference of the full stdlib.
I really don't want to squash anybody's hopes of doing this because I am in favour, but I think we should first have a much better design in order to, for example, not end up with the globally shared caching issues like we do now.
Something else to consider is compiling with Mypyc along the lines of https://ichard26.github.io/blog/2022/05/compiling-black-with-mypyc-part-1/. Obviously doing that will require getting the codebase correctly typed. Correct typing would also likely be a preliminary step for fitting in a Rust module.
However, the existing codebase is manifestly type-incorrect, and that type-incorrect behavior is guaranteed as part of the API in perpetuity. Fixing the type errors is impossible, since the type errors are considered correct by definition.
So I agree with @DanielNoord that development should be moved towards version 3, and sooner rather than later. Is there any reason not to do it?
Obviously doing that will require getting the codebase correctly typed. Correct typing would also likely be a preliminary step for fitting in a Rust module.
Definitely. And seeing how hard it is to type astroid at the moment, the first step is probably to do some design and breaking changes to make our lives easier first. (I'm thinking in particular of not having post-init modification on nodes in order to have less Optional everywhere, but I'm sure Daniel will have a lot of ideas about what to break exactly)
We recently opened a thread to start the discussion about it, but I wonder if it will be feasible to track the changes necessary. There are just so many changes that need to be made and tracked.
I actually have one idea for this already, which I just go during my evening running round. Let me catch up with some of my emails and then I'll work on it and create a PR.
Well that wasn't so fruitful. I tried to fix the issue of how to type NodeNG.parent
but I wasn't able to figure out a good solution. It did make me wonder whether it is at all possible. I'll create a quick issue to not lose the idea > https://github.com/PyCQA/astroid/issues/2017.
I've been thinking and astroid's tree rebuilder seems another part of the code that could work well for that. The input from the ast module is well typed at least. Also the message store and message id store in pylint which only deal with simple strings. Thoughts ?
Here's some inspiration from a earily similar situation : https://www.gauge.sh/blog/python-extensions-should-be-lazy
I was reading this article lately. With pydantic, ruff and other python package using rust to provide faster python using binding offered by https://github.com/PyO3/pyo3 . I've been wondering if we should explore the possibility to do that too.
Potential contributor to astroid are scarce, so we rejected using cython earlier (In https://github.com/PyCQA/astroid/pull/606#issuecomment-498015223 and elsewhere with hippo91 but I can't find the link right now).
However:
ruff
performance proves it's possible as a linter and also that it's a massive boost (10x or more) and a game changerProblem I foresee:
Maybe it's possible to use rust on the bottleneck only, i.e. on inference.