[discussion] Use a rust crate to make astroid faster, or part of astroid faster

Pierre-Sassoulas commented 1 year ago

I was reading this article lately. With pydantic, ruff and other python package using rust to provide faster python using binding offered by https://github.com/PyO3/pyo3 . I've been wondering if we should explore the possibility to do that too.

Potential contributor to astroid are scarce, so we rejected using cython earlier (In https://github.com/PyCQA/astroid/pull/606#issuecomment-498015223 and elsewhere with hippo91 but I can't find the link right now).

However:

rust is well loved and has been the most loved language for a few year on stackoverflow. So we might find contributors after all ?
ruff performance proves it's possible as a linter and also that it's a massive boost (10x or more) and a game changer
astroid's code is convoluted, maybe a way to avoid painfully typing it and maintaining it ? Daniel wanted to make it internal, well, porting to rust is a way to do whatever we like ? 😄

Problem I foresee:

I don't know how astroid works, it's not a given that we're going to architecture a new package better ?
We rely on the ast module right now, I don't know if we can rely on the ast from rust. The fact that ruff do not handle pattern matching seems to indicate that no we can't.

Maybe it's possible to use rust on the bottleneck only, i.e. on inference.

tushar-deepsource commented 1 year ago

A few notes:

ruff uses RustPython, and RustPython's AST doesn't yet parse match statements. However you can instead use libcst's parser, which is heavier but supports match statements. (Although, maintenance of libcst has slowed down recently and I've found a few bugs in it myself.)
You can definitely write an astroid tree on top of those two parser's AST.

The major problem I see is that if you move away from Python's ast module, you gain a massive burden of maintaining whatever AST library that you use, be it RustPython or something else, everytime a user finds a bug or Python has a syntax changing update.

DanielNoord commented 1 year ago

I think the main issue with contributing to astroid is the state of the codebase. It's just very unclear how everything works. Any attempt to fix that is met with "breaking changes" concerns. I would love to spend some time just breaking full API compatibility within astroid and make sure all typing of the nodes is correct. That would already help so much, but to do so we would need to make a conscious decision to break backwards compatibility with a 3.0 cut off.

Pierre-Sassoulas commented 1 year ago

I think it's a given that breaking the API will need to be done:

working on astroid is really painful right now
there's been 20 year of python development and astroid had to follow along and keep being a wrapper between the ast and pylint. We do not need to support the ast's API from python 3.6 at this point, much less python 2.5. If we rewrite astroid's API based on python 3.11's ast API, it's going to be clearer for everyone. And we'd be set for being a wrapper for 10 more years (until it become too painful again and we have to upgrade to python 3.42's API).
98% of astroid's user are pylint users, there's still a quarter million of astroid "pure" users, but they can use astroid 2.x until they chose to move to 3.x
If we break "everything" we do not need to warn about every single thing, we only need to document the new API

But what I want to discuss here is:

Should we use rust ?
If yes, then should we use a rust AST ? (I don't think so, because we need fine understanding of new syntax's when a python version is released and only the python ast will provide that at a low cost for us right now, as confirmed by @tushar-deepsource )
If yes then no, then what part makes the most sense to implement in rust ? (Naive answer : inference maybe some small part that never change, but I did not do any in depth investigation to confirm that)

DanielNoord commented 1 year ago

I don't think rust will help much personally.

At the moment I think it will only limit the number of potential contributors (venn diagram of people interested in helping pylint and those that know rust is small I think). I do think it would be interesting to do some of the endless recursion in rust at some point when we have better interfaces for it in astroid that can easily switch between rust and python.

tusharsadhwani commented 1 year ago

One way that I see rust working out is by moving the intensive logic into rust. That worked for libcst very well in my experience.

Personally, I think moving the inference engine to rust is something to think about. i.e. things like astroid_cache, InferenceContext etc. are stored in rust, and every .infer() call will be running code in rust-land, just returning the result.

By doing so, not only are you moving the complex, highly recursive logic to a type safe, memory safe language that's easier to maintain, you're also probably speeding up inference many times over and solving recursion limit bugs like https://github.com/PyCQA/pylint/issues/7898

Proper separation of code and being able to rewrite inference to a cleaner codebase is added bonus.

DanielNoord commented 1 year ago

Totally agree, but until we have a modern codebase I don't think we have a very good idea of what that area would be. Clearly it is "inference" but in general an inference call isn't that problematic. Many of the performance issues within pylint come from start up and the constant inference of the full stdlib.

I really don't want to squash anybody's hopes of doing this because I am in favour, but I think we should first have a much better design in order to, for example, not end up with the globally shared caching issues like we do now.

nickdrozd commented 1 year ago

Something else to consider is compiling with Mypyc along the lines of https://ichard26.github.io/blog/2022/05/compiling-black-with-mypyc-part-1/. Obviously doing that will require getting the codebase correctly typed. Correct typing would also likely be a preliminary step for fitting in a Rust module.

However, the existing codebase is manifestly type-incorrect, and that type-incorrect behavior is guaranteed as part of the API in perpetuity. Fixing the type errors is impossible, since the type errors are considered correct by definition.

So I agree with @DanielNoord that development should be moved towards version 3, and sooner rather than later. Is there any reason not to do it?

Pierre-Sassoulas commented 1 year ago

Obviously doing that will require getting the codebase correctly typed. Correct typing would also likely be a preliminary step for fitting in a Rust module.

Definitely. And seeing how hard it is to type astroid at the moment, the first step is probably to do some design and breaking changes to make our lives easier first. (I'm thinking in particular of not having post-init modification on nodes in order to have less Optional everywhere, but I'm sure Daniel will have a lot of ideas about what to break exactly)

DanielNoord commented 1 year ago

We recently opened a thread to start the discussion about it, but I wonder if it will be feasible to track the changes necessary. There are just so many changes that need to be made and tracked.

DanielNoord commented 1 year ago

I actually have one idea for this already, which I just go during my evening running round. Let me catch up with some of my emails and then I'll work on it and create a PR.

DanielNoord commented 1 year ago

Well that wasn't so fruitful. I tried to fix the issue of how to type NodeNG.parent but I wasn't able to figure out a good solution. It did make me wonder whether it is at all possible. I'll create a quick issue to not lose the idea > https://github.com/PyCQA/astroid/issues/2017.

Pierre-Sassoulas commented 1 year ago

I've been thinking and astroid's tree rebuilder seems another part of the code that could work well for that. The input from the ast module is well typed at least. Also the message store and message id store in pylint which only deal with simple strings. Thoughts ?

Pierre-Sassoulas commented 2 months ago

Here's some inspiration from a earily similar situation : https://www.gauge.sh/blog/python-extensions-should-be-lazy

pylint-dev / astroid

[discussion] Use a rust crate to make astroid faster, or part of astroid faster #2014