pylint-dev / astroid

A common base representation of python source code for pylint and other projects
https://pylint.readthedocs.io/projects/astroid/en/latest/
GNU Lesser General Public License v2.1
529 stars 275 forks source link

[discussion] Use a rust crate to make astroid faster, or part of astroid faster #2014

Open Pierre-Sassoulas opened 1 year ago

Pierre-Sassoulas commented 1 year ago

I was reading this article lately. With pydantic, ruff and other python package using rust to provide faster python using binding offered by https://github.com/PyO3/pyo3 . I've been wondering if we should explore the possibility to do that too.

Potential contributor to astroid are scarce, so we rejected using cython earlier (In https://github.com/PyCQA/astroid/pull/606#issuecomment-498015223 and elsewhere with hippo91 but I can't find the link right now).

However:

Problem I foresee:

Maybe it's possible to use rust on the bottleneck only, i.e. on inference.

tushar-deepsource commented 1 year ago

A few notes:

The major problem I see is that if you move away from Python's ast module, you gain a massive burden of maintaining whatever AST library that you use, be it RustPython or something else, everytime a user finds a bug or Python has a syntax changing update.

DanielNoord commented 1 year ago

I think the main issue with contributing to astroid is the state of the codebase. It's just very unclear how everything works. Any attempt to fix that is met with "breaking changes" concerns. I would love to spend some time just breaking full API compatibility within astroid and make sure all typing of the nodes is correct. That would already help so much, but to do so we would need to make a conscious decision to break backwards compatibility with a 3.0 cut off.

Pierre-Sassoulas commented 1 year ago

I think it's a given that breaking the API will need to be done:

But what I want to discuss here is:

DanielNoord commented 1 year ago

I don't think rust will help much personally.

At the moment I think it will only limit the number of potential contributors (venn diagram of people interested in helping pylint and those that know rust is small I think). I do think it would be interesting to do some of the endless recursion in rust at some point when we have better interfaces for it in astroid that can easily switch between rust and python.

tusharsadhwani commented 1 year ago

One way that I see rust working out is by moving the intensive logic into rust. That worked for libcst very well in my experience.

Personally, I think moving the inference engine to rust is something to think about. i.e. things like astroid_cache, InferenceContext etc. are stored in rust, and every .infer() call will be running code in rust-land, just returning the result.

By doing so, not only are you moving the complex, highly recursive logic to a type safe, memory safe language that's easier to maintain, you're also probably speeding up inference many times over and solving recursion limit bugs like https://github.com/PyCQA/pylint/issues/7898

Proper separation of code and being able to rewrite inference to a cleaner codebase is added bonus.

DanielNoord commented 1 year ago

Totally agree, but until we have a modern codebase I don't think we have a very good idea of what that area would be. Clearly it is "inference" but in general an inference call isn't that problematic. Many of the performance issues within pylint come from start up and the constant inference of the full stdlib.

I really don't want to squash anybody's hopes of doing this because I am in favour, but I think we should first have a much better design in order to, for example, not end up with the globally shared caching issues like we do now.

nickdrozd commented 1 year ago

Something else to consider is compiling with Mypyc along the lines of https://ichard26.github.io/blog/2022/05/compiling-black-with-mypyc-part-1/. Obviously doing that will require getting the codebase correctly typed. Correct typing would also likely be a preliminary step for fitting in a Rust module.

However, the existing codebase is manifestly type-incorrect, and that type-incorrect behavior is guaranteed as part of the API in perpetuity. Fixing the type errors is impossible, since the type errors are considered correct by definition.

So I agree with @DanielNoord that development should be moved towards version 3, and sooner rather than later. Is there any reason not to do it?

Pierre-Sassoulas commented 1 year ago

Obviously doing that will require getting the codebase correctly typed. Correct typing would also likely be a preliminary step for fitting in a Rust module.

Definitely. And seeing how hard it is to type astroid at the moment, the first step is probably to do some design and breaking changes to make our lives easier first. (I'm thinking in particular of not having post-init modification on nodes in order to have less Optional everywhere, but I'm sure Daniel will have a lot of ideas about what to break exactly)

DanielNoord commented 1 year ago

We recently opened a thread to start the discussion about it, but I wonder if it will be feasible to track the changes necessary. There are just so many changes that need to be made and tracked.

DanielNoord commented 1 year ago

I actually have one idea for this already, which I just go during my evening running round. Let me catch up with some of my emails and then I'll work on it and create a PR.

DanielNoord commented 1 year ago

Well that wasn't so fruitful. I tried to fix the issue of how to type NodeNG.parent but I wasn't able to figure out a good solution. It did make me wonder whether it is at all possible. I'll create a quick issue to not lose the idea > https://github.com/PyCQA/astroid/issues/2017.

Pierre-Sassoulas commented 1 year ago

I've been thinking and astroid's tree rebuilder seems another part of the code that could work well for that. The input from the ast module is well typed at least. Also the message store and message id store in pylint which only deal with simple strings. Thoughts ?

Pierre-Sassoulas commented 2 months ago

Here's some inspiration from a earily similar situation : https://www.gauge.sh/blog/python-extensions-should-be-lazy