rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/

I hope there could be an official tool used to transfer python code to rust code automatically? #3126

Open dbsxdbsx opened 3 years ago

dbsxdbsx commented 3 years ago

(I am not quite sure whether the suggestion below can be treated as an RFC, but I think it would be useful for Rust development.)

We know Rust aims to be a stable and productive programming language. But at present, its ecosystem lacks a lot of wheels (ready-made libraries). Meanwhile, Python is in the opposite position. Therefore, I think it would make Rust more productive if Rust could absorb Python's ecosystem (many mature wheels) as soon as possible.

I know many people are building wheels to enrich the Rust ecosystem, some of them for Python users. But there are still problems for people who want to move Python code to Rust. Let's take the popular Python module numpy as an example:

  1. Indeed, some crates try to emulate the numpy module. BUT people will not be confident enough to use them, as they are not officially supported by the numpy team.
  2. Time: it would take quite a long time for them to become as mature as they are in Python, even if the corresponding Rust crates were officially supported by the original Python module team.
  3. There is no guarantee that the teams that built popular Python modules will make corresponding crates in Rust.

So I am asking whether it is possible for the Rust team to OFFICIALLY build a tool like Rustfmt, which could be used to automatically transpile any Python script (including the modules used in the script) into corresponding Rust code, like pyrs does?

With such a tool:

  1. It would be possible to program in Python to check the logic first, then transpile it automatically to Rust to make the whole programming routine efficient enough. This routine is, at least in my view, the best way to program constructively and productively.
  2. People would be confident enough to use it, as it is officially supported.
  3. It saves people a lot of time, instead of spending it on building crates that will be unstable for quite a long time.

Someone might ask why I stick to Python only.

  1. Python is the most widely used language with the best ecosystem, and ecosystem is exactly what Rust lacks MOST.
  2. For compiled languages like C++, there is always a way to use their wheels through FFI, not to mention crates that do it, like cxx. But Python is an interpreted language whose official interpreter has a GIL; even with crates like pyo3, the GIL still makes the Rust program not efficient enough.

Lokathor commented 3 years ago

This is effectively impossible to do automatically.

However, if you think that it can be done, start doing it yourself. Any sufficiently complete open source tool can be adopted by the rust project later on if it works out (as long as it uses the same license as the rest of the rust project: Apache2/MIT)

SOF3 commented 3 years ago

Nothing stops you from creating an FFI interface or an RMI service such that a Rust program can communicate with a Python runtime. You don't have to RewriteItInRust or TranspileItToRust to use Python libraries in Rust.

dbsxdbsx commented 3 years ago

@Lokathor, what do you mean by "effectively" impossible? Actually, I posted this suggestion here because I think this tool (if it is not technically impossible) would be more meaningful if maintained by an official team, so that more people would know about it and contribute to it, and it would be easier to communicate with the Python team if technical issues need to be solved to bridge these 2 languages.

@SOF3, as I've stated, the Python runtime has a GIL, so it is not a good option for multi-threaded programming in Rust with a Python module, and I don't think it is a good idea to deploy a Rust-based program with a Python interpreter. Otherwise, I would use PyO3.

Lokathor commented 3 years ago

I mean that the difficulty of the converter project would be so high that it would be actually easier to just write each Python program from scratch in Rust, and you'll have useful results faster that way.

dbsxdbsx commented 3 years ago

@Lokathor, indeed, I had already felt how difficult the auto-transpiling job is before posting this "issue", when I realized the differences between these 2 languages, like the class, attribute, and decorator concepts that only exist in Python. But I still think auto-transpiling is more cost-effective than writing every Python module in Rust from scratch. I mean it is not cost-effective if only a few Python modules need to be rewritten, but it is cost-effective when lots of Python modules need to be rewritten.

By the way, I am also inspired by the work from @hegza relating to this issue.

Anyway, if most members of the Rust team think this suggestion is too naive and impractical, please close this issue.

burdges commented 3 years ago

NumPy is 35% C. SciPy is 48% C, C++, and Fortran. All this code either links to or replicates older C, C++, and Fortran system libraries which come with typical Linux distributions. If you need those C, C++, or Fortran dependencies then use them directly from Rust.

Any transpiler output requires manual reworking before being useful and a complete rewrite before being idiomatic enough for a public interface. There do exist languages specifically designed around a transpiler-like phase, like doge or scala, but even they only transpile behind the scenes and never produce idiomatic code.

If one first accepts that transpilation means a computer assisted human rewrite, then one could develop independently useful interactive refactoring aids for VSCode that simplify some tasks, like choosing among borrows ala T, Box<T>, Gc<T>, RefCell<T>, Arc<Mutex<T>>, etc. or [T], [T; N], Vec<T>, ArrayVec<T,N>, Arc<[T]>, GcVec<T>, etc., or maybe do deeply cross linked unit tests and fuzzing.
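As a rough illustration of why that choice needs a human (or an interactive aid), here is a minimal sketch of the same "increment a counter" operation under four of the ownership forms listed above. The function names are made up for this example and do not come from any tool in this thread; `Gc<T>` is left out because it is not part of the standard library.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Plain `T`: the caller hands over the value and gets a new one back.
fn bump_owned(mut n: u32) -> u32 {
    n += 1;
    n
}

/// `&mut T`: exclusive borrow, no allocation, checked at compile time.
fn bump_borrowed(n: &mut u32) {
    *n += 1;
}

/// `Rc<RefCell<T>>`: shared on one thread, borrow rules checked at run time.
fn bump_shared(n: &Rc<RefCell<u32>>) {
    *n.borrow_mut() += 1;
}

/// `Arc<Mutex<T>>`: shared across threads, every access takes a lock.
fn bump_across_threads(n: &Arc<Mutex<u32>>, threads: usize) {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let n = Arc::clone(n);
            thread::spawn(move || {
                *n.lock().unwrap() += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}

fn main() {
    let mut n = bump_owned(0);
    bump_borrowed(&mut n);
    let shared = Rc::new(RefCell::new(n));
    bump_shared(&shared);
    let across = Arc::new(Mutex::new(*shared.borrow()));
    bump_across_threads(&across, 4);
    println!("final count: {}", *across.lock().unwrap());
}
```

A Python source gives no hint which of these the author would want, so a converter either has to guess or fall back to the most expensive form everywhere.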

michaelb commented 3 years ago

@dbsxdbsx The tool you mentioned is useful and all, but it's more like a 'google translate' doing word-by-word work. It will not produce as good a translation as a human, and that just won't cut it, especially for Rust. That's why a developer has to go fix (and check!) the translated code in the linked project.

Some (most) python programs will never be 'automagically' translated to Rust because their core design relies on things that can't work in Rust, for safety or ownership reasons, or just that the concept (multi-inheritance for example) has no equivalent.

It's not that you can't write some types of algorithms in Rust. You can. But it sometimes requires completely different design choices, which an 'automagic' py2rust program will never be able to guess.
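To illustrate the multi-inheritance point, here is a small sketch of one possible Rust redesign of a hypothetical Python class `Duck(Flyer, Swimmer)`. The names `Flyer`, `Swimmer`, `Duck` and `commute` are invented for this example. Traits cover the "has both capabilities" part, but anything relying on base-class state or `super()` call chains would need a design decision that is not visible in the Python source.

```rust
/// Capability that would be one Python base class.
trait Flyer {
    fn fly(&self) -> String;
}

/// Capability that would be the other Python base class.
trait Swimmer {
    fn swim(&self) -> String;
}

/// The Rust struct owns its data itself; nothing is inherited.
struct Duck {
    name: String,
}

impl Flyer for Duck {
    fn fly(&self) -> String {
        format!("{} flies", self.name)
    }
}

impl Swimmer for Duck {
    fn swim(&self) -> String {
        format!("{} swims", self.name)
    }
}

/// Requires both capabilities; the bound is resolved at compile time.
fn commute<T: Flyer + Swimmer>(traveller: &T) -> String {
    format!("{}, then {}", traveller.fly(), traveller.swim())
}

fn main() {
    let duck = Duck { name: "Donald".to_string() };
    println!("{}", commute(&duck));
}
```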

SOF3 commented 3 years ago

@dbsxdbsx Python code translated to Rust would almost never be efficient, especially when high-level concepts like inheritance are involved. The performance of Rust largely comes from code written idiomatically for it, e.g. using enums instead of dynamic dispatch. And as @michaelb mentioned, there are some intrinsically incompatible things that cannot be transpiled. For example, most OOP designs (not only Python, but also Java etc.) involve cyclic dependencies, and this is a pattern that is almost impossible in Rust. Furthermore, nothing stops you from connecting to multiple python runtimes if you are concerned about multithreading.
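A minimal sketch of the "enums instead of dynamic dispatch" remark, with made-up shape types: the first half is what a literal translation of a class hierarchy tends to look like (trait objects behind `Box`), the second half is the closed `enum` a Rust author might write by hand. Both compute the same result; how much faster the enum version is depends on the program and is not measured here.

```rust
use std::f64::consts::PI;

// Literal translation of a class hierarchy: one heap allocation per object,
// one virtual call per method.
trait Shape {
    fn area(&self) -> f64;
}

struct Circle {
    r: f64,
}

struct Rect {
    w: f64,
    h: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        PI * self.r * self.r
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.w * self.h
    }
}

fn total_area_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Hand-written alternative: a closed set of variants stored inline,
// dispatched with `match`.
enum ShapeEnum {
    Circle { r: f64 },
    Rect { w: f64, h: f64 },
}

impl ShapeEnum {
    fn area(&self) -> f64 {
        match self {
            ShapeEnum::Circle { r } => PI * r * r,
            ShapeEnum::Rect { w, h } => w * h,
        }
    }
}

fn total_area_enum(shapes: &[ShapeEnum]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let boxed: Vec<Box<dyn Shape>> = vec![
        Box::new(Circle { r: 1.0 }),
        Box::new(Rect { w: 2.0, h: 3.0 }),
    ];
    let inline = vec![
        ShapeEnum::Circle { r: 1.0 },
        ShapeEnum::Rect { w: 2.0, h: 3.0 },
    ];
    println!("dyn:  {}", total_area_dyn(&boxed));
    println!("enum: {}", total_area_enum(&inline));
}
```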

scottmcm commented 3 years ago

I'll note that it's substantially easier to do C→Rust, but the projects for that aren't official either:

I agree with @Lokathor here -- there's nothing about this project I can see that requires that it be official. That's especially true for Rust, given that core things used pervasively (like regex and serde) aren't "OFFICIALLY by Rust team" either.

DoumanAsh commented 3 years ago

There is no need for this shit

jhpratt commented 3 years ago

@DoumanAsh Please be respectful.

adsharma commented 3 years ago

https://github.com/adsharma/py2many is based on pyrs, comes with more included tests and improved type inference, and is more modular (AST rewriters, API plugins etc).

It's true that you can't transpile every python program, and doing so is a non-goal. We're trying to document the supported subset here:

https://github.com/adsharma/py2many/blob/main/doc/langspec.md

About to make a 0.3 release in the next few days. Available from pypi.

mardab commented 2 years ago

You have my attention.

As you can see, fully automatic conversion is near-impossible, but there are things you can do. Take a look at RustPython and PyO3, the 2 approaches to running Python under Rust and see what you can do yourself, since there is no RFC that stops you from writing a transpiler crate.

With the GIL, however, there's a problem, since it is inherent to CPython and all those mature packages written for it. PyPy avoids this problem with its own fork of numpy, which (among other things) does not call C functions underneath.

So, to put it bluntly, if you want numpy in Rust, call its libraries directly through FFI, though it is unsafe, so you might as well work on RIIR. I'm sure there's a crate for that in the works already.

dbsxdbsx commented 2 years ago

@mardab, thanks for your reply. Actually, I already knew about PyO3 and RustPython before posting this topic. And if RustPython or PyPy were treated as the official Python interpreter, there would be no GIL issue.

adsharma commented 2 years ago

I want to re-examine the thesis of this RFC

it would make Rust more productive if Rust could absorb Python's ecosystem(many mature wheels) as soon as possible.

Claim: 100% python compatibility with support for code written to GIL isn't going to make rust more productive.

Instead, supporting a large enough subset of python to write real world programs and reuse much of the python ecosystem libraries is what's going to be interesting.

This is what makes py2many different from pypy, cython, nuitka and mypyc. Would love to hear more about the language spec:

https://github.com/adsharma/py2many/blob/main/doc/langspec.md

In order for this vision to work, python ecosystem also needs to change:

py2many.py --rust=1 --extension test.py

has experimental support for generating a PyO3 extension.

Please chime in here if you have more feedback on the language spec: https://github.com/adsharma/py2many/issues/205

mardab commented 2 years ago

Thing is, this RFC poses conflicting goals: "ecosystem absorption" requires either full CPython compatibility, which brings the GIL with it, or limiting yourself to packages that either don't use C libraries underneath or were adapted to PyPy, which has its own share of difficulties ahead.

If you need major changes to Python ecosystem to support Python ecosystem, then what's the point? Best you could do in my opinion is to write a program in Rust, and add some form of Python scripting over it, but you don't need a new RFC for that.

adsharma commented 2 years ago

Suggestion: modify to "ecosystem absorption to the extent possible" and add "leverage tooling".

For example, rust already has a json parser. Doesn't need a json parser written in python to be wrapped.

then what's the point?

I find it easier to write and debug:

https://github.com/adsharma/py2many/blob/main/tests/cases/fib_with_argparse.py

than

https://github.com/adsharma/py2many/blob/main/tests/expected/fib_with_argparse.rs

But I can see why someone might prefer to write it all in rust and not cross the python -> rust abstraction boundary. In fact, every language community I've interacted with (Julia and Nim come to mind) expresses the same opinion (that eventually more code will be written in $mylang than python).

Having a choice of being able to program without worrying about the borrow checker, memory allocation and leveraging the tooling (pdb, the python debugger) could be win-win for both communities.

scottmcm commented 2 years ago

Having a choice of being able to program without worrying about the borrow checker, memory allocation [...]

I feel like a bunch of this thread is attributing far more magic to rust than it really has. A bunch of the restrictions are important to why rust can be as fast as it is. If one were to translate $somelang to Rust in such a way that every variable is boxed, every method call is a hashtable lookup, every collection is thread-safe, etc, then I suspect that translated code would actually be slower than the original language could run it.
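To make that concern concrete, here is a hedged sketch contrasting a deliberately naive "keep Python semantics" translation with a static one. All types and function names (`Value`, `Object`, `make_dyn_rect`, `Rect`) are invented for this example and are not the output of any real transpiler. In the dynamic version every attribute access and method call goes through a `HashMap` lookup and a runtime type check; in the static version the compiler resolves both.

```rust
use std::collections::HashMap;

/// Dynamically typed value, as a naive translation might model Python objects.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

type Method = fn(&Object) -> Value;

/// Attributes and methods are looked up by name at run time.
struct Object {
    attrs: HashMap<String, Value>,
    methods: HashMap<String, Method>,
}

impl Object {
    /// Returns `None` when the method name does not exist.
    fn call(&self, name: &str) -> Option<Value> {
        self.methods.get(name).map(|method| method(self))
    }
}

/// "area" for the dynamic object: two hash lookups plus a type check.
fn dyn_area(obj: &Object) -> Value {
    match (obj.attrs.get("width"), obj.attrs.get("height")) {
        (Some(Value::Int(w)), Some(Value::Int(h))) => Value::Int(w * h),
        _ => Value::Str("TypeError".to_string()),
    }
}

fn make_dyn_rect(width: i64, height: i64) -> Object {
    let mut attrs = HashMap::new();
    attrs.insert("width".to_string(), Value::Int(width));
    attrs.insert("height".to_string(), Value::Int(height));
    let mut methods: HashMap<String, Method> = HashMap::new();
    methods.insert("area".to_string(), dyn_area as Method);
    Object { attrs, methods }
}

/// Static version: fields and the method call are resolved at compile time.
struct Rect {
    width: i64,
    height: i64,
}

impl Rect {
    fn area(&self) -> i64 {
        self.width * self.height
    }
}

fn main() {
    let dynamic = make_dyn_rect(3, 4);
    let fixed = Rect { width: 3, height: 4 };
    println!("dynamic: {:?}", dynamic.call("area"));
    println!("static:  {}", fixed.area());
}
```

The static form is only available when the translator can prove the field set and types up front, which is the compatibility that a subset-based approach gives up.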

adsharma commented 2 years ago

in such a way that every variable is boxed

There is no boxing going on here: https://github.com/adsharma/py2many/blob/main/tests/expected/rect.rs

It's as efficient as a hand coded rust version as far as I can see. Boxing needed only when data is allocated in one place and ownership is transferred across procedure boundaries. Many simple cases can be optimized away based on static analysis.

every method call is a hashtable lookup

py2many uses static dispatch. It means dropping some compatibility. But I think it's a good trade-off.

every collection is thread-safe

I haven't explored thread-safety much in the project. Happy to deviate from python's GIL where it makes sense.

In the end, you're writing rust with python syntax and a faster edit-compile-debug cycle.

BoxyUwU commented 2 years ago

inline python is a pretty cool rust thing which can run python code. It's not the same as what you're asking for, but it is a neat crate.

hegza commented 2 years ago

Hi, one of the referenced authors here; PhD researcher on Rust for embedded software at Tampere University.

Approx. 2 years ago, I got presented with... about 600 lines of differential equations in Python & NumPy. Being known as the local Rustacean, they asked me if I could somehow easily check how much performance could be extracted if it was more directly mapped to hardware. I did the only sensible thing (/sarc) and made ~5 regexes that convert the NumPy Python into ndarray Rust with rayon. The rest was easy with rustc/cargo. This achieved the original goal. Might as well have used a proper Python optimizer but I wanted to try out my cool idea :D I A/B tested the approach against pyrs, another tool mentioned here, and compared it to another project as well. I wrote a paper on that for a conference (paywalled link 😢 https://link.springer.com/chapter/10.1007/978-3-030-60939-9_9).

My current view on this is that transpiling Python with libraries is tough and sometimes outright impractical (classes, Python dynamism), but there's a real though somewhat niche use case. I sometimes legit get sent hundreds of lines of NumPy Python in the form of a research prototype, since mathematicians are often most familiar with Python (at least where I'm from). As mentioned before on this thread, Python usually runs on C that has an FFI to Python which could as well be an FFI to Rust, so that kind of, sometimes, takes care of mapping the libraries in practice. Conditional, but no boxing required.

Having worked a bit with clippy, I noticed that the Rust ecosystem would enable me to somewhat easily map the syntax tree of Python into Rust source code. Rust libraries tend to be more stably inter-compatible than Python libraries. I made a prototype https://github.com/hegza/serpent-rs. It might work, but I'd recommend working with a language that is more specifically suited for working with parsing and mapping such as Haskell.

Let me just say also that https://github.com/adsharma/py2many might just do all this better than I could ever hope to. From what I can gather, @adsharma seems to be on the right track, and I liked his defense of the subject 😄

One cool thing that would be pretty easily doable would be to publish a web demo that translates written Python into somewhat matching Rust in real-time, side-by-side. For pedagogy. Akin to https://jrvidal.github.io/explaine.rs/ My supervisor keeps encouraging me to do that but I have other plans 😃

On this Github issue / proposal: no. On py2many: yes. At least in my lab we need C and Rust, but most of our use cases start as Python. Python to Rust is real and we'd like to have it, but I would totally get it if someone would argue it's impractical.

adsharma commented 2 years ago

publish a web demo that translates written Python into somewhat matching Rust in real-time, side-by-side.

http://transpyle.me/ is something I put up a few weeks ago. I would love to move it to a box with more resources and find a way to sustain itself (either through donations or ad supported).

takkuumi commented 2 years ago

My suggestion is to stop writing python now, start writing new functions in the rust language, and use a certain protocol (tcp, socket or rpc) to allow the two programs to interact. Finally, use the rust language to iteratively migrate the old functions (python).

hegza commented 2 years ago

My suggestion is to stop writing python now, start writing new functions in the rust language, and use a certain protocol (tcp, socket or rpc) to allow the two programs to interact. Finally, use the rust language to iteratively migrate the old functions (python).

I'll go with the generous interpretation of "if you need to use Rust, avoid writing Python and interface with it instead".

That would still really depend on what domain and project you're working with. I'm an everyday user of both of the languages and there are certainly things I'd rather keep writing in Python than in Rust, as there are also things I'd rather write in Rust than in Python.

takkuumi commented 2 years ago

My suggestion is to stop writing python now, start writing new functions in the rust language, and use a certain protocol (tcp, socket or rpc) to allow the two programs to interact. Finally, use the rust language to iteratively migrate the old functions (python).

I'll go with the generous interpretation of "if you need to use Rust, avoid writing Python and interface with it instead".

That would still really depend on what domain and project you're working with. I'm an everyday user of both of the languages and there are certainly things I'd rather keep writing in Python than in Rust, as there are also things I'd rather write in Rust than in Python.

Consider the following questions:

  1. Why do you need to convert python code to rust?
  2. What features of rust impress you?

This is just a suggestion: keep the original python program running, use rust to develop new programs, and make them work together through a protocol. Rewrite functions in rust if it's necessary.

Of course, you can continue to use python to develop programs and use python tool-chain. Also you can use rust to develop programs and use rust tool-chain. Hybrid programming exists in many large projects. The trouble is that you need to use a protocol (like socket, rpc...) to pass messages.
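As a sketch of what such a protocol could look like, assume a made-up line-based format where the client sends `fib N` and the Rust service replies with one line. The format and the function names (`respond`, `serve_once`, `ask`) are assumptions for illustration only. The client here is also written in Rust so that the example is self-contained; in the hybrid setup described above, the existing Python program would be the one opening the socket.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// The "new function" written in Rust.
fn fib(n: u64) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}

/// Pure protocol logic: one request line in, one reply (without newline) out.
fn respond(line: &str) -> String {
    match line
        .trim()
        .strip_prefix("fib ")
        .and_then(|arg| arg.trim().parse::<u64>().ok())
    {
        // The upper bound keeps the arithmetic inside u64.
        Some(n) if n <= 90 => fib(n).to_string(),
        _ => "error".to_string(),
    }
}

/// Read one request from the connection and write one reply line.
fn handle(stream: TcpStream) -> io::Result<()> {
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut writer = stream;
    let mut line = String::new();
    reader.read_line(&mut line)?;
    writeln!(writer, "{}", respond(&line))
}

/// Bind to an ephemeral loopback port and serve exactly one connection.
fn serve_once() -> io::Result<(SocketAddr, thread::JoinHandle<()>)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let server = thread::spawn(move || {
        if let Ok((stream, _)) = listener.accept() {
            let _ = handle(stream);
        }
    });
    Ok((addr, server))
}

/// Client side of the protocol (the role the Python program would play).
fn ask(addr: SocketAddr, request: &str) -> io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    writeln!(stream, "{}", request)?;
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply)?;
    Ok(reply.trim().to_string())
}

fn demo() -> io::Result<String> {
    let (addr, server) = serve_once()?;
    let reply = ask(addr, "fib 10")?;
    let _ = server.join();
    Ok(reply)
}

fn main() {
    match demo() {
        Ok(reply) => println!("fib 10 -> {}", reply),
        Err(e) => eprintln!("loopback demo failed: {}", e),
    }
}
```

Keeping the protocol logic in a pure function (`respond`) means it can be tested without any sockets, and the transport can later be swapped for RPC or an FFI call without touching it.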

adsharma commented 2 years ago

python and rust live on opposite ends of the "abstract and easy to program" vs "detail oriented and safe" spectrum. There is a market for both.

The trouble is that you need to use a protocol

The idea is to transpile python to rust, so you only have rust code to deal with. No sockets for crossing language boundary. Some rust code is hand written, other rust code is generated by a transpiler.

There is some justifiable skepticism, given that there are very few successful transpiler projects out there. Suggest trying the various transpilers out before forming an opinion.

adsharma commented 2 years ago

Minor usability improvement: py2many now spits out Cargo.toml when you transpile a whole directory worth of files.

https://github.com/adsharma/py2many/pull/453

insinfo commented 2 years ago

@adsharma you are doing an amazing job