indygreg / python-zstandard

Python bindings to the Zstandard (zstd) compression library
BSD 3-Clause "New" or "Revised" License
481 stars 83 forks source link

Intent to rewrite in Rust #110

Open indygreg opened 4 years ago

indygreg commented 4 years ago

After learning Rust and seeing its obvious benefits for safety/security and ease of maintenance over C, I would prefer to not author any new C code wherever possible.

I believe implementing this Python extension in Rust instead of C has long-term benefits to the maintainability of the project.

Once #109 lands and Python 2.7 is no longer supported, my intent is to start rewriting C code in this project in Rust. (Getting Rust to work with Python 2.7 on Windows is a massive plain and effectively blocks writing Python extensions in Rust.)

I'm filing this issue toprovide a sounding board for people to raise concerns/support for this initiative.

mgorny commented 4 years ago

I hereby raise concern that Rust is PITA to many people, starting with the fact that it isn't supported on many architectures (coverage is much smaller than zstandard use on Gentoo right now), and ending with the fact that its upstream still haven't managed to reach an agreement with LLVM upstream and instead keep using outdated forks of LLVM.

indygreg commented 4 years ago

I'm working under the assumption that most people will install this Python package via pip install zstandard. And the overwhelming majority of those people will install a pre-built binary wheel published in PyPI (https://pypi.org/project/zstandard/#files). People installing Python binary wheels will have a pre-built shared library / extension and Rust doesn't enter the picture at all.

The only people that need to be burdened by the existence of Rust are:

  1. Developers of this project
  2. Packagers of this project
  3. People who can't install a pre-built binary wheel (perhaps they are running a more esoteric OS/platform)

1 is basically just me.

I'm very empathetic towards end-users who can't install pre-built binary wheels. For them, I'll need to make the code in setup.py as solid as possible. Or maybe we give up on Rust and fall back to the CFFI backend if Rust isn't "obviously" available.

I have mixed feelings towards packagers. I want to support them fully. My assumption is the state of Rust packaging is solid enough in 2020 that this shouldn't be a problem. (Firefox is using Rust heavily and nearly everyone can package Firefox...) But packagers irk me in that they often make their job 10x harder by imposing arbitrary rules on themselves. So I anticipate packagers potentially taking issue with this change because it makes their lives harder. I dunno.

Do you have a specific objection, @mgorny?

mgorny commented 4 years ago

You're entirely missing group 4., users of source-based distributions that get all their packages built locally.

As for binary wheels, I have very bad experience with them on non-x86 architectures. I can't imagine anyone using binary wheels, say, on ARM where you have to support a lot of different processors. Yeah, building C without a second machine to cross-compile is not easy but it's possible. I've seen pip install packages (I think it was on SPARC) that segfaulted right away — and I honestly doubt somebody built binary packages for that architecture, so either pip installed something wrong entirely or failed to build it properly from source.

mgorny commented 4 years ago

Oh, and CFFI would certainly be a better solution. Probably fast enough too.

ghost commented 3 years ago

IMHO this project has a high degree of completion, if make some changes, maybe it can be included into Python's stdlib, which is written by C.

Some people hope that Python's stdlib includes a zstd module, see issue37095. (I'm not CPython's core developer)

its obvious benefits for safety/security and ease of maintenance over C

IMHO, In the long run, this project will be stable at some point, then these will not be a problem.

indygreg commented 3 years ago

The main branch now has a Rust backend passing all tests! You can build + test it by doing something like:

python setup.py --rust-backend build_ext -i
PYTHON_ZSTANDARD_IMPORT_POLICY=rust pytest

There are currently a number of caveats.

The Rust backend was mostly implemented using the CFFI backend as a source, so certain patterns from the C backend may not be preserved. This includes releasing of the GIL.

The Rust backend doesn't yet perform 0-copy in many call sites, so performance is likely slower than the C and even CFFI backends.

There are intermittent fuzzing test failures in CI due to timeouts (possibly due to current performance overhead of Rust backend).

The multi_compress_to_buffer and multi_decompress_to_buffer implementations are grossly inefficient and likely much slower than the C implementation.

We do not yet include the Rust backend in built wheels.

There hasn't been an extensive auditing of the code: there are likely bugs lingering.

Before I call the Rust backend stable, I plan to extensively audit the Rust code against the C code and fix any performance/correctness gaps.

mgorny commented 3 years ago

So we've been revisiting Rust recently because of cryptography, and it's still a horrible choice. It's still far from mature, officially supports only x86 and arm64. A few of the platforms that python-zstandard currently runs on are not supported at all (and would require tons of work which the few volunteers simply can't afford to do), the majority that are unofficially supported are very hard to bootstrap and upstream simply ignores requests to help with that.

So we'd really appreciate some formal backup plan for us. Something better than 'remove python-zstandard from most of your supported platforms' or 'keep patching the old version yourself'.