python-hyper / hpack

HTTP/2 Header Encoding for Python
https://hpack.readthedocs.io/en/latest/
MIT License
73 stars 32 forks source link

Stateless Huffman #8

Open jimcarreer opened 9 years ago

jimcarreer commented 9 years ago

As discussed with @Lukasa the Huffman encoder / decoder is currently initialized using two static structures defined in huffman_constants.py. We could potentially remove the need for this pseudo-stateful nature and instead have a completely stateless Huffman encoder / decoder with the addition of another static structure related to decoding (encoding is currently trivial to make stateless). This may also offer a minor performance enhancement.

jimcarreer commented 8 years ago

I think I'm going to take a look at this, this weekend.

Lukasa commented 8 years ago

An FYI, I've migrated to an essentially stateless Huffman decoding implementation in #35.

thedrow commented 7 years ago

So this should be closed?

Lukasa commented 7 years ago

Nope, we still have a stateful encoder.

thedrow commented 7 years ago

I can try to write it in C if that helps.

Lukasa commented 7 years ago

I would like to avoid us having any native extensions in this module at this time.

thedrow commented 7 years ago

Even optionally?

Lukasa commented 7 years ago

Tentatively, yes, I'm opposed to having an optional native extension.

My thinking on this is a bit complex, so let me lay it out.

  1. Generally speaking I have a policy regarding C code, which is that wherever possible I'd like to avoid having untrusted input reach C code. The risks of running C code on untrusted input are too high, and I find that generally speaking I and most software developers I work with are bad at spotting places where assumptions are made that data will be "reasonable". That makes C a great vector for bugs, and in particular for the kind of bug that does not affect a pure-Python program.

    For that reason, if we were going to write a native extension I'd want it to be in Rust, not C. However,

  2. The wheel ecosystem is not quite there yet. While it's now mostly possible to ship binary wheels for most platforms, there are a number of Linux distributions that cannot use manylinux1 wheels. This is a problem if you want to ship a Rust extension, because it's very difficult to indicate the need for a Rust compile chain on a random Linux system.

    This problem also affects C extensions, by the way: one of the systems that doesn't support a manylinux1 wheel is Alpine, which uses musl libc. That means that if we shipped a C extension on that platform, we'd require the installation of a C toolchain for users to install the module: not great.

  3. So the only option is, as you said, to make the module optional: we'd ideally want to check whether we could compile it and, if we couldn't, we'd simply fall back to pure-Python code.

    I have extremely mixed feelings about this, most of them not good. In particular, it's a bit of a hairy debugging issue: what code the user is executing becomes conditional on difficult-to-introspect features of their platform; we double the number of code paths that need to be tested and debugged; and we double the workload for feature addition and API changes.

The TL;DR here, I think, is that right now I'm about -0.25 on having a native extension. If we could come up with a nice Rust extension then I'd reconsider, but I'd really rather not maintain a C extension if it can possibly be avoided.

Lukasa commented 7 years ago

In fact, let state an outline of what we'd need to make me happy here.

If any contributor can write a Rust extension that can be optionally compiled if a buildchain is present, that flags its presence or absence clearly in a module-scoped dunder variable, that uses CFFI, and that doesn't vomit a load of output into the terminal on install on systems that do not have a Rust toolchain present, I'd be willing to consider merging such an extension.

thedrow commented 7 years ago

It shouldn't be much of a problem with https://github.com/novocaine/rust-python-ext

Lukasa commented 7 years ago

I've already written a Rust extension module in Python, that's not the hard bit. The harder bits are a) making it fail gracefully, b) keeping it maintained, and c) actually writing the extension. ;)

thedrow commented 7 years ago

Right, this might take some effort since I'm not very familiar with Rust's lifelines and borrow semantics. I will give it a shot though.