Closed belm0 closed 5 years ago
Judging from the error message, I don't think this has anything to do with autobahn... that issue you linked to just notes that wsaccel is broken on pypy, as part of the justification for autobahn switching away from using wsaccel.
It looks like there are two problems:
For some reason utf8validator.c
and xormask.c
both try to basically monkeypatch PyMethod_New
, by replacing it with a wrapper macro. In the mean time, PyPy's C API compatibility stuff has already replaced PyMethod_New
with a macro, so the two macros conflict. This is a super hacky-looking thing for wsaccel to be doing, so the fix is probably to fix it so it stops doing that.
For some reason xormask.c
wants to mess with the current traceback (tstate->curexc_traceback
). Apparently pypy doesn't currently implement that. This seems like a strange thing for some XOR-masking code to be doing.
In the short term, I think the solution is just...... don't use wsaccel? wsproto works fine without wsaccel; it just falls back on a pure-python implementation of XOR-masking. Which on PyPy might work fine anyway, so this might also be the long-term solution :-). In fact the autobahn source code claims that their pure-Python XOR masking code is faster on PyPy than wsaccel's "optimized" code. Their code is significantly fancier than wsproto's fallback code, so I don't know how fast that goes on PyPy.
Recommendation: tweak trio-websocket's install_requires
so that it only installs wsaccel when running on CPython. Something like: "wsaccel >=0.6.2, <0.7; implementation_name == 'cpython'"
FWIW, last time I checked my app with and without wsaccel I didn't discover a difference: https://github.com/HyperionGray/trio-websocket/issues/32#issuecomment-423180088
I can't believe that an apparently little-used, unmaintained package is still relevant to Python ws performance.
I guess you're only dealing with small messages? The websocket protocol requires XOR-masking on every client→server packet, and if you're doing it in Python then this means looping over bytes, which has absurdly high overhead on CPython. On my laptop with 10 KB messages, wsproto's Python fallback burns ~1 ms of CPU time per message just for masking, but with wsaccel this drops to 17 µs. That's a ~60x speedup. Or put another way, without wsaccel, a fast modern CPU can't hope to push more than ~10 MB/s of websocket traffic, and that's if it's doing nothing else at all.
If we don't want a little-used, unmaintained package to be relevant to Python ws performance, then we need to like, maintain it or make a replacement or something :-). It won't magically become irrelevant on its own.
Did a bit of followup reading... I guess part of why wsaccel is somewhat unmaintained is that it's kinda slow and projects like aiohttp and tornado have switched to their own private speedup modules:
That graph's from this blog post, which shows how to at least get the orange and green lines using pure Python. So wsproto could get surprisingly close to wsaccel speed using pure Python; or if it used a better-tuned native extension then it could go roughly one order of magnitude faster than that.
Native extensions are a major hassle to maintain (you have to make wheels etc.), so I'm not sure if it's worth bothering. If we do then tornado's version seems simple and clear:
https://github.com/tornadoweb/tornado/blob/master/tornado/speedups.c
If I were doing it for real I'd probably use cffi to wrap that core logic. And maybe follow chromium in adding a version that follows the same pattern but uses int __attribute__((vector_size (16)))
, for a modest SIMD speedup. (chromium code, docs for __attribute__((vector_size))
)
Thank you-- I'll have to double check wsaccel on my app. I'll also file the suggestion to wsproto.
Native extensions are a major hassle to maintain (you have to make wheels etc.), so I'm not sure if it's worth bothering.
I'd consider having an optional numba implementation for the CPython case.
Numba pulls in numpy and llvm. Altogether it's about 40 MB download, 164 MB on disk. This seems excessive for a single 10 line C function.
Actually numpy on its own can probably handle the masking pretty well, but I don't think people would want to pull in numpy just for this.
Altogether it's about 40 MB download, 164 MB on disk
Thank you, hadn't considered that. Where I've used numba before the project already had the dependencies.
~I believe the wsaccel dependency pulls in autobahn, which in turn fails under PyPy due to https://github.com/crossbario/autobahn-python/issues/968.~
The wsaccel dependency fails under PyPy: