Closed jtpio closed 4 years ago
Looks like some tests are timing out.
Hi @jtpio! How come the library is hosted on your account? Was it unavailable as a distribution by the original authors? Do you have a suggestion for how we should do this? We could bring the file into this repo, maybe?
Was it unavailable as a distribution by the original authors?
It's a modified version of the original to handle unicode. Also the one published to npm is not from the original authors but from a fork.
We could bring the file into this repo, maybe
Yes I think so too. From the comment above:
Instead of having yet another murmurhash2 package on npm, one way forward could be to move the implementation to @jupyterlab/coreutils instead.
I missed that. That seems like a good idea to me. Or even including it as a local file we do not export right in the debugger package for now.
Or even including it as a local file we do not export right in the debugger package for now.
I think we can vendor it in @jupyterlab/debugger@0.3.x
yes, if we then add it to @jupyterlab/coreutils
in core for 3.0.
Well basically if no one else uses it right now, I'd prefer to keep the API surface area of @jupyterlab/coreutils
smaller rather than larger. What do you think?
Yeah at first it felt like it should be "out there", since there doesn't seem to be a variation of murmurhash2 that handles this case on npm.
But I agree that we can lower the maintenance burden by not exposing it and keep it private to the debugger package.
ok, so tests are also failing in https://github.com/jupyterlab/debugger/pull/550
Tests fixed in #553
Fixes #538.
This change switches to using the murmurhash2 implementation from: https://github.com/jtpio/murmurhash2
Before
After
The issue
Here is a report after digging into this and looking for different solutions. Posting a bit of information here so it's more convenient to refer to it in the future.
In https://github.com/jupyterlab/debugger/issues/538, we noticed that having non-ascii characters such as
€
in a code cell could be reproduced on the stable Binder (using0.3.3
).It was because the MurmurHash2 implementation in JS and the one used in
xeus-python
would not give the same result for the same input and seed.@jupyterlab/debugger
was using this implementation: https://github.com/mikolalysenko/murmurhash-js/blob/f19136e9f9c17f8cddc216ca3d44ec7c5c502f60/murmurhash2_gc.js#L14-L50, which corresponds to the version published on npm: https://www.npmjs.com/package/murmurhash-jsxeus-python
is using this implementation: https://github.com/xtensor-stack/xtl/blob/7d41f768787ee6405e3f1b1056e04f6fbd43f8cd/include/xtl/xhash.hpp#L47 Which is based on the original in C++: https://github.com/aappleby/smhasher/blob/master/src/MurmurHash2.cppUsing this code snippet:
With
3339675911
as the seed.Would give:
murmurhash-js
:3511777296
xtl
:1079235191
(more info in #538)
Summary:
From here two options:
WebAssemly
This sounds attractive because it would mean using the same code on both the backend and the frontend. Reviving the initial work from https://github.com/xtensor-stack/xtl/pull/171 led to being able to use the
murmur2_x86
function in a node repl.However it started to feel a bit complicated when:
.wasm
files for the web as part of an npm package is a bit complicated. At least as of today, but it might improve over timebase64
encode it, and create a newUint8Array
to be passed toWebAssembly.initiate
in asetup
hook / function or equivalentxtl/hash
to wasm withemscripten
is in the order of ~100KB, which is quite a lot for just one functionemscripten
/ wasm tooling requires extra setupFix the existing JS implementation
The implementation from
murmurhash-js
states it is for ascii strings only: https://github.com/garycourt/murmurhash-js/blob/0197ce38bedac0e05f40b9d7152095d06db8292c/murmurhash2_gc.js#L9Instead, we can convert the string to a
Uint8Array
via aTextEncoder
, and perform the operations on theUint8Array
just like the original algorithm and the one used inxeus-python
. This is implemented in https://github.com/jtpio/murmurhash2Additional thoughts
Instead of having yet another
murmurhash2
package on npm, one way forward could be to move the implementation to@jupyterlab/coreutils
instead.The
0.3.x
releases of the debugger extension (this repo) could still depend onmurmur2
.