paulchernoch / hilbert

Hilbert Transformation and inverse for Rust
MIT License

This crate is in dire need of refactoring. (which has already been done elsewhere) #4

Open DoubleHyphen opened 2 years ago

DoubleHyphen commented 2 years ago

Dear sir:

First things first, allow me to thank you for taking the time to check out this repository. Seeing as you've taken interest in it again, allow me to present you with some problems with your crate which I believe to be significant.

  1. You're implementing a Point data-structure. This is a complete re-invention of the wheel; the nalgebra crate already exists, and the Points it defines are much more featureful and robustly implemented.
  2. Your Hilbert-encoding and -decoding methods only accept u32s. There's no reason for this restriction: it's entirely possible to make them accept any primitive unsigned integer. (I'll admit it's much more of a pain than it needs to be, though…)
  3. Instead of accepting arrays as your input, you accept Vecs. This opens up the possibility that someone might include, in the same collection, points with different numbers of dimensions! You could use arrays instead, for a triple benefit: a) they eliminate the possibility of mis-use, b) they're faster, and c) they let you statically know the number of bits you'll need for your result. The last point also means that…
  4. …it's frequently possible to output primitive integers (u64s, u128s, and so on) rather than relying on BigUints all the time. This is faster and saves memory. If you have e.g. a [u32; 3], you can just output a u128; if you have a [u128; 42], then of course you'd need a BigUint.
  5. The thing that you call “interleaving” is essentially just Morton encoding aka Z-indexing, isn't it? You don't need to re-implement it, I already have a library for it. (Which I'll need to refactor soon-ish.)
  6. You accept a bits parameter that denotes how many bits the algorithm should examine. This makes mis-use possible, and it isn't really needed either. The leading_zeros method tells you how many zero bits an integer begins with; for the maximum over an array, you can bit-or all the co-ordinates together and take leading_zeros of the result; and from there you can skip any leading zeros that come in groups of D and arrive at the same result as if you'd examined them all. Thus, you only ever examine at most D-1 useless bits for each co-ordinate.
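To make points 3–5 concrete, here is a minimal sketch (not this crate's API; the function name is made up for illustration) of Morton encoding over a fixed-size array. Because the input is a `[u32; 3]`, the compiler knows the result needs exactly 3 × 32 = 96 bits, so a plain u128 suffices and no BigUint is involved:

```rust
/// Illustrative sketch, not the crate's API: interleave ("Morton
/// encode") the bits of three u32 co-ordinates into one u128.
fn morton_encode_3d(coords: [u32; 3]) -> u128 {
    let mut index: u128 = 0;
    // Walk the bits from most to least significant, taking one bit
    // from each co-ordinate per round.
    for bit in (0..u32::BITS).rev() {
        for &c in &coords {
            index = (index << 1) | u128::from((c >> bit) & 1);
        }
    }
    index
}

fn main() {
    // The low-order round interleaves the last bit of each co-ordinate.
    assert_eq!(morton_encode_3d([1, 1, 1]), 7);
    assert_eq!(morton_encode_3d([1, 0, 0]), 4);
    // 32 rounds of three set bits each fill exactly 96 bits.
    assert_eq!(
        morton_encode_3d([u32::MAX, u32::MAX, u32::MAX]),
        (1u128 << 96) - 1
    );
    println!("ok");
}
```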
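And a sketch of the trick in point 6, assuming u32 co-ordinates (the helper name is hypothetical, not something in the crate):

```rust
/// Hypothetical helper: derive the number of bits worth examining from
/// the co-ordinates themselves, instead of taking a `bits` parameter
/// from the caller.
fn bits_needed(coords: &[u32]) -> u32 {
    // OR everything together: the highest set bit of the result is the
    // highest set bit found in any co-ordinate.
    let combined = coords.iter().fold(0u32, |acc, &c| acc | c);
    // 32 leading zeros means no bits are set at all.
    u32::BITS - combined.leading_zeros()
}

fn main() {
    // The largest co-ordinate here is 0b101, so 3 bits suffice.
    assert_eq!(bits_needed(&[0b101, 0b1, 0b11]), 3);
    assert_eq!(bits_needed(&[0, 0]), 0);
    assert_eq!(bits_needed(&[u32::MAX]), 32);
    println!("ok");
}
```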

There is also some advice I'd like to give you for your future Rust projects:

  1. You are manually unrolling your loops. This does not, in fact, speed up anything: the benchmarks I'd run showed me that the functional style is the fastest, and IIRC that manually-unrolled loops are significantly slower than even naïve (“rolled”) loops.
  2. If you're only using a dependency (such as rand, criterion, or spectral) for testing, you can include it in your Cargo.toml under the [dev-dependencies] section. That way, your users won't have to download those dependencies just to use your library.
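For instance, a Cargo.toml along these lines (the version numbers are only placeholders):

```toml
[dependencies]
# only what the library itself needs at runtime

[dev-dependencies]
# pulled in for `cargo test` / `cargo bench` only;
# downstream users of the library never download these
rand = "0.8"
criterion = "0.4"
spectral = "0.6"
```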

Now, you might be wondering if it's worth it to go through all that effort to refactor your crate. Except, the work has already been done. Would you like to incorporate any of that work into your own crate? It does lead to, approximately, a 5× speedup compared to your code, and it has hugely increased ergonomics. It admittedly would lead to an API break, but… your crate's version is still 0.*, so you wouldn't even be breaking semver.

Kind regards.

PS1: The fast_hilbert crate has also done some good work you could take a look at.

PS2: To be fair, the ability to treat arrays with any kind of dimensional generality didn't exist yet when you wrote this crate.

DoubleHyphen commented 7 months ago

It's been a smidge over two years already. Any news on this?