protomaps / PMTiles

Cloud-optimized + compressed single-file tile archives for vector and raster maps
https://protomaps.com/docs/pmtiles/
BSD 3-Clause "New" or "Revised" License
2.09k stars 123 forks source link

Possible reference implementation of hilbert tile ID primitives #393

Closed himikof closed 1 month ago

himikof commented 6 months ago

Thank you for your work on the specification, I think it is a great fit for many deployment scenarios!

As I was trying to make a use-case specific reader implementation, I noticed that there is a part of the specification that will likely need to be (mostly) separately implemented many times, and is not trivial to do just by following the spec: hilbert tile id encoding/decoding. I surveyed many available implementations in different languages, and many seem to be using slightly different somewhat inefficient ways to achieve this. It is easy to suggest improvements for any single implementation, but it seems that many implementations are done by adaptation and modification of the previously existing ones.

As the 2-D hilbert curve coding is both a bit specialized to be readily found in good-quality libraries in a given environment (unlike LEB128 varint coding, for example), and is a bit complex to be easily and correctly open-coded by implementers, I think that there is a lot of benefit to explicitly documenting an efficient algorithm as a specification appendix.

I've implemented a couple of mostly generic, very fast versions that use only a few features from the target language/environment, and should be easily portable to most environments with 64-bit integers[^1]. The implementation idea is mostly from the classic Hacker's Delight book, paragraphs 16-1, 16-2[^2].

What do you think about including some reference code like below as a guide-to-implementers appendix (obviously, with much more detailed explanation)? Also I think that most implementations in this repo could be improved by using these algorithms, which in most cases should be both more efficient and shorter that the previous implementations. Error handling/interface can obviously be adjusted as needed by a particular implementation.

Python version

def zxy_to_tile_id(z: int, x: int, y: int):
    if z > 31:
        raise OverflowError("tile zoom exceeds 64-bit limit")
    if x >= 1 << z or y >= 1 << z:
        raise ValueError("tile x/y outside zoom level bounds")
    prefix = ((1 << 2 * z) - 1) // 3
    state = 0
    result = 0
    for i in reversed(range(z)):
        row_2 = 8 * state | 4 * ((x >> i) & 1) | 2 * ((y >> i) & 1)
        result = (result << 2) | (0x361E9CB4 >> row_2) & 3
        state = (0x8FE65831 >> row_2) & 3
    return prefix + result

# zxy_to_tile_id(32, 0, 0)
INVALID_TILE_ID: int = 0x5555_5555_5555_5555

def tile_id_to_zxy(tile_id: int) -> typing.Tuple[int, int, int]:
    if tile_id >= INVALID_TILE_ID:
        raise OverflowError("tile zoom exceeds 64-bit limit")
    z = 0
    while (1 << 2 * (z + 1)) <= 3 * tile_id + 1:
        z += 1

    code = tile_id - ((1 << 2 * z) - 1) // 3
    state = 0
    x = y = 0

    for i in reversed(range(0, 2 * z, 2)):
        row = 4 * state | (code >> i) & 3
        x = (x << 1) | (0x936C >> row) & 1
        y = (y << 1) | (0x39C6 >> row) & 1
        state = (0x3E6B94C1 >> 2 * row) & 3

    return z, x, y

Rust version

fn zxy_to_tile_id(z: u32, x: u32, y: u32) -> Option<u64> {
    if z > 31 || x >= 1 << z || y >= 1 << z {
        return None;
    }
    let prefix: u64 = ((1 << 2 * z) - 1) / 3;
    let mut state = 0;
    let mut result: u64 = 0;
    for i in (0..z).rev() {
        let row_2 = 8 * state | 4 * ((x >> i) & 1) | 2 * ((y >> i) & 1);
        result = (result << 2) | (0x361E9CB4 >> row_2) & 3;
        state = (0x8FE65831 >> row_2) & 3;
    }
    Some(prefix + result)
}

// zxy_to_tile_id(32, 0, 0)
const INVALID_TILE_ID: u64 = 0x5555_5555_5555_5555;

fn tile_id_to_zxy(tile_id: u64) -> (u32, u32, u32) {
    assert!(tile_id < INVALID_TILE_ID);
    // 0x1555_5555_5555_5555 == zxy_to_tile_id(31, 0, 0)
    let z: u32 = if tile_id >= 0x1555_5555_5555_5555 {
        31
    } else {
        let mut z = 0;
        // C/C++ version: for (uint64_t p = 1 << 2; p <= 3 * tile_id + 1; p <<= 2)
        let mut p: u64 = 1 << 2;
        // 3 * tile_id + 1 cannot overflow here when bounded
        while p <= 3 * tile_id + 1 {
            p <<= 2;
            z += 1;
        }
        z
    };

    let code = tile_id - ((1 << 2 * z) - 1) / 3;
    let mut state = 0;
    let mut x: u32 = 0;
    let mut y: u32 = 0;

    for i in (0..2*z).step_by(2).rev() {
        let row = 4 * state | (code >> i) & 3;
        x = (x << 1) | (0x936C >> row) & 1;
        y = (y << 1) | (0x39C6 >> row) & 1;
        state = (0x3E6B94C1 >> 2 * row) & 3;
    }

    (z, x, y)
}

JavaScript specifics

I'm not an expert on JavaScript, but I think that at least the zxy_to_tile_id can be easily made to work (for the same limited range of z as the current implementation) with its limited range arithmetic:

function zxy_to_tile_id(z: number, x: number, y: number): number | null {
    if (z > 26 || x >= 1 << z || y >= 1 << z) {
        return null;
    }
    const prefix = Math.floor(((1 << z) * (1 << z) - 1) / 3);
    let state = 0;
    let result = 0;
    for (let i = z - 1; i >= 0; i--) {
        let row_2 = 8 * state | 4 * ((x >> i) & 1) | 2 * ((y >> i) & 1);
        result = result * 4 + ((0x361E9CB4 >> row_2) & 3);
        state = (0x8FE65831 >> row_2) & 3;
    }
    return prefix + result;
}

Unfortunately, I have no idea how to easily read high (52-32) bits of tile_id in order (from highest to lowest) in JS, and this makes the tile_id_to_zxy port much more complex.

[^1]: Unfortunately, that excludes JavaScript (without WASM or BigInt). [^2]: Basically the algorithms use simple finite-state transducers with just 4 states, with the transition tables of only 64 bits encoded as 2-3 small integer constants. The prefix size calculation is done with the closed-form formula for a geometric series sum.

prusswan commented 6 months ago

The fastest one I found was fast_hilbert (Rust) https://www.reddit.com/r/rust/comments/yip7j4/fast_hilbert_200_released_the_fastest_hilbert/

I am not sure if there is a single "best" method for 2D-1D conversion, but I am aware of several LUT-based approaches that all take advantage of inherent patterns in the sequence. One of which that I found to be quite readable is this: https://bertvandenbroucke.netlify.app/2019/01/18/space-filling-curves/. https://threadlocalmutex.com/?p=205 would lead to several other implementations. https://pypi.org/project/hilbertcurve/ is a more generic version that works for arbitrary orders of N (beyond 2D)

himikof commented 6 months ago

Thanks for those alternate references!

Just to be clear, the implementations I provided are what most people would call LUT-based, only the LUT is so small that it is encoded into 2-3 small integer immediate constants (avoiding any possible memory lookup dependencies). Exactly the same table as https://bertvandenbroucke.netlify.app/2019/01/18/space-filling-curves/ is used, just encoded more efficiently.

I've selected this particular implementation because of two reasons: it is reasonably fast (faster than naive/most generic implementations) and can be easily ported into different languages/environments due to small code size. Other fastest implementations usually either have much more code, or dependencies on non-portable instructions.

The approach from fast_hilbert crate is a bit more complex variation of the provided algorithm with only the following changes:

  1. It operates on batches of 6 tile id bits at once by increasing the LUT size from 64 bits to 2048 bits.
  2. It contains a cool optimization that stops the loop as soon as higher input bits are all zero, making numerically small x/y values convert faster. Not sure it this would be significant in PMTiles with global coverage.

The prefix-scan algorithms from Hacker's Delight 16-2, http://threadlocalmutex.com/?p=126 and https://github.com/rawrunprotected/hilbert_curves are very cool (logarithmic!). But existing implementations are only for 16-bit coordinates (z < 16), and are already much more complex than loop-based finite-state transducers. They could be adapted to use 64-bit arithmetic for the full range of 32-bit coordinates, but only with further 2x code length increase.

prusswan commented 6 months ago

Thanks for those alternate references!

Just to be clear, the implementations I provided are what most people would call LUT-based, only the LUT is so small that it is encoded into 2-3 small integer immediate constants (avoiding any possible memory lookup dependencies). Exactly the same table as https://bertvandenbroucke.netlify.app/2019/01/18/space-filling-curves/ is used, just encoded more efficiently.

I am guessing as much, but my bit-fu isn't good enough to tell if it is functionally equivalent to the versions I can understand. Fortunately there is always the option of plotting the values.

bdon commented 6 months ago

Can you create your own repository or NPM/Python packages for these functions? That would make it more applicable than just in PMTiles implementations.

himikof commented 6 months ago

Can you create your own repository or NPM/Python packages for these functions? That would make it more applicable than just in PMTiles implementations.

Yes, that could also be useful, but I'd like to explain why I think that specifically having an efficient algorithm documented and explained along with the specification has much benefit:

  1. PMTiles specification fundamentally transfers a bit of complexity to its clients (or compatibility proxies). This part will need to be implemented many times in different languages and environments, so the expectation that readily-made libraries will cover all the needs is unlikely to hold.
    • As a specific example, current implementation of Cloudflare Workers for Python does not allow custom external dependencies.
    • C++ projects often have a high bar (and cost) for bringing in new dependencies, and it would be hard to justify it only for a couple of short functions. And the library interface needs to be made needlessly complex to support diverse styles of error reporting (without compromising performance or bringing in even more dependencies). While a project-specific implementation can just use project-specific facilities and integrate seamlessly.
    • There are also less popular languages, with smaller library ecosystems. Removing the need to search for a fitting solution or diving deep into Hilbert curve math to make their own just to implement the otherwise straightforward specification by providing (and explaining!) some example code will remove an obstacle for them.
  2. These functions are actually not as widely applicable as-is, because they directly compute a PMTiles-specific ID in a specifically optimized way, and are not a generic Hilbert curve implementation. One could say that a library with a fast 2D 32-bit Hilbert curve (such as Rust fast_hilbert) would be enough to make a 5-line implementation following the spec by just adding/subtracting a prefix length, but, unfortunately, I've seen that most implementations seem to have trouble with efficiently handling prefix calculations. I've seen linear and even quadratic complexity implementations in zxy_to_tile_id, often with expensive operations in the loop. This could also be improved by proving an explicit $\left\lfloor \frac{2^{2z}-1}3 \right\rfloor$ formula for the prefix in the section 4.1 of the specification, too. Also, languages using fixed-width 64-bit integers (so, most of them) need some careful handling of potential overflow during the z calculation in tile_id_to_zxy.
  3. I've also seen implementations to pull in a rather slow generic N-dimensional Hilbert (and other Hilbert-like) curve library just to calculate PMTiles tile IDs. Without some specific guidance, this unfortunately could be the easiest way to implement the spec in many cases.
  4. I agree that while these algorithms are short and efficient, they can also be hard to understand without a verbose explanation. If such explanation will be provided in a specification appendix, any implementation reusing them can just link to it without writing another one or leaving it as an unreadable "magic" piece of code without any.

The only language that I know of that is meaningfully different enough to need non-trivial adaptations of these simple algorithms and, fortunately enough, has a great library ecosystem, is JavaScript. So JS would not profit from just documenting them much. Unfortunately, I am not a JS developer myself, and would not be comfortable with maintaining an NPM package. But if anyone else wants to do so, feel free to! The code I posted is public domain, so it could be included or adapted as needed. I will think about providing a PyPI package for Python, though.

I'm also willing to make PRs to improve some existing implementations, but I wanted to tackle the larger task first. Do you think that a document (appendix?) for the specification itself with some explanations, example code and implementers guidance would be welcome? Some small additions should likely go into tile ID description in the spec, too. I just want to make sure before spending time writing it.

bdon commented 6 months ago

My goal for the specification is to define only the minimum necessary to read and write the format. The reason is that it is optimized for being able to maintain this as an individual over typical timescales for these open source projects (5+ years). The pseudocode parts of the spec document as-is, in retrospect, may even be too much as-is.

As of right now I do not believe that the cost of encoding/decoding the IDs is a bottleneck relative to other parts such as network latency. So I don't put a high priority on these optimizations, especially if they significantly expand the code I need to maintain in the specification, unless you can show concrete benchmarks of significant improvements on typical runtimes.

Maybe you can maintain a repository called pmtiles_tileid_implementations with many annotated code snippets that people can copy from? I would be happy to link to that from the spec docs!

bdon commented 1 month ago

C++ projects often have a high bar (and cost) for bringing in new dependencies, and it would be hard to justify it only for a couple of short functions.

I think it would be most useful to have a separate repo of implementations that can be copy-pasted or inlined as a header, That can be maintained and edited separately from the actual PMTiles spec document, but we can add a link to it in the spec document to give implementers useful pointers.

The JS implementation is the most heavily used and doesn't seem like it will benefit from any new approach listed above. I think the C++ and Python implementations can see some benefit and if there was a side-by-side comparison showing a significant improvement for 64 bit IDs that would be a welcome PR in here.

I'm going to close this for now as it's not immediately actionable to the code in this repo.