owengage / fastnbt

Fast serde serializer and deserializer for Minecraft's NBT and Anvil formats
MIT License
186 stars 34 forks source link

Allow efficient access of Post18 chunk section blocks #47

Closed Aeledfyr closed 2 years ago

Aeledfyr commented 2 years ago

The API for pre18 chunks allows for separating accesses of the palette indices from accessing the data in the palette itself, which allows for significantly faster processing of chunks for certain workflows. The post18 chunk variant hides the palette and indices within the BlockData struct, and does not give any way of accessing the palette and index data independently.

The simplest API for my use case would be an iterator over the palette indices in a chunk (with a documented ordering, but allowing the user to track it separately) and an array or iterator over the actual block values in the palette. (I've built reasonably efficient iterators over this type of packed structure before, so I could implement one if necessary).

owengage commented 2 years ago

It would be very simple to expose the palette as a method in the impl of BiomeData and BlockData. The order would be whatever the order it was stored in the original NBT. Can you explain more about why the ordering is important in your use case?

Iterating over the data itself is definitely a little more involved. Would you expect this iterator to iterate over Blocks and Biomes for the respective types? It feels like an iterator with Item = (x, y, z, Block) or similar would be most complete, but a little odd.

Aeledfyr commented 2 years ago

The palette has to keep its original ordering for palette indexing to work, but I assume that wasn't the question.

The ordering of the data is important mainly to avoid the overhead of passing extra data for (x, y, z, PaletteIndex). When the iteration order is known, it is relatively trivial to calculate the position from the index:

let x = i & 0x000F;
let y = (i & 0x0F00) >> 8;
let z = (i & 0x00F0) >> 4;

The iterator for the data should return the originally stored data, which is an index into the palette.

This method allows for doing an expensive operation on each block type in the palette, and then trivially looking up the result using array indexing based on the original palette indices. If the iterator returned Block, you would likely have to do a HashMap lookup based on the encoded data of the block, which would be much slower.

Some pseudocode, in a semi-contrived example of counting the number of solid blocks in a chunk:

let mut palette_data = Vec::new();
for block in blockdata.palette.iter() {
    palette_data.push(is_solid(block)); // Some expensive operation
}
let mut solid_count = 0;
for index in block.states.iter() {
    if palette_data[index] {
        solid_count += 1;
    }
}

This doesn't rely on the order/location of blocks, but it does show the value in exposing the palette indices for caching purposes. Another example would be to decode a chunk into a (rather inefficient) bitmap of solid/nonsolid blocks:

// map[y][z][x]
let mut solid_map = [[[false; 16]; 16]; 16];
for (i, block_index) in block.states.iter().enumerate() {
    let x = i & 0x000F;
    let y = (i & 0x0F00) >> 8;
    let z = (i & 0x00F0) >> 4;
    solid_map[y][z][x] = palette_data[index];
}
owengage commented 2 years ago

Right, I understand. This sounds good to me. I think the following changes make sense:

The data can be empty if it's just a load of air, or in the case of biomes if it is all a single biome. It might be worth making BlockData always contain data, rather than it having an empty state, and allow the Section to have an optional BlockData instead.

I'll have a play around with the API and see what seems more natural.

owengage commented 2 years ago

@Aeledfyr master now:

I've not released this as a change for fastanvil just yet. I was wondering if you could try master out in your use case?

I believe the cargo.toml line should be

fastanvil = { git = "https://github.com/owengage/fastnbt", rev = "72cf8b265c38786fd4abf930eca85c540faf13e2" }
Aeledfyr commented 2 years ago

That API works, and appears to support everything I need for my use case.

Also, I hadn't realized before this point, but the pre-18 API has close to the same issue: While fastanvil::pre18::Pre18Section exposes the palette, it doesn't allow iteration over Pre18Blockstates. (Pre18Blockstates allows enough control over accessing that it should be feasible to work around it, but it would be less efficient than a fix similar to this one).

owengage commented 2 years ago

Similar API added to Pre18Blockstates. It would be nice to unify the interface of these two at some point. Released in fastanvil 0.24.

Let me know if you run into anything else!