owengage / fastnbt

Fast serde serializer and deserializer for Minecraft's NBT and Anvil formats

[Feature Request] `Region` wrapper #79

Open ToBinio opened 1 year ago

ToBinio commented 1 year ago

It would be very useful to have some sort of abstraction above a whole dimension (a collection of all regions), which would then enable things like #17 and the possibility to set blocks.

This could go as far as managing all three Minecraft dimensions (Overworld, Nether, End) at the same time, something like a World.

What are your thoughts on such abstractions?

owengage commented 1 year ago

I like the idea. The main thing holding me back is what the interface should look like. A get(x, y, z) would be horrendously inefficient if iterated badly, for example going from the furthest negative x to the furthest positive x, then moving a single y. IMO an interface should be hard to use badly, so this doesn't suit me.

We could provide higher-level interfaces than getting a block, for example iterating over every chunk in a dimension, or iterating over every block. The implementations would then iterate efficiently, e.g. a region at a time.

Do you have any use cases in mind? Then we can see what a higher-level interface than get might look like.

Edit:

A good start would probably be a Dimension abstraction that allows you to iterate over regions, and have regions allow you to iterate over chunks (which the RegionFileLoader and Region sort of already implement), and have chunks allow iterating over blocks.
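
To make the layering concrete, here is a rough sketch of what that nesting could look like (all names are hypothetical placeholders; none of this exists in fastanvil yet):

struct Dimension; // e.g. a directory of region files
struct Region;    // one .mca file
struct Chunk;     // one 16x16 column of sections
struct Block;

impl Dimension {
    // Walk region files one at a time so iteration stays IO-friendly.
    fn iter_regions(&self) -> impl Iterator<Item = Region> { std::iter::empty() }
}

impl Region {
    fn iter_chunks(&self) -> impl Iterator<Item = Chunk> { std::iter::empty() }
}

impl Chunk {
    fn iter_blocks(&self) -> impl Iterator<Item = Block> { std::iter::empty() }
}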

ToBinio commented 1 year ago

I don't have a very specific use case in mind. I simply planned to do some world analysis at some point.

Do you have any plans for adding a block placing/changing API?

Any other specific thoughts on an iteration API? Otherwise I would try to begin working on it.

owengage commented 1 year ago

I have no plans to implement it myself right now, very happy for someone else to tackle it. :)

A Region currently has an iter() method that provides the raw chunk data. I think trying to come up with how a mutating version would look would be a good first step? I'd like to not enforce use of fastnbt if I can help it; people should be able to use whatever deserialization/serialization mechanism they want.

A mutating iterator would have to read from the file then write to the file, which might be a bit tricky as these things can fail. Ideally there would be a way to signal removing a chunk entirely as well. iter_mut()'s interface might make this all a bit difficult.

For mutating a region I think something like this is reasonable, which is possible now:

fn do_something(chunk: ChunkData) -> Option<ChunkData>;

let region = todo!();
let chunks: Vec<ChunkData> = region.iter().filter_map(|chunk| do_something(chunk)).collect();
// write a brand new region file with `chunks`.

If all of that is figured out, extending similar logic to an entire dimension and world sounds good.

In the more distant future (not now), a dimension-level 'window2d' API would be great for rendering. This would iterate over chunks and allow access to the chunks around each one. It would automatically take care of loading the up to four region files and chunks needed to do so.
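
A very rough illustration of what such a windowed view might hand out (purely hypothetical, not anything fastanvil provides today):

// One chunk plus its already-loaded neighbours, so the renderer can peek
// across chunk borders without doing its own region/chunk bookkeeping.
struct ChunkWindow<'a, C> {
    center: &'a C,
    north: Option<&'a C>,
    south: Option<&'a C>,
    east: Option<&'a C>,
    west: Option<&'a C>,
}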

ToBinio commented 1 year ago

What do you think about some sort of baking step which unifies the data structure of all chunks? Something similar to what the JavaChunk enum does, but also on a data level.

This would have a performance penalty at chunk creation, but it would simplify working with the chunk and possibly even increase performance if a lot of reads happen.

Thinking ahead, this could also simplify adding a new "format", and if writing chunks to files is added, it would make it easy to update an old chunk to a new file format.

owengage commented 1 year ago

Sorry, I'm not sure I understand what you're suggesting. What do you mean by a baking step? Can you give an example?

ToBinio commented 1 year ago

Very simplified, it could look like this:

struct MinecraftChunk {
    blocks: Vec<Block>,
    status: String,
}

// "bake" each version-specific chunk into the unified representation
let minecraft_chunk = current_minecraft_chunk.into_minecraft_chunk();
let minecraft_chunk = pre18_minecraft_chunk.into_minecraft_chunk();
let minecraft_chunk = pre13_minecraft_chunk.into_minecraft_chunk();

// methods like this only need to be implemented once, for the unified type
let block = minecraft_chunk.block_at(0, 0, 0);

With baking in this case I mean the action of, for example, creating a Vec of Blocks out of the chunk-version-specific data.

The MinecraftChunk struct could then have more complex methods which only need to be implemented for its data structure.

owengage commented 1 year ago

Ah I see, like a chunk data format entirely internal to fastanvil, so we only have to implement methods once. Having to match on the enum all the time is annoying, so it's a good idea.

I think keeping sections and block palettes makes sense, though. The internal format would probably look very similar to 1.19's now.

One major issue here is that if people are expecting to be able to mutate these chunks, we're going to need to be able to go from this internal format to the modern format (and ideally older formats if possible). The existing chunk types are not designed to be written back to disk. They only capture information relevant to rendering a world. For example, they do not capture entities at all, so serializing the JavaChunk would lose information and probably not even load.

An internal data format like this would need to capture all information if it wants to be able to write this data back. At the moment fastanvil leaves it up to the user of the library to provide a suitable chunk type. Fastanvil providing a type that supports serializing back into a world would be nice.

If fastanvil were to support this I think it would make sense as its own submodule, maybe a fastanvil::complete submodule containing these types. This would let it change completely separately from the render-related types. Eventually the current types could be moved under a fastanvil::render submodule to keep everything separate.

I'd say it's a lot of work!

ToBinio commented 1 year ago

Would you say it makes sense to set the foundation for such a change through some sort of QueryChunk, which is a simple version of this CompleteChunk / InternalChunk with only read access to blocks (and maybe biomes)? Which, to return to the beginning of this issue, implements an iterator through all blocks and maybe other ways of access.

This would not replace the current Chunk system, instead just adding yet another layer on top (which, looking at the already very complex / deep structure, maybe sounds scary...).

Further, there could be a QueryRegion or QueryDimension which also only has read methods, similar to what is described at the beginning of this issue.

owengage commented 1 year ago

I think we should create a complete::Chunk struct with a subset of the world data for now, and maybe just stick to the latest chunk version as well. The Query* types sound like they would be replaced by complete::Chunk, so let's just go straight to it.

It's tricky to implement Deserialize for a general chunk as you're forced to use an enum of chunks for the different versions. If you want to Deserialize this then the chunk needs to be deserialized into a buffering type and attempted for each enum variant (this is how serde does it under the hood). This is why the current chunk doesn't implement Deserialize directly. This complete type can also avoid implementing it I think, and instead have a from_bytes method as well.

That could then provide an iter_blocks(&self) method that returns something implementing Iterator<Item=&Block>? This keeps it all read-only for now, and we can expand to mutability and older chunk versions over time?
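
As a sketch of that shape (hypothetical names and signatures, not the actual API):

// complete::Chunk as discussed above; everything here is a placeholder.
pub struct Block; // stand-in for the real block type
pub struct Error; // stand-in error type

pub struct Chunk {
    // internal format: sections with palettes, similar to modern chunks
}

impl Chunk {
    // Parse raw chunk NBT (detecting the version internally) into the
    // internal format, instead of implementing Deserialize directly.
    pub fn from_bytes(data: &[u8]) -> Result<Chunk, Error> {
        todo!()
    }

    // Read-only iteration over every block in the chunk.
    pub fn iter_blocks(&self) -> impl Iterator<Item = &Block> + '_ {
        std::iter::empty()
    }
}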

ToBinio commented 1 year ago

Sounds good!

You said that you would keep the sections and block palettes. In which way would it then be different from the CurrentJavaChunk?

We could simplify the SectionTower to a simple vec (so remove the mapping part), and make a section store its block / biome palette and a vec of indices (so no bytes / bit shifting).

Any other structure changes?

At the end of the day the deserialization would work the same as with the JavaChunk, but instead of storing the CurrentJavaChunk it converts it to a complete::Chunk. Correct?

owengage commented 1 year ago

You're right, it probably wouldn't be much different to the CurrentJavaChunk. The current storage Minecraft uses is pretty decent (it took enough changes to get there).

I don't recall why I made the section tower have a Vec<Section> and a mapping from y to section. Entirely possible to simplify that. The y section values can be negative, so need offsetting.

As you say, the block data can be a vec of indices. Since a section is 16^3 (4096 blocks), a Vec<u16> should fit the max possible palette size.

Yeah, so I'm picturing Chunk::from_bytes(&[u8]) -> Chunk all in the complete module. This would handle all of the conversion from whatever version of chunk into the internal format. complete::Chunk can be updated to eventually include all fields. Using the flatten trick might help, but doesn't help with the chunk versioning issue.
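
For illustration, a minimal sketch of what such a simplified section could hold (names are placeholders, and the index ordering is an assumption for the example):

struct Block;

struct Section {
    block_palette: Vec<Block>,
    // 16 * 16 * 16 = 4096 entries, each an index into block_palette. The
    // palette can never exceed 4096 entries, so u16 is always enough.
    block_indices: Vec<u16>,
}

impl Section {
    // x, y, z are section-local coordinates in 0..16.
    fn block_at(&self, x: usize, y: usize, z: usize) -> &Block {
        let idx = (y * 16 + z) * 16 + x; // assuming y-major (y, then z, then x) ordering
        &self.block_palette[self.block_indices[idx] as usize]
    }
}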

ToBinio commented 1 year ago

After implementing the most basic complete::Chunk, I would like to start working on some sort of complete::Region which abstracts the region file.

I would imagine the creation API being similar to the API of the "normal" Region, but instead of giving access to the chunk_data it would give direct access to a complete::Chunk. Additionally, it would add an iter_all_blocks() and so on.

One question would be whether we should pre-create all chunks when the complete::Region first gets created, or whether it would be better to have some sort of lazy chunk loading, so it only loads chunks once they are needed.

The first way would be very simple to multithread, but always has a big performance impact on first creation. The second way would open another question: whether the complete::Region should keep a constant reference to the "normal" Region, or whether it should just copy all of the chunk data at the start, which once again would probably have a performance impact.

What do you think?

owengage commented 1 year ago

Regions can theoretically get pretty large in filesize, with a compressed chunk being up to 1 MiB, and 1024 of them per region, so storing it all in memory could be quite a pain (and not necessarily any faster overall).

I think it would be nice for Region itself to have an iterator that gives you chunks directly, rather than another type. Maybe the existing Region::iter should become Region::iter_raw and add a new method for the complete chunks, and another for blocks. The methods would end up looking like this:

Region::iter_raw(&mut self) -> impl Iterator<Item=Result<ChunkData>>
Region::iter_chunks(&mut self) -> impl Iterator<Item=Result<complete::Chunk>>
Region::iter_blocks(&mut self) -> impl Iterator<Item=Result<&Block>>
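
For example, iter_chunks could plausibly be layered on iter_raw by parsing each raw chunk into the complete type. A sketch under that assumption (the ChunkData field name, Error type, and complete::Chunk::from_bytes are all placeholders):

mod complete {
    pub struct Chunk;
    impl Chunk {
        pub fn from_bytes(_data: &[u8]) -> Result<Chunk, super::Error> {
            todo!()
        }
    }
}

pub struct Error;

pub struct ChunkData {
    pub data: Vec<u8>, // raw chunk bytes; field name assumed
}

pub struct Region;

impl Region {
    fn iter_raw(&mut self) -> impl Iterator<Item = Result<ChunkData, Error>> + '_ {
        std::iter::empty()
    }

    // iter_chunks layered on iter_raw: parse each raw chunk into the complete type.
    fn iter_chunks(&mut self) -> impl Iterator<Item = Result<complete::Chunk, Error>> + '_ {
        self.iter_raw()
            .map(|raw| raw.and_then(|c| complete::Chunk::from_bytes(&c.data)))
    }
}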

It mixes complete with the root of the module a bit. I think ultimately the complete module might move to the root and the other stuff will end up in a submodule instead. This would make the 'default' API be the complete one, with the more optimized ones in a separate module.

One potential issue is that iter_chunks here would not have access to the x and z of the chunk in the region.

ToBinio commented 1 year ago

In which way would you say an API like the one requested in #17 should be implemented?

Would you say a dimension-wide manager / wrapper should implement such an API? Should this Dimension manager then cache already created chunks, or would you say chunk caching should be added to the Region itself?

owengage commented 1 year ago

I think I would say to simply not implement #17.

A Dimension or World struct would allow getting regions, which allows getting chunks, which allows getting blocks. I think chunk caching should be left to the user of the library. The caching you would want would probably depend on your use case.

The iter_blocks on a Region offers a simple API for people interested in just doing stuff like counting blocks.
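
For instance, a block-counting pass over a region could then look roughly like this (reusing the hypothetical Region and iter_blocks shapes sketched above, and assuming a Block::name() accessor):

use std::collections::HashMap;

// Hypothetical usage only; the Result item type and Block::name() are assumptions.
fn count_blocks(region: &mut Region) -> Result<HashMap<String, u64>, Error> {
    let mut counts = HashMap::new();
    for block in region.iter_blocks() {
        let block = block?; // each item is a Result, so chunk parse errors bubble up
        *counts.entry(block.name().to_string()).or_insert(0) += 1;
    }
    Ok(counts)
}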