Anvil format. - Githubissues

Fenixin commented 12 years ago

Is anyone working already on this? If not I'll start it soon.

twoolie commented 12 years ago

I'm not currently working on it because I have no time at the moment. If you wish to have a go, please be my guest but i will probably get around to it in a couple of weeks...

Also, NBT is the library core. Please do not modify it to support anvil specific features. Please use the example of region.py to abstract the Anvil wrapper format to a usable interface that goes hand in hand with the core library.

Fenixin commented 12 years ago

Sure. Should the old region.py kept and add a new anvil.py or modify region.py?

twoolie commented 12 years ago

I would prefer you keep region.py around unmodified. I'm not sure how different anvil is, but perhaps you can subclass Region?

dtrauma commented 12 years ago

As you see from my pull request nbt.py has to be modified to support the new IntArray-tag. Regarding this, the version I sent uses the struct module. Sadly that means the tag's value is either a tuple (read only) or we have to convert it to list or array (slow). Using python's array module means manual big/little endian conversion on read and every write (either two conversions or a copy and a conversion). The array module of NumPy could do all we need fast and space efficient, but would be another dependency. :/

twoolie commented 12 years ago

Instead of pulling the pack out every time, do what I have done in other places and do

size = calcsize(self.fmt)
self.value = list( unpack(self.fmt, size) )

This will add a little bit more overhead, but means that you will always get a list back. and the reverse can be done as well to repack (pack just takes an iterable, yes?)

The array module of NumPy could do all we need fast and space efficient, but would be another dependency. :/

I Also feel that this would be faster but I also feel that people who really worry about performance will likely subclass my naive implementation and substitute a NumPy version if they want it. It's very easy to do with the current library.

dtrauma commented 12 years ago

Done.

people who really worry about performance will likely subclass my naive implementation and substitute a NumPy version if they want it.

Actually I'm considering it. It could be automatic if NumPy is found.

twoolie commented 12 years ago

It could be yes, but i'd rather some kind of switch. As far as I understand it, Numpy creates certain restrictions on the list semantics, yes? I'd rather not impose that on potential users.

twoolie commented 12 years ago

Made any progress on the anvil format?

Fenixin commented 12 years ago

Not even tried yet :/. But I really want to give it support as I did for region files (with good error handling).

dtrauma commented 12 years ago

Ugh, this got longer than anticipated, sorry for the rambling :).

I didn't look all too closely into numpy yet tbh, but the possibility to reshape an array without touching the data is really sexy. Then you could do xyz access directly with the data nbt returns without writing a custom accessor or copying the data to some other format. Unlike pythons array it is fairly flexible, too, but i don't know yet how it compares to a list. Perhaps I'll find some time to look into it. Right now my personal itch is scratched, though (see http://digital-trauma.de/mc.html ) and the most CPU time is spent actually iterating over the data, not in nbt, so it isn't high on my list.

I also looked into the other changes in Anvil. I don't think it should be that much work to adapt region.py. One thing that kept me from just trying to do it is that Mojang is a bit capricious with changing their formats (I was a bit surprised to find a new tag in there without any documentation, it seems they just pulled the old nbt.txt) and that I'm unsure on how to test it (particularly if you want to do good error handling).

Thomas, do you plan to do a release with this soon? Right now I have to run HEAD everywhere.

macfreek commented 12 years ago

I like to take a look at Anvil as well.

Does any of you have an active fork? Before I can code, I need to change some line endings (it now mixed Unix and MS-Windows line endings). However, I rather do that when there are no active forks.

macfreek commented 12 years ago

The changes required for Anvil are not in region.py, but in chunk.py, as the changes are only in the individual chunks, but not in how the chunks are stored in a region file (even though the file gets a new extension).

The most important change is to support the upcoming AddBlocks, and the split of chunks into sections, and the swap of X and Y coordinates.

To support AddBlocks, a bitwise operation is required for every block: e.g. blockId = (AddBlock << 8) + Block. This will give a performance penalty, and more or less negates the performance jump. On the plus side, adding the data bits has the same penalty, and should be breeze.
To support sections is not so bad, but is mostly a big change for the examples, since most use "chunks" as their foundation. That needs to be changes to sections (the change should not be masked by the API, because sections can be empty, and hiding that from scripts gives a unnecessary performance penalty)
I'm not that concerned about the swap of X and Y coordinates. A trick that may be useful is to use the step parameter in list selection. Eg. blocks[0:-1:16] returns an array with each 16th entry in a list. I don't know of this performance compared to NumPy. Love to see a test!

A question to twoolie: chunk.py currently mixes two functions, parsing the chunk format and creating an image map. Would you be OK to move the image map functionality to an example script instead?

twoolie commented 12 years ago

@dtrauma

I Will do a merge tonight of the current code and the outstanding pull request that changes line endings. I think it's also about time that I fixed up and completed the test suite, especially if we're about to do some refactoring. (If that all works out, i'll do a release before starting work on Anvil.)
I too was surprised to see an extra tag there, especially when they already have TAG_List that essentially does the same thing. sigh

@macfreek

There are some active forks right now (Have a look in the Network Tab. I haven't been paying much attention lately).
Can you make the necessary changes to region by subclassing Region or would it be better to read a flag? Is the only indication of file type in the extension?
Please by all means break out the functionality in chunks.py but wait until I make a release later tonight (Will include line ending changes).

macfreek commented 12 years ago

@twoolie The only active forks in the network are mine. But others may have uncomitted changes that have not yet been pushed back to GitHub.

Again, I don't think Region need to be changed at all. The changes should be in chunk.py, and in the examples. E.g.

 for infile in os.listdir(os.path.join(world_folder,'region')):
     print "process region file: " + infile

should be changed to:

for infile in glob.glob(os.path.join(world_folder,'region', '*.mcr') ):
     print "process region file: " + infile

I wait for your commit before doing significant changes. In my opinion, the most important thing to ask ourselves is how we think upstream tools will use NBT, and what kind of API do they expect? Perhaps I will work on some examples, as that gives a good idea what to expect.

macfreek commented 12 years ago

FYI, given that Minecraft 1.2 will be released within a day, I just report on some progress. I will shortly commit some changes to abstract the "world folder" concept, as most examples and external tools need helper functions to determine if something is a McRegion or Anvil world folder. I have not touched chunk.py, which is most work. So likely MC 1.2 will be released before NBT (sorry).

macfreek commented 12 years ago

I created two pull requests: for examples, and for nbt/world.py + examples/map.py.

I did some preliminary work to support to support Anvil. In this case some Python trickery to detect the type of Chunk before the chunk is initialised. That allows an upstream code to request a nbt.chunk.Chunk(), and 'magically' receive a nbt.chunk.McRegionChunk() or nbt.chunk.AnvilChunk().

I'm not sure if this kind of trickery is good or bad,so I have not created a pull request yet. If you're interested: https://github.com/macfreek/NBT/compare/master...anvil

Fenixin commented 12 years ago

Just tested all the new changes (thanks everyone!) with the anvil format, and everything works but chunk writing. There is a missing asterisk in the nbt.py file in line 135. The line should be:

        buffer.write(pack(self.fmt, *self.value))

Changed it and everything works perfectly. Should I do a pull request?

After fixing this anvil format is supported (at least is for me), so this can be closed.

macfreek commented 12 years ago

I can confirm that this is indeed a correct fix, please make a pull request, so twoolie can pull it.

Fenixin commented 12 years ago

Done!

Fenixin commented 12 years ago

Closing this.

twoolie / NBT

Anvil format. #16