Open vberlier opened 2 years ago
I recently diagnosed a performance problem where serializing a structure file with millions of entries was taking 60 seconds. I'd like to make nbtlib
viable when you also have really large workloads so I want to see what's possible by either trying to rely more on numpy, or using cython, or an optional C extension a bit like what they're doing for https://github.com/simplejson/simplejson.
At the same time I also want to keep nbtlib
usable in situations where you can't use pre-built wheels, so this needs further investigation.
There's another type of feedback I got a few times over the years, and it's that the wrappers are annoying, especially for beginners. They expect to be able to use nbtlib
like the json
module, where you can simply pass plain dicts back and forth and everything works out of the box. This is already somewhat supported if you use schemas but I'd like to see if there's anything that can be done to make things even easier, like automatically generating a matching schema when reading files for instance.
Also for schemas maybe there's a way to leverage or get inspiration from the way pydantic
does things. https://github.com/samuelcolvin/pydantic/
About SNBT support and the nbt path parser in nbtlib
, I think they could be improved a little. I'm working on mecha
, a full-blown parser and compiler pipeline for Minecraft functions (https://github.com/mcbeet/mecha) and I think I did a better job for the snbt parser over there. The parser in nbtlib
can't be embedded in other syntactic constructs and that's why I had to make an alternate version in mecha
. I'll probably try to bring it to nbtlib
at some point.
Just a quick feedback before I have time for more in-depth discussions (right now I and the one busy, lol). And, btw, loved this idea of a discussion issue before 2.0!
I recently diagnosed a performance problem where serializing a structure file with millions of entries was taking 60 seconds. I'd like to make
nbtlib
viable when you also have really large workloads so I want to see what's possible by either trying to rely more on numpy, or using cython, or an optional C extension a bit like what they're doing for https://github.com/simplejson/simplejson.
Yeah, serialization is really slow. Not that I care about SNBT. But before you think otherwise, know that nbtlib
's parsing is fast. Trust me.
My mcworldlib deals with Worlds, so it too deals with millions of entries, considering each mca
Region file can have up to 1024 chunks, each one the equivalent of a whole nbtlib.File
. And my medium sized World (~500MB) contains ~1300 regions. Just loading them all in a World.get_all_chunks()
loop takes more than 2 minutes. So yeah, .parse()
performance is really crucial for me.
So I did some benchmarks using several NBT libraries (Amulet-NBT, twoolies's NBT, Tk's pynbt) and to my surprise nbtlib came first among pure-python implementations (closely tied with pynbt). Twoolie is noticeably slower (~ -30%), and Amulet came last (~ -40%). But of course, none can compete with Amulet's Cython implementation, which is 4-5x faster. I'll son post the benchmark script in my library, if you're interested.
Main point is: nbtlib's parse
is already fast enough considering the pure python alternatives. Take that into account before trying to optimize it. And don't break it for the sake of SNBT.
At the same time I also want to keep
nbtlib
usable in situations where you can't use pre-built wheels, so this needs further investigation.
If one can't use wheels (why?), it will build from source, what's the problem?
I've also benchmarked serialization alternatives, and man,msgpack
is awesome. It was able to serialize a Compound
without specifying any handler (something json
requires, and I assumed any serializer would too). and it's blazing fast.
MCC Chunk 0.2072 mpack -> <class 'bytes'> (21,875,554) 1.9993 write -> <class 'bytes'> (24,460,279) 3.9964 snbt -> <class 'str'> (31,037,955)
There's another type of feedback I got a few times over the years, and it's that the wrappers are annoying, especially for beginners. They expect to be able to use
nbtlib
like thejson
module, where you can simply pass plain dicts back and forth and everything works out of the box. This is already somewhat supported if you use schemas but I'd like to see if there's anything that can be done to make things even easier, like automatically generating a matching schema when reading files for instance.
I also think a lot about this, but any solutions I came up require creating a Compound.__setitem__()
(or even Base.__setitem__
. And I'm always reluctant to go this route as it can have a major hit on usage performance.
The simplest implementation strategy I'm thinking of is something similar to this pseudocode:
def __setitem__(self, key, value) -> None:
if not isinstance(value, Base) and key in self: # and isinstance(self[key], Base) too?
# Overwriting existing key with a non-tag value. Cast to old value type (if old is a tag)
return super()[key] = self[key].__class__(value)
super()[key] = value
But that's an 1-2 isinstance()
and a key in keys
test for every assignment. We need some proper, serious benchmarking before going this route.
Or we can let the user select this "auto-cast" mode per instance:
@property
def auto_cast(self, value: bool):
return self.autocast_setitem == super().__setitem__
@autocast.setter
def auto_cast(self, value: bool):
self.__setitem__ = self.autocast_setitem if value else super().__setitem__
This could default to False
(so load/parse
gets non-casting, fast behavior), and File.load()
could set it to True
after fully creating/parsing the instance and before returning self
to the caller (the same way it sets filename
)
I kinda lose confidence in code the longer it sits around unattended and since the core of the library is now a few years old I'm really itching to do a refresh haha. But I actually didn't know how well nbtlib
faired against other implementations, really glad you benchmarked this!
When it comes to performance I'd primarily try to improve parsing and writing raw nbt data, as you mentioned I agree that's the most critical use-case. And I won't unnecessarily break things, I really like how the API feels so it's really more about potentially refactoring some lower-level elements or finding fast paths.
If one can't use wheels (why?), it will build from source, what's the problem?
A lot of windows users have python installed but no Visual C++ so if there aren't pre-built binaries available for them they won't be able to install the package. Also I've seen people interested in running their utilities on mobile or in the browser with Brython and I'm pretty sure C extensions are a massive headache for this.
Also I forgot about mypyc
, which I think would actually be the perfect tool to speed up nbtlib
since I'm planning to add type annotations anyway. Since it generates a C extension from pure python code, the original source can seamlessly be used as a fallback. I haven't experimented with it yet but I'm really curious. https://mypyc.readthedocs.io/en/latest/index.html
msgpack is awesome
Haha yeah sometimes I kind of wish Mojang wasn't stuck with nbt too!
The auto_cast
thing is kind of clever, but the actual implementation of the setter would probably need to recurse into its children to also make the nested compound tags behave the same. But it's something to think about. Maybe this could happen automatically at the end of __init__
. I'll make an issue for this.
If one can't use wheels (why?), it will build from source, what's the problem?
so if there aren't pre-built binaries available for them they won't be able to install the package. [...] Also I forgot about
mypyc
, which I think would actually be the perfect tool to speed upnbtlib
since I'm planning to add type annotations anyway. Since it generates a C extension from pure python code, the original source can seamlessly be used as a fallback. I haven't experimented with it yet but I'm really curious. https://mypyc.readthedocs.io/en/latest/index.html
Will take a look, never heard of mypc, sounds awesome. But meanwhile... there's always the Amulet approach:
# in nbt.__init__.py
try:
from nbt_cy import ( # a cyton nbt.pyx module
a,
b,
...,
)
except ImportError:
from nbt_py import ( # a pure-python nbt_py fallback module
a,
b,
...
)
I kinda lose confidence in code the longer it sits around unattended and since the core of the library is now a few years old I'm really itching to do a refresh haha.
nbtlib
's design is amazing. It's elegant, beautiful, clean and easy to use. I've always disliked the pymclevel
approach of tag.value
/tag.name
, it's really cumbersome to work with. Not to mention TAG_Xxxx
names, eww. And many current implementations still inherits that design. Some are only now subclassing dict
and allowing things like mycomp["key"] = xxx
by translating that into xxx.name = key; mycomp.value.add(xxx)
in a Compounds __setitem__
!!!. Eww!
But I actually didn't know how well
nbtlib
faired against other implementations, really glad you benchmarked this! [...] When it comes to performance I'd primarily try to improve parsing and writing raw nbt data, as you mentioned I agree that's the most critical use-case. And I won't unnecessarily break things, I really like how the API feels so [...]
And I'm really glad you are focusing on parsing/raw NBT. But yeah, nbtlib
already fairs very very well, and it surprised me too. Usually high-level abstract and elegant APIs have a price. By "breaking" it I mean breaking the performance, not the API. For example, I've always feared adding __setitem__
because of the potentially massive performance hit.
We need good benchmarks on large data. Grab that 3.8MB MCC chunk in my data/
folder, and the 7.8MB r.2.1.mca
file (loop it using my Anvil parser or stitch each raw chunk into a massive supercompound). We need some very good tests so we don't rely on possibly wrong assumptions.
That's really sweet, thanks! I'm gonna make good use of these test files, I'm about to start messing around with mypyc.
So another thing I'm thinking about for improving .parse()
performance is making unpacking (and byteswaps) lazy and using memoryviews to track the position of each tag in the original buffer. This would avoid copies until tags are mutated and potentially put a lot less pressure on the garbage collector. I think it would work well for dependent libraries like mcworldlib
because a lot of use-cases just require modifying a few blocks or entities so if we can avoid doing unnecessary work entirely this might lead to even greater performance gains than mypyc or cython. Of course it would also be possible to investigate both.
msgpack is awesome
Haha yeah sometimes I kind of wish Mojang wasn't stuck with nbt too!
Minecraft is, but I'm not ;-)
I'm planning on using msgpack
as a serializer just to feed it to a hash function (most likely xxh3_128
*) so I can use it to determine if a chunk/region was changed or not prior to writing. That way I can avoid adding a __setitem__
and still allow an "auto-dirty" feature.
Of course, this trouble is only worth it if msgpack is considerably faster than nbt's write(). And it is. By a tenfold.
*: man that thing is a beast, it's even faster (and larger) than CPython's native hash()
!!! As in 6-8 times faster. And it's 128 bits for the collision OCD like me.
So another thing I'm thinking about for improving
.parse()
performance is making unpacking (and byteswaps) lazy and using memoryviews to track the position of each tag in the original buffer. This would avoid copies until tags are mutated and potentially put a lot less pressure on the garbage collector.
I would surely benefit from it, bounds like a lot of trouble for possibly very little gain. I mean, I was really impressed by Amulet's Cython implementation. It's so fast that it really discouraged me to try any non-trivial optimization on pure-python code.
Too bad Amulet's API is so bad, with its outdated pymclevel heritage (that everyone seems to mimic). I even considered learning Cython just for the sake of making a pyx
implementing nbtlib
's API. At least a subset of it. Possibly no Path
, certainly no Schema
, but the single fact of Compounds being dicts, and not having to deal with .value/.name
everywhere would be enough for me.
Hello, I'm not sure if you still want to work on this or if you took a break, do you have any updates?
I have a local branch that I revisit from time to time to experiment with some of the stuff mentioned in the thread, but I haven't had the occasion to focus on nbtlib in a while. I'm always working on some other project and when I need to use nbtlib it still does its job pretty well, so in practice the things I want to improve don't really bother me enough to make me drop whatever I'm working on to do the nbtlib revamp.
However I really want to get to it done at some point. It's kind of weighing on me mentally but I really care about doing it right. I think I'll try to set some time aside for nbtlib in september. It's getting kinda ridiculous, I hope I'll be able to close this issue before its second birthday...
This tracks a number of issues that should be resolved before making 2.x releases stable. I think the recent feedback shows that
nbtlib
could benefit from a little revamp. Proper static typing is long overdue and I'd like to take some time to experiment with optimizations to speed everything up a bit.Also this could be the perfect occasion to revisit old issues. It would be interesting to revisit the idea of
nbtlib.contrib
once the main issues have been addressed (#60). Also I'm not sure about the status of thesetup.py
situation anymore (#54). And finally, I've always struggled with writing documentation. Maybe we can come up with a better strategy, or find a way to break it down and make a proper roadmap for this (#16).If anything else comes up I'll also add it in here.