Open MestreLion opened 2 years ago
I just noticed that pre-made Struct instances are already saved in BYTE/INT/..., so read_numeric() is not creating new instances per call. Great!
But, still, are improvements to this crucial function welcome?
At runtime, read_numeric should only perform a dictionary lookup to grab the appropriate struct format and then read and unpack the data. Of course it's in the hot path when parsing, so performance improvements would be very welcome, but I'm not sure if there's any opportunity for easy wins here. But feel free to experiment with it if you have something in mind!
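For reference, the behavior described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the layout of the BYTE and INT dictionaries and the exact read_numeric signature are assumptions.

```python
import io
import struct

# Assumed layout: one pre-built Struct per byte order, created once at import time.
BYTE = {"big": struct.Struct(">b"), "little": struct.Struct("<b")}
INT = {"big": struct.Struct(">i"), "little": struct.Struct("<i")}


def read_numeric(fmt, fileobj, byteorder="big"):
    """Look up the pre-built Struct, then read and unpack a single value."""
    compiled = fmt[byteorder]  # the only per-call overhead besides read/unpack
    return compiled.unpack(fileobj.read(compiled.size))[0]


value = read_numeric(INT, io.BytesIO(b"\x00\x00\x01\x00"))  # big-endian by default
```

No Struct is constructed inside read_numeric here, so the per-call cost is one dictionary lookup, one read and one unpack.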
That's why I created the benchmarks with other NBT implementations... no point doing experiments if I can't accurately measure the gains. And little point trying to improve what is already pretty damn good. My initial assumption that it was slow and could be "vastly improved" turned out to be wrong.
But still, one experiment I might try is to use an (attribute?) assignment once per File|Root.parse() that sets the endianness, instead of a "run-time" dictionary lookup for every tag. So when Compound.parse() says read_numeric(BYTE, ...), that BYTE would not be a big/little dictionary anymore, but already one of those values/Structs. The job of fmt[endian] would have already been performed by File. read_numeric would take not a dict, but a Struct (or whatever) of a given endianness that was set prior to that. And Compound, as now, would be completely unaware of all of this.
The point is that there is little point in allowing endianness to be set on a per-tag basis. Either the whole file is little endian or big endian, so we can take advantage of this assumption.
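A rough sketch of that experiment, assuming a hypothetical Formats holder that resolves every Struct once per file. The names Formats and parse_ints are made up for illustration, and this read_numeric takes a Struct directly rather than a dict.

```python
import io
import struct

_CODES = {"BYTE": "b", "SHORT": "h", "INT": "i"}
_PREFIX = {"big": ">", "little": "<"}


class Formats:
    """Hypothetical holder: every Struct is bound to one byte order up front."""

    def __init__(self, byteorder):
        prefix = _PREFIX[byteorder]
        for name, code in _CODES.items():
            setattr(self, name, struct.Struct(prefix + code))


def read_numeric(fmt, fileobj):
    """Takes an already-resolved Struct, so no per-tag endianness lookup."""
    return fmt.unpack(fileobj.read(fmt.size))[0]


def parse_ints(fileobj, count, byteorder="big"):
    formats = Formats(byteorder)  # endianness is decided once per parse
    int_fmt = formats.INT         # bind to a local to skip repeated attribute lookups
    return [read_numeric(int_fmt, fileobj) for _ in range(count)]
```

Whether the attribute lookup on the holder ends up cheaper than the dictionary lookup it replaces is exactly what would need measuring.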
Hmm, perhaps Compound would have to be a little aware, as it may have to use self.BYTE instead of a module-level BYTE. Hmm, class attribute lookup. Bad tradeoff?
Benchmarks. We need benchmarks.
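A micro-benchmark along these lines could compare the two read paths. It is only a sketch: the reader functions are hypothetical stand-ins for the real ones, and the numbers depend heavily on the machine and the Python version.

```python
import io
import struct
import timeit

INT_BY_ORDER = {"big": struct.Struct(">i"), "little": struct.Struct("<i")}
INT_BIG = INT_BY_ORDER["big"]
PAYLOAD = b"\x00\x00\x00\x2a" * 1000  # 1000 big-endian ints


def read_with_lookup(fileobj, byteorder="big"):
    """Current style: resolve the Struct from a dict on every call."""
    fmt = INT_BY_ORDER[byteorder]
    return fmt.unpack(fileobj.read(fmt.size))[0]


def read_prebound(fileobj, fmt=INT_BIG):
    """Proposed style: the Struct was chosen before parsing started."""
    return fmt.unpack(fileobj.read(fmt.size))[0]


def run(reader):
    buf = io.BytesIO(PAYLOAD)
    return [reader(buf) for _ in range(1000)]


def benchmark(repeat=5, number=100):
    """Return the best time in seconds for each reader."""
    return {
        reader.__name__: min(
            timeit.repeat(lambda: run(reader), repeat=repeat, number=number)
        )
        for reader in (read_with_lookup, read_prebound)
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.4f}s")
```

Taking the minimum of several repeats reduces noise from other processes, which matters when the expected difference is small.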
Or skip all of that and go Cython. Please!
An interesting optimization approach taken by Minecraft: it caches all 256 possible Byte values as pre-built instances.
When doing some profiling loading NBT files, trying to optimize loading times, read_numeric() stands at the top by a large margin. Taking a closer look at it, it seems this is the culprit:
And that is universally used in all tag classes using a similar pattern:
The problem is: read_numeric creates a new Struct instance on every read. That is a very expensive operation. There should probably be a way to pre-build (or cache) such instances, so either read_numeric or get_format or even BYTE/INT/... contain/return the same struct instances, while still keeping the ability to select byteorder on a per-call basis.
I can submit a PR to fix this, and I'm sure reading (and writing) times will vastly improve. I'll do so in a way that does not change the API of any of the tag classes (i.e., keep the Compound.parse(cls, fileobj, byteorder="big") signature for all write/parse of all tags), and possibly keep the read_numeric() signature too (so no changes to the Tag classes at all), but most likely get_format() will change signature and/or internal structure, and the underlying BYTE/INT/... will most likely change their internal values, but I'll do my best to keep them byteorder-agnostic constants.
Is such an improvement welcome?
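One way the caching idea could be sketched is with functools.lru_cache, so each (format code, byteorder) pair is compiled at most once while byteorder can still be chosen per call. The get_format and read_numeric signatures below are assumptions made for illustration, not the library's real ones.

```python
import io
import struct
from functools import lru_cache


@lru_cache(maxsize=None)
def get_format(code, byteorder):
    """Return a shared Struct for this pair; it is built only on first use."""
    prefix = ">" if byteorder == "big" else "<"
    return struct.Struct(prefix + code)


def read_numeric(code, fileobj, byteorder="big"):
    # Both arguments are always passed positionally so the cache key is stable.
    fmt = get_format(code, byteorder)
    return fmt.unpack(fileobj.read(fmt.size))[0]
```

With this shape the tag classes keep passing a byteorder-agnostic constant (here a plain format code), and repeated reads reuse the same Struct object.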