Open MestreLion opened 2 years ago
The fix itself is quite simple:
class Root(Compound):
__slots__ = (
'root_name',
)
# Optional, but safer than relying on .parse() to be always invoked
def __init__(self, root_name: str = "", *args, **kwargs):
super().__init__(*args, **kwargs)
self.root_name: str = root_name
# Delete all other properties and its setters
# or leave a temporary "DeprecationWarning":
@property
def root(self):
print("Root.root is deprecated, just access its contents directly")
return self
@classmethod
def parse(cls, buff, byteorder='big'):
tag_id = read_numeric(BYTE, buff, byteorder)
if not tag_id == cls.tag_id:
# Possible issues with a non-Compound root:
# - root_name might not be in __slots__(), and thus not assignable
# - not returning a superclass might be surprising and have side-effects
raise TypeError("Non-Compound root tags is not supported:"
f"{cls.get_tag(tag_id)}")
name = read_string(buff, byteorder)
self = super().parse(buff, byteorder)
self.root_name = name
return self
def write(self, buff, byteorder='big'):
write_numeric(BYTE, self.tag_id, buff, byteorder)
write_string(getattr(self, 'root_name', "") or "", buff, byteorder)
super().write(buff, byteorder)
The biggest challenging in implementing this API breakage considerations. But if needed I'd happily provide a PR
The documentation, at least at the time, was a bit fuzzy. It said that nbt files are implicitly always compound tags. In python since we have dictionaries nbtlib doesn't see nbt compounds as a collection of named tags, where each tag has an intrisic name, but rather as a proper dictionary where tags don't have any intrisic name. This makes the interpretation of the root compound ambiguous. Since in this implementation tags don't have any intrisic names, we can make the assumption that the file is an implicit compound in the sense that we can assume it's prefixed with the TAG_Compound marker. This makes the root name a key of the implicit compound.
That's the reasoning that led to the way this is currently handled, but I know this just ends up adding unnecessary noise so I'd be happy to get rid of it. But yeah due to API breakage this would need to be published in a 2.0 release.
While I'm still digesting your great points, and elaborating an extended answer, I've noticed you dropped Root
class as a whole. Just to note this wasn't the proposed fix to the spurious Compound. File
, as it is, will still create one.
And Root
is still welcome as a class, because some tags, namely the chunks in an mca
file, are not a File
but still not a regular Compound
.
There are 2 important and distinct design choices involved here, both are open to interpretation by the "official" docs:
1 - Who "owns" a tag's name? Itself or its parent? 2 - Is the root tag of an NBT data always a Compound?
For 1, some libraries, such as the old pymclevel
and its Amulet-NBT
successor, adopted the "tags have their own names" direct interpretation. It makes the API really clunky and awkward to use, having to set tag.name
and the infamous tag.value
everywhere.
nbtlib
OTOH, thankfully adopted another interpretation: like in Python dicts, a tag's name is defined in (and by) its parent. A tag itself is unaware of its own name. Tags in a list don't even have names. And that's true in the NBT format too. It was a wise design decision that I'm really glad you did, it makes working with nbtlib a real joy, with its seamless and fluid integration with Python's builtins
But a tag not owning its own name leaves nbtlib
with a (small) problem: What about the root? In NBT, the root tag (whatever type it is), does have a name. Minecraft might always set it to an empty string, but the data is still there and it has to be parsed.
In nbtlib
's data model, that name belongs to the parent. But the root tag has no parent, so what to do with the parsed name? I see 2 options:
write()
. But where? That was the idea behind the Root
class: a wrapper on Compound
that simply stores the parsed name. A plain, read/write self.root_name: str
attribute.In my library (and in the example above), I did the latter. Root
exists solely for parsing the initial NBT bytes and storing the root name.
Can that name be saved at File
instead? Sure. But that would "force" all root tags to be a file. That's fine for .dat
files. But a chunk in an mca
file is not a file, not it a sense you could Chunk.load(otherchunk.filename)
. Chunks should not have self.filename
. So it makes sense to have an intermediate Root
between File
and Compound
, if even just for the sake of allowing a Compound
to store a root_name
while not having a filename
attribute or a .load()
method
So handling a root's name while keeping nbtlib
's data model is easy, and can be addressed in many ways. Now for the hard part, the other major design decision:
Compound
?In practice, yes. And virtually all libraries assume so, in one way or the other. nbtlib
assumed so when it created the "extra" Compound in File.parse()
: by invoking super().parse(buff, ...)
prior to parsing any bytes, it will create a Compound
that thinks the root tag is its (only) child (whatever the type root actually is). On write()
, that extra Compound will do what Compounds do: loop and write only their children, not themselves. And thus that (sole) child, the actual root tag, gets written, and everything works fine.
So, NBT-wise, nbtlib
is not assuming the root is a compound. If it were something else, it would be handled properly and losslessly. But to do so it wraps the real root in that extra Compound, exposing that awkward tag[''][...]
(or tag.root[...]
) to clients, something this issue is proposing to eliminate. We should be able to handle a root tag like any other tag, the return value of File.load(...)
should be a tag we can directly say tag[...]
. Yes, it would be nameless, like all tags are. If you want to retrieve the original name, for whatever reason, go elsewhere. tag.root_name
for example. But the root's name should not be an item of the the tag, being root should not create that extra layer.
In order to avoid the extra layer and preserve the root name, we must parse it separately from other tags, as each tag's .parse()
assume its own name (and tag type!) was already processed by its parent. So someone, File
itself or someone else, must read the initial NBT bytes and:
Amulet
approach. And mine too, in an extent. But I'm not happy with it.write()
assume it was a Compound. Ouch! But, if you did the above, that's no problem..parse()
. Seems easy, right?Problem is... a tag itself not only does not know its own name, it also does not write its own tag type on .write()
. So whatever tag was returned by File.load()
, it is unable to write itself as a root tag! root.write()
would be a broken NBT if root
was an ordinary tag. Writing root's tag type is somebody else's job. Its parent's job. And for root, who is that?
We need a way to save a tag, any tag, as root. That is, to write its own type and its own name, unlike other tags. So a root, conceptually, is not an entity by itself, but rather a trait. If we allow non-compounds as root, then "rootness" should be inherited by any tag. Root
(or File
) in this case is not a Compound
subclass, but rather a Mixin class.
In this scenario, class File
/Root
inheriting from Compound
becomes a problem. It means the root tag is always a Compound. And if we go that road, life is easy and removing the extra layer is simple:
File|Root.parse()
reads the root tag type, aborts if not a Compoundwrite()
the saved name and its own tag typesuper().parse()
, which is actually Compound.parse()
, which will loop and write its childrenLife is then good and simple. But we're imposing a restriction on NBT. Spec implicitely says its fine, but technically the NBT data does allow more.
We need a way to save a tag, any tag, as root. That is, to write its own type and its own name, unlike other tags. So a root, conceptually, is not an entity by itself, but rather a trait. If we allow non-compounds as root, then "rootness" should be inherited by any tag.
Root
(orFile
) in this case is not aCompound
subclass, but rather a Mixin class.
Just had this crazy (or brilliant) idea: why not make Root
behave like your List
? I.e., indexing it would create a new class on-the-fly. Just like List[Int]
means "A List of type (or "flavor") Int
", so would Root[Int]
would mean "A Root (tag) of type Int
". And there you go: any tag can now be a root tag!
Root[Compound]
s would be the same class.Root
, as a parent, would be responsible for:
.parse()
Root[type]
class and instantiate it. tag_type = X
(or similar name) would be a class attributeroot_name
instance attribute.write()
, use root_name
and tag_type
to write the initial bytes, then delegate to super().write()
load()
or save()
or filename()
. that's a task for File
Root
(base, without any index/subtype)Root
, unlike List
, would not allow direct instantiation without indexing a subclass__init__(*args, root_name: str = "", **kwargs)
that pass on all other arguments to super()
So, what do you think of this idea?
I've just found this:
http://web.archive.org/web/20110723210920/http://www.minecraft.net/docs/NBT.txt
An NBT file consists of a single GZIPped Named Tag of type TAG_Compound.
Decoding example: (Use http://www.minecraft.net/docs/test.nbt to test your implementation) First we start by reading a Named Tag. After unzipping the stream, the first byte is a 10. That means the tag is a TAG_Compound (as expected by the specification).
Which I guess settles the question. No need for Root
to be a mixin (although I still believe it would be nice to), it can directly inherit Compound
, and File
can assume so when reading and writing.
That said, can I make 2 requests?
Root
class? It's a very useful class to distinguish between some NBT data from an actual file.In my mind it didn't make much sense to keep Root
around as you can just as well use File.parse()
for extracting chunks in .mca
files. I guess the awkward thing would be the useless save()
and load()
methods.
The idea of making a Root
class that could be specialized for all tag types is really neat but it's probably over-engineered for something that will realistically only be used for compound tags. I have an idea that would probably be simpler to implement and just as flexible. I could make it so that some_tag.write(fileobj, root="hello")
prefixes the actual value with the tag name. Or an additional pair of methods available for all tags like parse_root()
and write_root()
that would handle the root name and then delegate to the regular parse()
and write()
methods.
Of course the issue is that you could still call parse()
or write()
directly by mistake, so it's a weaker alternative to the Root
class. In the end, I think the most straight-forward solution is probably the best here, I'll restore the Root
class.
In my mind it didn't make much sense to keep
Root
around as you can just as well useFile.parse()
for extracting chunks in.mca
files. I guess the awkward thing would be the uselesssave()
andload()
methods.
The useless (and potentially misleading) save
and load
in chunks are the exact reasons why I wanted a separate class for non-file root compounds.
The idea of making a
Root
class that could be specialized for all tag types is really neat but it's probably over-engineered for something that will realistically only be used for compound tags.
I agree. Specially now that we have "official" docs saying it's always Compounds. Not only realistically, but per the spec. But still... oh my... so tempting...
I have an idea that would probably be simpler to implement and just as flexible.
Sweet! I'm listing...
I could make it so that
some_tag.write(fileobj, root="hello")
prefixes the actual value with the tag name.
... tag name and tag type too, don't forget it. Hum, I like it. Gives me a shiver when I see a tag handles its own name (pymclevel
-style of things which I mentioned in another issue), but this would be a honorable, worthwhile exception.
Or an additional pair of methods available for all tags like
parse_root()
andwrite_root()
that would handle the root name and then delegate to the regularparse()
andwrite()
methods. [...]
Ewww, noooo, no, no. nbtlib
is too elegant for this.
In the end, I think the most straight-forward solution is probably the best here, I'll restore the
Root
class.
Thanks!!!! :-D
(And if I fail to resist the temptation, I in the future extended Root with a meta like List
does (I really like that pattern you did) to convert Root
from a hardcoded Compound
to a "dynamic mixin" class that could inherit from any class (and from itself, so Root[Compound]
is stll a plain Root too))
But I will resist this temptation. There are none non-compound roots in the wild. It's in the damn spec. It's useless. I'm better than that.
Full disclaimer here: in the meantime since I've opened this 12 days ago, I've played with some ideas about the chunk/region relationship that are completely unrelated to this File/Root discussion.
I'm trying to decouple Region
from the NBT implementation, so Region acts more like a storage system delivering raw NBT data for someone else to instantiate as a Chunk/Compound, rather than the first class that Dimension holds, meaning API-wise Chunk
would "talk" directly to World/Dimension: A Dimension would be a dictionary** of Chunks, not regions, using absolute (cx, cz) chunk coordinates, and the Anvil .mca
files would a Chunk implementation detail.
*: well, not Chunks, but rather "[category][Chunk]", as of 1.17 we have region/, entities/ and poi/ subdirs.
In this new model, it does make sense for a Chunk to have a .save()
and .load()
, a specialized one that invokes Region's (cached) parsers for loading and saving.
This is all to say that, in the end, it is possible that my Chunks end up inheriting from File
, so I won't use Root
after all, lol.
But I still think they have a reason to exist, both conceptually and in my heart :)
I've been silent as I've spent the last 10 weeks or so playing Minecraft instead of coding for it. Well, actually 10% playing and 90% creating mods and resourcepacks. Inevitably I stumbled on MCP and Minecraft's source code, and it gave many useful answers and insights:
Root/File
is always a Compound
. Strictly typed as such in arguments and return types of NbtIo.read/write()
and *Compressed()
counterparts, throwing IOException
on read if not (tag instanceof CompoundTag)
Root/File
tag is always unnamed . NbtIo.read()
simply discards whatever name is in the file data with a data.readUTF()
call without any assignments, and NbtIo.write()
calls data.writeUTF("")
as name.I am seeing this error:
TypeError: Non-Compound root tags is not supported: <class 'nbtlib.tag.End'>
on a couple of player.dat files that I know are not corrupt.
@Netherwhal , care to post some example files? I'm assuming they're empty players, correct?
In any case, I believe this would be better handled as a separate issue
@MestreLion / @vberlier - no those are proper nbt files that I can open without any issues with other tools.
I am happy to pay a commission for this to be fixed in this library.
@Netherwhal , what version are you using? I've tested latest 2.0.4 from Pypi and it handles your test case just fine:
~/minecraft/test $ venv
Cache entry deserialization failed, entry ignored
Collecting pip
Using cached https://files.pythonhosted.org/packages/08/e3/57d4c24a050aa0bcca46b2920bff40847db79535dc78141eb83581a52eb8/pip-23.1.2-py3-none-any.whl
Collecting setuptools
Using cached https://files.pythonhosted.org/packages/c7/42/be1c7bbdd83e1bfb160c94b9cafd8e25efc7400346cf7ccdbdb452c467fa/setuptools-68.0.0-py3-none-any.whl
Collecting wheel
Using cached https://files.pythonhosted.org/packages/61/86/cc8d1ff2ca31a312a25a708c891cf9facbad4eae493b3872638db6785eb5/wheel-0.40.0-py3-none-any.whl
Installing collected packages: pip, setuptools, wheel
Found existing installation: pip 9.0.1
Uninstalling pip-9.0.1:
Successfully uninstalled pip-9.0.1
Found existing installation: setuptools 39.0.1
Uninstalling setuptools-39.0.1:
Successfully uninstalled setuptools-39.0.1
Successfully installed pip-23.1.2 setuptools-68.0.0 wheel-0.40.0
(venv) ~/minecraft/test $ pip install nbtlib
Collecting nbtlib
Downloading nbtlib-2.0.4-py3-none-any.whl (28 kB)
Collecting numpy (from nbtlib)
Using cached numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Installing collected packages: numpy, nbtlib
Successfully installed nbtlib-2.0.4 numpy-1.24.4
(venv) ~/minecraft/test $ python
Python 3.8.0 (default, Feb 28 2023, 16:22:29)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nbtlib
>>> data = nbtlib.load('../mcworldlib/data/6d26bff4.dat')
>>> type(data)
<class 'nbtlib.nbt.File'>
>>> len(data)
55
Maybe this was fixed already by eda4d32180ff638b18f0dea0b3904afd5545fd65 ?
I've mentioned this a few years back, and now I'm familiar enough with NBT in binary form to raise this again, more confident that this is indeed a design bug in this awesome library.
Currently, when loading an NBT file like this (uncompressed for clarity):
We have this result:
It looks like we have a
Compound
as root, and then another (unnamed)Compound
inside it. But that's not true, the binary data clearly shows there is a single (unnamed)Compound
in the beginning. So the actual parsing result should be:Notice the root name is not represented in the case. And it does not matter, as a tag's name is parsed by (and belongs to) a tag's parent. A tag by itself has no idea about its own name (and tags in lists don't even have one).
Ok, the root tag of an NBT data does have a name, even if 99% (all?) of real world NBT have it empty. But having a name does not make it a Compound with a single child named as itself. This is wrong! If forces some weird syntax to acess the content:
Instead of a much simpler (and correct)
tag["DataVersion"]
If
Root
is aCompound
, it should not require extra syntax to access its contents. No other tag requires so. If preserving the root name is important for saving/loading integrity, it should be stored elsewhere (Root.name
perhaps?)If displaying this name whenever printing the root tag is needed (why would it be?), then I suggest this format:
This does not imply there are two nested compounds in the beginning. Much cleaner, easier to use, and correctly reflects the NBT data. You can even completely omit the name for empty names, and start right away with
{
(as my previous example)This format could also be used to display names for non-compound root tags too. A case that Minecraft seems not to use, and
nbtlib
does not support, but given the NBT spec there is no technical limitation:Anyway, allowing root to be a non-compound is a challenge for another day. But for now, removing the fake extra Compound would be very very nice!!!