Add `Path.from_parts()` or similar to allow tuple constructor like `pathlib.Path`

MestreLion commented 3 years ago

I have this NBT Explorer-like printing and when re-implementing it using my generic tree walker I noticed a missing feature in Path that forced me to use a really inefficient constructor:

The generic tree walker yields a generic tuple containing each path component, similar to the pathlib.Path() API. And to create an nbtlib.Path from this tuple I had to resort to an expensive that iteratively (or recursively) creates a new Path for each component by concatenating the next component (def get_element(root, keys): return get_element(root[keys[0]], keys[1:]). (by the way, this concatenation is what led me to #146 , as each part can be an int (for List) or str (for Compound)

I'm sure there are better, or at least easier ways of doing this. Something like Path.from_accessors(), but skipping the parser by assuming each part is either an int or a str representing a single key.

MestreLion commented 3 years ago

I suggest 2 possible approaches:

Path.from_parts(), classmethod that takes a plain tuple* and builds the accessors without invoking the parser, assuming each part is a single, non-empty component
Allow the main constructor Path() to take such plain tuples (possibly invoking .from_parts(), right after the isinstance(path, Path) check (as Path is a tuple itself)

*: By "tuple" it should actually mean "any sequence which is not an string", so lists and similar iterables are allowed too. anyway, that's a minor detail.

vberlier commented 3 years ago

Right, so now I see why you wanted to concatenate ints too. But in this case subscripting is the operation you would want anyway. Assuming __add__ allows ints, you could implement _tree.get_element() using concatenation like this root + keys[0], but _tree.get_element(Path(), ["foo.bar", 2, "spam"]) would result in foo.bar[2].spam instead of "foo.bar"[2].spam. It's the kind of subtle error I was thinking about the other day. The problem isn't really that supporting ints in __add__ is ambiguous by itself, but that it behaves the same for operations that are semantically different. To me the bug would be more time-consuming to figure out than a clear TypeError.

Anyway, I agree that from_parts() would be pretty useful. You could probably currently implement it yourself with something like this (untested):

path = tuple.__new__(Path, (NamedKey(k) if isinstance(k, str) else ListIndex(k) for k in data.keys[:-1]))

Also a walk() method would definitely belong in nbtlib.

MestreLion commented 3 years ago

but _tree.get_element(Path(), ["foo.bar", 2, "spam"]) would result in foo.bar[2].spam instead of "foo.bar"[2].spam. It's the kind of subtle error I was thinking about the other day.

Yeah, but since it came from a tuple, using a method called .from_parts(), it's pretty self-evident that "foo.bar" must indeed be a single component. The tuple/list format makes it clear that each element is a single key. If you want to parse each element, then use Path's constructor or concatenation. .from_parts() is for adults, it does have a contract/assumption embedded on it for the sake of performance.

The problem isn't really that supporting ints in __add__ is ambiguous by itself, but that it behaves the same for operations that are semantically different.

The operations semantically are different, I agree, but int only behaves the same because because of a limitation on integers: they can't "express" more than a single component. Strings and Paths can, so they do when the operation allows.

For example, let's say we hypothetically allow tuples/lists in __add__ too, it could (and maybe it should) mean "parse every element", so Path() + ["foo.bar", 2, "spam"] == Path("foo.bar[2].spam"). Probably not useful, but still consistent:

"High-level" operations, such as Path() and +, do some parsing/processing on its input. A single string may become more than one key, so a tuple may become more than its len(). An int would too, if it could, and Path keeps its len()
"Low-level" operations, such as .from_accessors() of .from_parts(), are internal tools exposed to the API for the sake of performance. They make assumptions and are more restrictive about their data. They skip the parser.

MestreLion commented 3 years ago

in this scenario, [] indexing (not slicing) is a low-level tool too, as it also skips the parser by contract, assuming the index is always a single element for leaf values such as strings and integers. For a collection such as another Path, it merely loops and adds each part without any processing (the same for a hipotetical tuple. But I'm not suggesting you allow tuples to + or [] or even (), ok? Just using it as a mental exercise for testing API consistency. I'm more than happy with just .from_parts()

MestreLion commented 3 years ago

Anyway, I agree that from_parts() would be pretty useful. You could probably currently implement it yourself with something like this (untested):
path = tuple.__new__(Path, (NamedKey(k) if isinstance(k, str) else ListIndex(k) for k in data.keys[:-1]))

Hum, didn't know about Path's internals to realize it would be so simple. I'll take a look at it and test to craft an actual PR.

Also a walk() method would definitely belong in nbtlib.

Feel free to grab my tree.py, or some of its usage in my nbt (a wrapper for nbtlib). It's a new addition that came out of some experiments in generic crawlers I'm particularly proud of. Still WIP, soon I'll add a useless os.walk() wrapper using it just to test how flexible it is. print_tree is cute, take a look at its usage in nbt.nbt_explorer()

vberlier commented 3 years ago

Yup sorry if this wasn't really clear, the first paragraph wasn't really related to from_parts() or anything. It was just a tangent about the error you could have run into if you tried implementing your get_element function with concatenation rather than indexing. I was just trying to say that the canonical operations for extending paths are path + path and path[key], and that for concatenation the operands can be subjected to auto-conversion that can make it look like they do the same thing.

I'm completely on board with from_parts. I'm gonna look into your patches in more details but this looks really neat!

MestreLion commented 3 years ago

Yup sorry if this wasn't really clear, the first paragraph wasn't really related to from_parts() or anything. It was just a tangent [...]

Nah, you were clear enough I could infer you approved .from_parts() and we were just discussing a side-track.

for concatenation the operands can be subjected to auto-conversion that can make it look like they do the same thing.

Yeah, I carefully chose to use indexing instead of concatenation in get_element because of that difference. I was aware since you told me in #146 . The need for .from_parts() is just to avoid the looping/recursion (or perform it in Path), as I was sure Path had a better way to craft itself out of a tuple.

I'm completely on board with from_parts. I'm gonna look into your patches in more details but this looks really neat!

Well... I didn't PR this one yet... will do so in the next days

vberlier commented 3 years ago

Right, makes sense!

Well... I didn't PR this one yet... will do so in the next days

By "patches" I was actually referring to your nbt.py file containing your extra utilities and where you monkey-match a few nbtlib things.

vberlier / nbtlib

Add `Path.from_parts()` or similar to allow tuple constructor like `pathlib.Path` #157