Open tapeinosyne opened 7 years ago
I like the idea of a StringTrie
, but I prefer the AsKey
method. I don't like AsRef<[u8]>
so much because it implies that you're in a way converting something to a byte slice, rather than it is to some extent inherently a byte slice and you're just borrowing it, and that doesn't seem to mesh with the way Rust prefers Borrow
over AsRef
for the extra implicit invariant in, say, HashMap
. I would insist that AsKey
be concerned with abstract properties - it should definitely and absolutely be the case that if the byte slices extracted from two keys are equal, then the keys themselves should be equal. This is the reason I originally picked Borrow
. Open question: as we're switching from Borrow
to AsKey
, should the byte slices also be ordered the same? Does that even matter? The "ordering" in a QP-trie is nybble-based so it's not really possible to re-order based on some custom ordering. Interestingly enough, String
s are Ord
via simply redirecting to the internal Vec<u8>
(it's just a #[derive]
.)
Perhaps Break
could be unified with AsKey
? To a certain extent Break
was a band-aid on the way that Borrow<[u8]>
didn't express everything that you need for a key to be a key, in a trie. On the other hand, you only need Break
for grabbing correct prefixes, and the trie works fine as a key-value map with a key which doesn't implement Break
. Thoughts?
@sdleffler I would insist that AsKey be concerned with abstract properties - it should definitely and absolutely be the case that if the byte slices extracted from two keys are equal, then the keys themselves should be equal. This is the reason I originally picked Borrow.
Definitely so, although there might be a slight misunderstanding here: in the opening message, I considered equality of byte slices (“bytewise equality”) a concrete property as opposed to abstract; with abstract, I meant things like Eq
. For example, two caseless strings or (I think) standard library Path
s can be equal under Eq
even if their byte composition is different; nonetheless qp-tries would only see unequal bytes, and associate the two keys to different entries. This really is a matter intrinsic to qp-tries – they operate as byte-based automata, not symbolic ones.
Perhaps you meant instead that keys should be agnostic to ownership, which is the mandate of Borrow
? That's the case for all proposed impls, but it would be perfectly reasonable to make it a requirement for all implementors. Such a requirement can be more formally specified by retaining the Borrow
relationship between the argument type at the call site and AsKey::Borrowed
, as in:
pub fn get<'a, Q: ?Sized>(&self, key: &'a Q) -> Option<&V>
where
Q: Borrow<K::Borrowed>
Open question: as we're switching from Borrow to AsKey, should the byte slices also be ordered the same? Does that even matter? The "ordering" in a QP-trie is nybble-based so it's not really possible to re-order based on some custom ordering. Interestingly enough, Strings are Ord via simply redirecting to the internal Vec
(it's just a #[derive].)
It's as you say: there can be no “canonical” order over UTF-8 strings other than the lexicographic order on bytes, which is the one we already use; going beyond requires collation and locales, but even those only concern islands of codepoints (because there is no meaningful ordering relationship between symbols from different scripts, or between letters and punctuation, and so on).
Perhaps Break could be unified with AsKey? To a certain extent Break was a band-aid on the way that Borrow<[u8]> didn't express everything that you need for a key to be a key, in a trie. On the other hand, you only need Break for grabbing correct prefixes, and the trie works fine as a key-value map with a key which doesn't implement Break.
I agree that it's reasonable to isolate different capabilities to different traits, and similarly apply bounds to affected methods only. Moreover, the notion of Break
may prove useful later on, if we were admit other types as keys.
@tapeinosyne sorry for the late response! I have to admit that it never crossed my mind when I first wrote this that the equality/ordering between keys is strictly "bytewise" - mostly because when I wrote this, I was intending to use it strictly for bytes. The idea of using it for strings and other types came later (although I now find myself wanting to use it for paths.) I think it's definitely a must that keys be agnostic to ownership in the same manner as Borrow
, with respect to functions such as get
and iter_prefix
.
@tapeinosyne on the topic of Path
s, I'd like to ask your opinion - do you think it's reasonable behavior to implement Break
for Path
s on Component
boundaries?
I'd say so, certainly. (Although Path
keys probably won't see much use.)
Going back to AsKey
itself, I'd be happy to revisit #14 if you're up for a review.
Gladly! Can't promise to be the most responsive for the next month or so but if you submit a PR I'll do my best to get to it in a timely manner. You want me to take a look at it as-is or do you have changes to make first?
By all means, take your time. (Hardly unreasonable, innit, considering that it took me four months to reply "yes" above!) The PR can already be reviewed for issues of substances with AsKey
; the remaining concerns should be minor (additional impls, updating docs/readme, style) and quickly addressed.
I'll update the opening message of #14 to better reflect this.
@sdleffler, would you be available for revisiting this issue? I would be happy to revise or update #14 as we like.
I would! There were a few bugfixes since #14 was submitted - would you mind rebasing #14? It honestly looks good enough to me now that I've finally had the time to go over it that I don't think I can argue for much in the way of revision. Let's get it rebased and then I'll check it out and probably merge it.
Certainly. I'll need to read it over myself—it's been a while since I wrote it.
A question: do we want to deprecate the BString
wrapper, or baldly remove it?
Let's deprecate it for now. If it gets in the way of the API change, then it can go.
The reason I have left them midway is that, hands-on, I rather feel that AsKey
may win us too little over a simpler AsRef<[u8]> + Break
bound. My primary motivator for AsKey
was supporting more key types, but:
OsStr
and thus std::path::Path
— as I just learned — do not offer a byte view, by design. (I had assumed they did.)AsKey
, such as URLs or domains[1], might just implement AsRef<[u8]>
as a matter of course.All of which leaves AsKey
looking surplus to requirements. How do you feel about it?
[1] Notably, the knot
recursive DNS solver uses qp-tries.
Currently, string keys are rather unwieldy. They cannot be used directly in QP-tries, and must either be wrapped individually or funnelled through ancillary inherent methods – loose methods which exist as duplicates without a supporting trait.
Primarily, the issue arises from the requirement that keys be
Borrow<[u8]>
:Borrow
is concerned with abstract properties such asEq
orHash
, but qp-tries manipulate keys on a strictly structural basis, byte by byte. From this stems a certain tension that can be felt most clearly in the public API, where users must rely on different inherent methods depending on the key type.I outline here three possible paths to better key ergonomics, namely:
AsKey
trait,AsRef<[u8]>
rather thanBorrow<[u8]>
,Of the three, I believe the first would be the best choice, as it not only guarantees ergonomic string keys but also gives us the opportunity to admit more key types. I will follow up with a PR to get a better feel for the proposal.
Introduce an
AsKey
traitShift the bound on keys from
Borrow
to a public traitAsKey
(orRadical
, if we're feeling whimsical), which provides both a slice view – akin toBorrow
itself – and a byte view; the slice view subsumesBorrow
and the byte view makesBStr
wrappers redundant. (Note that qp-tries already requireBreak
as a non-std trait.) As an additional benefit,AsKey
could later be implemented for more types expected of a radix tree – notably integers, integer vectors and arrays, and possibly tuples.The basic idea can be illustrated thus:
The main disadvantage, here, is that coherence rules prevent us from providing much in the way of blanket impls: we can cover concrete types from the standard library, and that is about it. Nevertheless such impls should suffice, as users can rely on conversion traits and deref coercions at the call site.
(In order to provide blanket impls, we would need to parametrize the trait, turning it into
AsKey<Borrowed>
, and trackBorrowed
withPhantomData
internally. That should allow us to implement the trait forK : Borrow<[u8]>
andK : Borrow<str>
without overlap, by parametrizing each impl with its respective borrowed form. Not complicated, just… cluttered.)Make keys
AsRef<[u8]>
Relax the bounds from
Borrow<[u8]>
toAsRef<[u8]>
, allowing tries to accept string keys without further changes. This is by and large analogous to what is done now, internally, as we only ever work with byte slices. Other byte-based tries published on Crates.io (fst
comes to mind) also admitAsRef
keys.To justify the transition, I will note that
Borrow
is meant to enforce abstract properties, forming a contract between things that – be they owned or borrowed – hash the same way, remain equal underEq
, have the same order underOrd
, and so on. However, QP-tries need only rely on structural equality, i.e. they need only know the actual bytes that form a key, and nothing beyond that. From this perspective,AsRef<[u8]>
is better tailored to our needs thanBorrow
, as its purpose is to offer a cheap reference-to-reference conversion to the parameter, and a reference to[u8]
can be little else than an agnostic byte view.Having said that, the fact that trie keys would not follow
Eq
may prove confusing, and the new bounds should be clearly documented with explanations and examples.Wrap the trie, not the keys
Given that keys are homogeneous, there is little benefit in wrapping them individually and using ad-hoc methods to reduce boilerplate. Rather, we can introduce a newtype
StringTrie<V>
, wrapping a plain qp-trie overBorrow<[u8]>
keys. Trie methods are moved to a trait, whichStringTrie
can implement by delegation to the inner trie – with the exception of prefix lookup, whereStringTrie
requires index validation.There are degrees of sophistication to this approach and how generic it should be, but the resulting interface should allow users to forgo wrappers after trie instantiation:
This is the least impactful approach, in that it does not ultimately alter bounds, but merely lifts the newtype to the outer structure of the trie. However, it does not address the issues underlying string keys.