ibraheemdev / matchit

A high performance, zero-copy URL router.
https://docs.rs/matchit
MIT License
344 stars 35 forks source link

use reference counted prefix for less lifetimes on params? #28

Open fakeshadow opened 1 year ago

fakeshadow commented 1 year ago
struct Node {
    prefix: Arc<[u8]>
}

struct Param<'v> {
    key: BytesStr,
    value: &'v u8
}

struct BytesStr(Arc<[u8]>);

impl ByteStr {
    fn as_str(&self) -> &str {
        std::str::from_utf8(&self.0[1..]).unwrap()
    }
}

A trade off between less lifetime and reference counted overhead + an extra public type. Current Param(s) type are pretty hard to use if not consumed inline. From what I see most people would do an extra heap allocation to erase the lifetime(s) so in that case this would likely be cheaper.

ibraheemdev commented 1 year ago

Wouldn't most people allocate again anyways (if its used in a Path extractor for example)? Unless it used the bytes crate maybe.

fakeshadow commented 1 year ago

On a further look a zero copy value is also possible in form like this:

struct Param {
    key: BytesStr,
    value: BytesStr
}
impl Node {
    fn at(&self, path: impl Into<ByteStr>) {}
}

This would hit the perf even further when given a str to match with one extra copy and a couple of reference counted slicing on it. But in real world http can possibly be used to hand out an already existing ByteStr to enable zero copy. Related code: https://github.com/hyperium/http/blob/34dc1ccb4786c8026e57404fa5e71275978a55f3/src/uri/path.rs#L13

dpc commented 1 year ago

I like that Params are zero-copy, and I haven't noticed usability issues with the lifetimes. In my toy astra/mathcing project I barely every write down the lifetimes: https://github.com/dpc/htmx-sorta/blob/5e9d2fe026ad9e2baae20368e59c7237c0671161/src/routes.rs#L64

The only problem I've noticed is the one described in #40 and describe there a way to delay any allocations to the point where actual path arguments are known to exist. In the (I assume common case) where there is None, no allocations need to happen.I use Box there, but Arc<_>, Cow<_> etc would work fine as well.

ibraheemdev commented 1 year ago

http/hyper already uses Bytes internally, though not fully exposed in it's public API, so we might still be able to be zero-copy in the common use case.

fakeshadow commented 1 year ago

It's when you use it in a complicated web framework abstraction the lifetime pollution becomes apparent. This is how matchit is used in axum: https://github.com/tokio-rs/axum/blob/368c3ee08fc3896358d3bd2bfc8cc67f2925c6ef/axum/src/routing/url_params.rs#L24 There are mainly two reason for less practical use of reference based param in this case: Exposing lifetime in public API like Service<Request<'a>> is harder to use for library consumer and typical type map for runtime type casting require Any which requires 'static lifetime. In xitca I use a fork of matchit with ref count based string type and inline optimized small string which is both easier to use in complicated abstraction and faster with less allocation.

http/hyper already uses Bytes internally, though not fully exposed in it's public API, so we might still be able to be zero-copy in the common use case.

There is a related issue: https://github.com/hyperium/http/issues/484 though it's unlikely to happen before some form of widely used ref count string type exists (either in std or 3rd party).

ibraheemdev commented 1 year ago

I guess the most generic way of supporting this would be to return a Range<usize> for values instead of a string slice and punting the choice of string type to users, though you would still have to re-allocate a Params list. For lifetime-less keys though we'd still have to use some sort of Arc<str> internally.

dpc commented 1 year ago

It's when you use it in a complicated web framework abstraction the lifetime pollution becomes apparent. This is how matchit is used in axum: https://github.com/tokio-rs/axum/blob/368c3ee08fc3896358d3bd2bfc8cc67f2925c6ef/axum/src/routing/url_params.rs#L24

That doesn't look bad. Seems it comes down to "Params is holding references to stuff and has a lifetime".

Anyway, I agree that Params is not best for general purpose consumption. My observation with #40 is that can do route matching and get a cheap Params, and then you convert it to something else and due to the knowledge of what was parsed it might be cheaper (free in many cases) than cloning path/url upfront.

I thought that maybe matchit should support it internally, but now it occurred to me that it would be already possible to do it externally, if matchit provided some facilities for it (expose some internals, or some methods to iterate over it). What you actually convert it to does not even have to be one and only one thing. You have your favorite short-string library etc., you can use it.

These two effort could be combined. Matching could support internally some conversions to representations that are more useful for general purpose code at the cost of extra allocations etc, and then also enable developers converting to whatever they want as well.

If hyper can expose url as Bytes etc. that would change the situation and make more things possible.