thomcc / arcstr

Better reference counted strings for Rust
Apache License 2.0
114 stars 18 forks source link

`value.substr(range)` on empty range #28

Closed AngelOfSol closed 3 years ago

AngelOfSol commented 3 years ago

I've been using ArcStr/Substr as a way to store my backing string buffers for a DSL I've written a compiler for. It was fairly easy except for one small issue with the way substr works on empty ranges.

  let x = Substr::from("asdf").substr(1..1);
  assert_eq!(x.range(), 0..0);

Hooking up to nom made this inconvenient, because it expects empty ranges to be properly offsetted for calculations. I ended up making my own substr function to get around this:


#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Input(pub Substr);

impl Input {
    pub fn substr<R: RangeBounds<usize>>(&self, range: R) -> Self {
        let start = match range.start_bound() {
            std::ops::Bound::Included(x) => *x,
            std::ops::Bound::Excluded(x) => *x + 1,
            std::ops::Bound::Unbounded => 0,
        };
        let end = match range.end_bound() {
            std::ops::Bound::Included(x) => *x + 1,
            std::ops::Bound::Excluded(x) => *x,
            std::ops::Bound::Unbounded => self.0.len(),
        };

        // in the case of an empty substr, we want to create a blank reference
        // to a new str, with the empty range
        if start == end && start <= self.0.len() {
            let idx = self.0.range().start + start;
            Input(unsafe { Substr::from_parts_unchecked(self.0.parent().clone(), idx..idx) })
        } else {
            Input(self.0.substr(range))
        }
    }
}

The solution I have currently works for me (I'm ok with the cost of the parent().clone() even though the string is empty), but I'm wondering if we could integrate a solution or alternative upstream.

thomcc commented 3 years ago

Hm, you mean you need to get access to the true backing range of a Substr?

Sorry if I'm not quite following (I'm sort of on vacation this month, although I'm not entirely away)

thomcc commented 3 years ago

Oh, I see. It's nice in some sense that we avoid all overhead for empty ranges, but I think consistency here is probably more worthwhile, at least in the substr case.

I can work on a patch for this in a while, but if you submitted a PR that addressed the various cases (there are a few methods that need to change, at the very least from_parts in addition to substr), I don't see a reason to reject it.

AngelOfSol commented 3 years ago

Also, I don't know if you'd be interested in a PR introducing a nom feature that implements all the required traits for nom to work with substr or not, but since I've already written the code, I'd be happy to do it.

Heres the code reference

thomcc commented 3 years ago

My main hesitation with a nom feature is around semver compatibility. arcstr is 1.x and I'd like to keep it at 1.x unless there's a really major need for an update. nom is 7.x, and plausibly could bump to 8.x in the future.

If you have a public dependency, and you bump its major version, you need to bump your major version, so I'd rather keep public deps to things with stable APIs, as much of a pain in the ass as that might be.

AngelOfSol commented 3 years ago

Makes sense. The manual implementation only costs a wrapper type, assuming I can get empty-range accurate substrs in some form. For now I have the hacky unsafe stuff so it works for me regardless.