rust-lang / libs-team

The home of the library team
Apache License 2.0
123 stars 19 forks source link

Add `slice::{trim, trim_start, trim_end}` (+ mutable counterparts) #414

Closed bluebear94 closed 1 month ago

bluebear94 commented 2 months ago

Proposal

Problem statement

In Typst, we need to inspect the part of a slice from the first “non-ignorant” element to the last (see typst/typst#4544).

Motivating examples or use cases

The most readable solution is

let start_idx = fragments.iter().take_while(|f| f.is_ignorant()).count();
let end_skip = fragments.iter().rev().take_while(|f| f.is_ignorant()).count();
let end_idx = fragments.len() - end_skip;
let fragments_inner = &mut fragments[start_idx..end_idx];

but this doesn’t optimize as well as the following:

let start_idx = fragments
    .iter()
    .position(|f| !f.is_ignorant())
    .unwrap_or(fragments.len());
let fragments_trailing = &mut fragments[start_idx..];
let mut end_idx = fragments_trailing.len();
while end_idx > 0 {
    if !fragments_trailing[end_idx - 1].is_ignorant() {
        break;
    }
    end_idx -= 1;
}
let fragments_inner = &mut fragments_trailing[..end_idx];

Alternatively, we can loop with split_first or split_last.

Also see an idealized example on the playground for easy ASM viewing.

Trying to loop on slice matches gives a borrow checker error when the subslice needs to be mutable:

let fragments_inner: &mut [MathFragment] = {
    let mut fragments_inner = fragments.as_mut_slice();
    while let [f, rest @ ..] = fragments_inner {
        if f.is_ignorant() {
            fragments_inner = rest;
        } else {
            break;
        }
    }
    while let [rest @ .., f] = fragments_inner { // <- cannot borrow `fragments_inner[..]` as mutable more than once as a time
        if f.is_ignorant() {
            fragments_inner = rest;
        } else {
            break;
        }
    }
    fragments_inner // <- cannot move out of `fragments_inner` because it is borrowed
};

Solution sketch

Add slice::{trim, trim_start, trim_end} functions, as well as their mutable counterparts:

// `f` gives the predicate for elements to trim
impl<'a, T> &'a [T] {
    pub fn trim<F>(&self, f: F) -> &'a [T]
            where F: Fn(&T) -> bool;
    pub fn trim_start<F>(&self, f: F) -> &'a [T]
            where F: Fn(&T) -> bool;
    pub fn trim_end<F>(&self, f: F) -> &'a [T]
            where F: Fn(&T) -> bool;
}
impl<'a, T> &'a mut [T] {
    pub fn trim_mut<F>(&self, f: F) -> &'a mut [T]
            where F: Fn(&T) -> bool;
    pub fn trim_start_mut<F>(&self, f: F) -> &'a mut [T]
            where F: Fn(&T) -> bool;
    pub fn trim_end_mut<F>(&self, f: F) -> &'a mut [T]
            where F: Fn(&T) -> bool;
}

Alternatives

We could omit trim and trim_mut, since these can be expressed as a sequence of trim_start and trim_end.

We could add a trait for peekable iterators (#176), allowing this operation to be implemented using iterator-based methods.

We could omit the _mut methods, but this wouldn’t cover the case when you’d later need to mutate the subslice. Alternatively, we could add _mut variants for the trim_ascii methods.

We could bound F by FnMut(&T) -> bool instead of Fn(&T) -> bool, but that would make the order of element checking observable. This isn’t a problem for trim_start and trim_end, but it is for trim. Alternatively, we could do this for the methods other than trim and trim_mut.

For the _mut methods, we could pass in a mut reference to the callback.

Links and related work

Issue: typst/typst#4544

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution:

zachs18 commented 2 months ago

This seems to overlap somewhat with the SlicePattern API which should be considered, and perhaps this could just be included in that, since str::trim_*_matches are similar to this API, and a method using SlicePattern with the proposed API could be analogous to that.


If this is not included in the SlicePattern API:

I think these should be implemented on [T] directly, rather than on &(mut) [T]. This would match existing APIs like slice::split and str::trim_matches. Also, the current signatures of trim_*_mut are not implementable (or are unsound), since you cannot subslice a &'long mut U from a &'short &'long mut U.

I think the closure passed to trim_*_mut should take &mut T, so we don't have a repeat of Vec::retain vs Vec::retain_mut.

Concretely:

// `f` gives the predicate for elements to trim
impl<T> [T] {
    pub fn trim<F>(&self, f: F) -> &[T]
            where F: Fn(&T) -> bool;
    pub fn trim_start<F>(&self, f: F) -> &[T]
            where F: Fn(&T) -> bool;
    pub fn trim_end<F>(&self, f: F) -> &[T]
            where F: Fn(&T) -> bool;

    pub fn trim_mut<F>(&mut self, f: F) -> &mut [T]
            where F: Fn(&mut T) -> bool;
    pub fn trim_start_mut<F>(&mut self, f: F) -> &mut [T]
            where F: Fn(&mut T) -> bool;
    pub fn trim_end_mut<F>(&mut self, f: F) -> &mut [T]
            where F: Fn(&mut T) -> bool;
}
Amanieu commented 1 month ago

We discussed this in the libs-api meeting today. We are happy to accept this API with several changes.

These methods should follow the trim_matches naming scheme like the methods on str since they take an argument while the bare trim methods on str don't and will trim whitespace.

Additionally, these should use the SlicePattern API, which will need to be extended to support using a Fn as a slice pattern.