rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.69k stars 12.49k forks source link

HashSet and BTreeSet difference() method argument lifetime overconstrained #73788

Open BartMassey opened 4 years ago

BartMassey commented 4 years ago

Consider this code.

use std::collections::HashSet as Set;
use std::ops::Deref;

fn exclude_set<'a>(given: &Set<&'a str>, to_exclude: &Set<&str>) -> Set<&'a str> {
    given.difference(to_exclude).cloned().collect()
}

fn main() {
    let given: Set<&str> = vec!["a", "b", "c"].into_iter().collect();
    let b = "b".to_string();
    let to_exclude: Set<&str> = vec![b.deref()].into_iter().collect();
    let excluded = exclude_set(&given, &to_exclude);
    drop(b);
    println!("{:?}", excluded);
}

This fails to compile.

error[E0621]: explicit lifetime required in the type of `to_exclude`
 --> src/main.rs:5:5
  |
4 | fn exclude_set<'a>(given: &Set<&'a str>, to_exclude: &Set<&str>) -> Set<&'a str> {
  |                                                      ---------- help: add explicit lifetime `'a` to the type of `to_exclude`: `&std::collections::HashSet<&'a str>`
5 |     given.difference(to_exclude).cloned().collect()
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ lifetime `'a` required

Now change exclude_set() so that to_exclude takes &Set<&'a str>>. Again, this will fail to compile.

error[E0505]: cannot move out of `b` because it is borrowed
  --> src/main.rs:13:10
   |
11 |     let to_exclude: Set<&str> = vec![b.deref()].into_iter().collect();
   |                                      - borrow of `b` occurs here
12 |     let excluded = exclude_set(&given, &to_exclude);
13 |     drop(b);
   |          ^ move out of `b` occurs here
14 |     println!("{:?}", excluded);
   |                      -------- borrow later used here

Commenting out the drop() will result in successful compile and execute.

No value from to_exclude should appear in the result of exclude_set(): set difference can only remove elements, not insert them. Thus, the drop should be harmless.

Some bodging around with code inspired by the implementation of std::collections::hash_set::Difference gets this, which compiles and executes.

use std::collections::HashSet as Set;
use std::ops::Deref;

fn exclude_set<'a>(
    given: &Set<&'a str>,
    to_exclude: &Set<&str>,
) -> Set<&'a str> {
    set_difference(given, to_exclude).collect()
}

fn main() {
    let given: Set<&str> = vec!["a", "b", "c"].into_iter().collect();
    let b = "b".to_string();
    let to_exclude: Set<&str> = vec![b.deref()].into_iter().collect();
    let excluded = exclude_set(&given, &to_exclude);
    drop(b);
    println!("{:?}", excluded);
}

struct SetDifference<'a, 'b, T: ?Sized> {
    iter: Vec<&'a T>,
    other: &'b Set<&'b T>,
}

impl<'a, 'b, T> Iterator for SetDifference<'a, 'b, T>
where
    T: std::hash::Hash + Eq + ?Sized,
{
    type Item = &'a T;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            let elt: &'a T = self.iter.pop()?;
            if !self.other.contains(elt) {
                return Some(elt);
            }
        }
    }
}

fn set_difference<'a, 'b, T: ?Sized>(
    source: &Set<&'a T>,
    other: &'b Set<&T>,
) -> SetDifference<'a, 'b, T> {
    SetDifference {
        iter: source.iter().cloned().collect(),
        other,
    }
}

Sadly, I haven't figured out how to generalize the types and lifetimes for the new implementation in a way that the typechecker/borrowchecker is happy with.

The same issue exists for BTreeSet.

nbdd0121 commented 4 years ago

You're correct. std::collections::hash_set::Difference should take two lifetime arguments. However due to backward compatibility reason, declaration of Difference is unlikely to change.

BartMassey commented 4 years ago

@nbdd0121 I don't understand why weakening the lifetime constraint would be backward-incompatible? Presumably it would just allow more programs to compile?

nbdd0121 commented 4 years ago

It would change the number of lifetime parameters of Difference from 1 to 2. Doing so will break programs that spell out type of Difference explicitly.

BartMassey commented 4 years ago

@nbdd0121 Ah, good point.

tesuji commented 4 years ago

Just do the crater I think.

zjp-CN commented 1 year ago

Coming from https://tfpk.github.io/lifetimekata/chapter_9.html. [^lifetimekata]

I think the lifetime on difference is not overconstrained. Read the signature carefully where Self is HashSet<T, S>

fn difference<'a>(&'a self, other: &'a HashSet<T, S>) -> Difference<'a, T, S>

The api only applies to the same type T, i.e. for a referenced type, the lifetimes on both hashsets should be the same.

But for the example in this issue fn exclude_set<'a>(given: &Set<&'a str>, to_exclude: &Set<&str>) -> Set<&'a str> {

it says we want to diff two sets and return the resulting set the elements of which only come from the first one. The types of given and to_exclude is not the same T anymore.

So you need a new api for that:

fn difference_iter<'a, 'ref_, T>(
    set_a: &'ref_ HashSet<&'a T>,
    set_b: &'ref_ HashSet<&T>,
) -> impl Iterator<Item = &'a T> + 'ref_
where
    T: std::cmp::Eq + std::hash::Hash + ?Sized,
    'a: 'ref_,
{
    set_a.iter().filter(move |a| !set_b.contains(*a)).copied()
}

and the problem is solved.

[^lifetimekata]: for readers from lifetimekata, the solution is the same.

citizinf commented 2 months ago

@zjp-CN, I am also coming from lifetimekata and I am confused about if the difference_iter solution is actually relevant to this issue or not.

My understanding is that one would expect the lifetime of difference's output to be constrained only by the first argument (because the output can only contain references to stuff in the first argument). So when the compiler prohibits usage of that output because argument 2 has gone out of scope or otherwise ended its lifetime, that is unexpected and unnecessary for memory safety.

As for the original lifetimekata code, trying to work around this by specifying 2 unrelated lifetimes also doesn't work because the compiler expects both lifetimes to be the same, or at least one to explicitly outlive (subclass of) the other.

Since difference is tied to usage of - through std::ops::Sub, someone using that operator might run into this unexpected compiler error. So saying "lifetime on difference is overconstrained" is still a valid issue regardless of if a new API method could be added that the compiler accepts.