RFC: Boxes vs Containers - Naming schemes for element access

Kimundi commented 11 years ago

Right now there is much confusion and inconsistency about the naming of functions that deal with elements in types that wrap one or more of them. We need to find a consistent way to deal with that.

First, some terminology:

Container - Can contain at most N elements (with N effectively infinite).
Examples: Vector, Hash Map, Linked List.
Box - Can contain at most 1 element.
Examples: Option, Cell, ~T, ARC.

I'm also defining a elementary set of operations you can perform on containers and boxes in regard to an arbitrary chosen element E (named in all-caps to differentiate from actual function names below):

REF: Return a reference to the element - (&'a self) -> &'a E
REF_MUT: Return a mutable reference to the element - (&'a mut self) -> &'a mut E
TAKE: Return a element, removing it from the type - (&mut self) -> E
UNWRAP: Return a element, destroying the type in the process - (self) -> E

In regard to naming schemes, I see a number of possibilities:

Unify boxes and containers

This would be the globally most consistent and reusable way, but also the most ugly one. We define a set of functions for working with an arbitrary 'next' element on an container. For example in stack-terminology:

fn top(&'a self) -> &'a E - REF, fail if empty
fn peek(&'a self) -> Option<&'a E> - REF or None
fn top_mut(&'a mut self) -> &'a mut E - REF_MUT, fail if empty
fn peek_mut(&'a mut self) -> Option<&'a mut E> - REF_MUT or None
fn pop(&mut self) -> E - TAKE, fail if empty
fn pop_opt(&mut self) -> Option<E> - TAKE or None
fn pop_consume(self) -> E - UNWRAP, fail if empty
fn pop_consume_opt(self) -> Option<E> - UNWRAP or None.
Separate boxes and containers, consistent box naming scheme

Containers and Boxes are separate concepts. Containers are build around a terminology and operations that make most sense to them. They probably wouldn't support UNWRAP and might not support the others either, depending on their specific semantics.

Boxes gain their own more concise and strictly consistent terminology, based around the box metaphor:

"Get a reference to the content, but keep it in the box"
- fn get(&'a self) -> &'a E - REF, fail if empty
- fn get_opt(&'a self) -> Option<&'a E> - REF or None
- fn get_mut(&'a mut self) -> &'a mut E - REF_MUT, fail if empty
- fn get_mut_opt(&'a mut self) -> Option<&'a mut E> - REF_MUT or None
"Take the content out of the box, keeping the box intact but empty"
- fn take(&mut self) -> E - TAKE, fail if empty
- fn take_opt(&mut self) -> Option<E> - TAKE or None
"Destructively unwrap the box to get to the content"
- fn unwrap(self) -> E - UNWRAP, fail if empty
- fn unwrap_opt(self) -> Option<E> - UNWRAP or None.
  Separate boxes and containers, convenient box naming scheme

This one is quite similar to the prior one, but shuffles names and definitions around to provide a nicer set of function names for the common operations:

"Get a reference to the content, but keep it in the box"
- fn get(&'a self) -> &'a E - REF, fail if empty
- fn get_opt(&'a self) -> Option<&'a E> - REF or None
- fn get_mut(&'a mut self) -> &'a mut E - REF_MUT, fail if empty
- fn get_mut_opt(&'a mut self) -> Option<&'a mut E> - REF_MUT or None
"Take the content out of the box if any, keeping the box intact but empty"
- fn take(&mut self) -> Option<E> - TAKE or None
"Destructively unwrap the box to get to the content"
- fn unwrap(self) -> E - UNWRAP, fail if empty

The last one is probably the most convenient as it's optimized for common "I know it is not None" Option usecases:

opt.get() - " Do something with the content of a Option, but keep it in place."
opt.unwrap() - "Move the content out of a Option, throw the Option away"
opt.clone().unwrap() or opt.get().clone() - "Get a copy of the content"
opt.take().unwrap() - "Move the content out of a Option, replace the Option with None"

Hm, this writeup is only half as comprehensive and cohesive as I wanted it to be, I hope it's still understandable.

thestinger commented 11 years ago

I think it's silly to have redundant functions/methods doing fail!() internally every time there's a function/method returning Option. In many cases it doubles the size of the API, making it harder to maintain and learn.

It would be much simpler to just use the unwrap method on Option as necessary.

Kimundi commented 11 years ago

@thestinger: I agree in principle, but the problem here is that methods on Option that return an Option for the element are kinda useless... If we only have unwrap(), and have to chain that to every other possible call its going to be cumbersome to use.

thestinger commented 11 years ago

I'm talking about everything returning an Option that's not an Option itself. The methods should only have to be defined once on Option, we really don't need get, expect and so on to go along with methods like find on every Map, and have them in every variety find comes in.

bblum commented 11 years ago

I've been mulling this over. My preference at this point is to do away with the "ref" and "consume" suffixes, and make "get" mean reference, and "unwrap" mean by value. So, today's option get becomes unwrap, unwrap stays the same, get_ref turns into get, map stays as map, map_consume turns into map_unwrap, and consume_iter turns into unwrap_iter.

bluss commented 11 years ago

bblum, unwrap fails on None, map_unwrap can't do that, it's not really useful.

bblum commented 11 years ago

My thought about that was "unwrap means might fail", but "map means won't fail" with higher precedence. I dunno, maybe it is better to keep that as consume, but I'd like to see one or the other word go away. Would it be any better to replace all "unwrap" with "consume"?

erickt commented 11 years ago

@bblum: I think we should prefer functions like .map() to consume self if we can, and let the caller use .clone() if they need a copy. That feels more rust-y to me. I've got an old patch series waiting for the recently landed fix for moving out of self that implements this for options, results, and vecs. I'll look into reviving it.

On Tue, Jul 23, 2013 at 8:01 PM, Ben Blum notifications@github.com wrote:

My thought about that was "unwrap means might fail", but "map means won't fail" with higher precedence. I dunno, maybe it is better to keep that as consume, but I'd like to see one or the other word go away. Would it be any better to replace all "unwrap" with "consume"?

— Reply to this email directly or view it on GitHubhttps://github.com/mozilla/rust/issues/7887#issuecomment-21460848 .

erickt commented 11 years ago

@bblum: looking back at my patch, I also added .map_ref() in the case you didn't want to pay the cost to clone the container. Hows does that pattern sound?

bblum commented 11 years ago

That (foo and foo_ref) was my original leaning, but then I thought about .iter(). The vast majority of the time, (I expect) people will want ref_iter() instead of consume_iter(). But maybe the syntax sugar will save us in that case?

The other qualm I had with that idea is that ARC unwrap should have a longer, more ominous name than get. But maybe we can keep the name unwrap for Option and for ARC, as a special case of implicit-consume that additionally means "might fail, be careful".

bluss commented 11 years ago

Keep unwrap for the case where you get "the whole content" out of a thing. I agree .map, .map_mut sound good for mapping by reference. What about .map_value for the consume case? (EDIT: I talk only about Boxes, like Option).

Kimundi, I don't really agree with any of your alternatives. Separate Option from the containers. Option is a building block for the latter.

Kimundi commented 11 years ago

A note to map(), map_mut() etc. Could we not just use an external iterator for that? Once it works properly it would end up as box.map() and box.iter_mut().map()

bluss commented 11 years ago

Not for Option, since it's so central, it needs a lot of sugar

erickt commented 11 years ago

The challenge which approach do we pick for the shortest name? I converted .map() to consume because I found most use cases could be implemented with .map_consume() and I figured I'd change the shorter name to be the more efficient one. But since reference iterators are more general, it makes sense to keep .iter() returning references.

What if we made the container's .map() consume, and use .iter().map() if you want the references? With mostly-working default methods, we could lift these implementations into some traits like this:

trait ConsumableIterator<T> {
    fn map<U>(&self, f: &fn(T) -> U) -> ConsumableIterator<U>;
}

trait FromIterator<T> {
    static fn from_consume_iter(i: ConsumableIterator<T>) -> Self;
}

trait Mappable<T>: ConsumableIterator<T> {
    fn map<U, Container: FromIterator<U>>(self, f: &fn(T) -> U) -> Container<U> {
        FromIterator::from_consume_iter(self.consume_iter().map(f))
}

impl<T> Mappable<T> for ~[T] {}

thestinger commented 11 years ago

I don't think consuming is usually what you want for containers, and lots of them won't be able to do it.

erickt commented 11 years ago

@thestinger: How so? I'm using the term consume in this case to mean transferring the ownership of the contained elements from one container to another. Any container that allows for removing of elements (e.g. not bloom filters) should be able to support this kind of operation.

thestinger commented 11 years ago

Consume would often be O(nlogn) with a tree instead of O(n), and not possible with persistent containers.

I'm not sure if you would be able to write consume for a container like a list (chunked or not), without unsafe code that's likely to break if drop flags are removed.

erickt commented 11 years ago

@thestinger: By O(nlogn), are you counting both the deconstruction and reconstruction of a container? If just deconstruction, we can implement tree consumers in Rust in O(n) if take advantage of unique pointers. It allows us to violate the tree algorithm invariants to unfold the tree structure. Here's a simple O(2n) version to consume the tree:

enum Tree<T> {
    Node(T, ~Tree<T>, ~Tree<T>),
    Empty
}

impl<T> Tree<T> {
    fn consume_iter(self) -> ConsumeIterator<T> {
        fn aux(node: Tree<T>, mut values: ~[T]) {
            match node {
                Empty => values,
                Node(value, left, right) => {
                    values = aux(left);
                    values.push(value);
                    values = aux(right);
                    values
                }
            }
        }
        aux(self, ~[]).consume_iter()
    }
}

It wouldn't be that hard to optimize away values, but we'd still need a stack to track where we are in the tree.

Regarding persistent containers, I totally agree, and it makes total sense for their .map() to return references.

Regarding writing consume for a mutable singly linked list, you would need an intermediate structure that stores the consumed items in reverse order. All you need to do then is unreverse them. It'd also be O(2n). We could use dark magic to reverse the pointer directions to get it down to O(n) but I'm not sure if it'd be worth the unsafety.

thestinger commented 11 years ago

@erickt: I was thinking about the complexity of consuming and building up a new tree compared to just doing the operation in-place. Default methods like fold would likely be based on iter and I think by-reference is usually going to be desired.

graydon commented 11 years ago

I added a consume iterator to treemap recently.

graydon commented 11 years ago

(no unsafe, o(n), very tidy!)

graydon commented 11 years ago

More to the point of the bug, I think i prefer no suffix means by-ref, we use the suffix _move rather than take, consume or unwrap, and partial functions are funneled into either conditions or options with an _opt suffix.

erickt commented 11 years ago

@graydon: So .map_move()? I can live with that.

Speaking towards the bug, do we need partial forms of functions like .take()? We're starting to hit a combinatorial explosion of different function variants: by-ref / by-move / mut / default / map / expect. Instead I'd prefer if we just have up to 3 forms when necessary: by-move, by-immutable-ref, and by-mutable-ref. If we want a partial result, we can call Some(~"foo").map(|x| x.to_str()).expect(~"I expected some foo"). It does result in an extra Option allocation, but that should be both relatively cheap, and easy to optimize away with enough #[inline] attributes.

Kimundi commented 11 years ago

Isn't take_unwrap() used quite a lot in the codebase? Having an explicit take() would allow to construct that call as take().unwrap().

nikomatsakis commented 11 years ago

One observation: Persistent collections will not be able to support a "consume" operation.

Kimundi commented 11 years ago

@nikomatsakis: Yeah, I figured you'd need a separate trait for each of these functions.

brson commented 11 years ago

Does anybody have a succinct explanation of the current consensus on this subject for https://github.com/mozilla/rust/wiki/Doc-detailed-release-notes?

Kimundi commented 11 years ago

If we do #9355 (which I'd be totally in favor for), it would pretty much reduce all discussion here to get vs unwrap.

I'm kinda embarrassed that I never got the idea that you might not need three different versions of accessors for Option... ;)

thestinger commented 11 years ago

Many of the issues have been dealt with by a3b04c1 and the work towards consistency in the codebase. I've opened #9784 about get vs. unwrap and I think we can tackle any other issues in independent bugs.

rust-lang / rust

RFC: Boxes vs Containers - Naming schemes for element access #7887

Unify boxes and containers

Separate boxes and containers, consistent box naming scheme

Separate boxes and containers, convenient box naming scheme