rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.97k stars 12.69k forks source link

Implement collections reform #18424

Closed alexcrichton closed 9 years ago

alexcrichton commented 9 years ago

Tracking issue for https://github.com/rust-lang/rfcs/pull/235, there are a number of sub-issues associated with this:

Backwards incompatible changes to make:

Backwards compatible changes that will require additional language features:

Backwards compatible changes to make:

alexcrichton commented 9 years ago

My list may not be 100% accurate, so feel free to edit @aturon!

Gankra commented 9 years ago

@aturon how do you want to coordinate this work?

gamazeps commented 9 years ago

I ca give some help for the (not to hard) bugs :)
@Gankro still working on your TODO, but it's a bit more complex than expected at first reading :p

Gankra commented 9 years ago

I assume it should read "implement deref on strings and vecs"?

alexcrichton commented 9 years ago

Ooops, indeed!

alexcrichton commented 9 years ago

I'll start off by removing the collections traits. @Gankro and @gamazeps, do you guys have something you'd like to start on specifically? I figure we can start out by just knocking out all the bullet points above. I can add everyone's name next to what you're working on to make sure we don't conflict. Does that sound ok?

aturon commented 9 years ago

Thanks for jumping on this @alexcrichton. I'd like to do the Borrow-related stuff, Cow and IntoIterator.

Gankra commented 9 years ago

@alexcrichton we had agreed (perhaps only informally) to take this opportunity to reorganize the modules. In particular several files define two structures, where one is a convenience wrapper for the other (most of the Sets). I want to split those up like BTree and HashMap do, to make maintenance a bit more friendly. I had hoped to do this first since it will clobber all the history, but that ship has sailed, and there's a few outstanding PRs that would need a nasty rebase. Should we delay that till later, or try to get that in ASAP?

Regardless, I can take on conventions-hell.

alexcrichton commented 9 years ago

Hm, that sounds like a good idea! Why don't we wait for me to remove the collections traits and you to land some conventions stuff, and then we can reorganize, does that sound ok?

I'll assign you to conventions-related tasks

gamazeps commented 9 years ago

@alexcrichton I'm kind of a begginer so I'm not sure I'm able to really evaluate the difficulty of the different parts, if you could assign me to the an easy one I would be very happy :p (I have nothing that I would specifically want to do)

Gankra commented 9 years ago

@gamazeps Ensuring ____ is implemented should be easy (if it's not already done).

Gankra commented 9 years ago

Renaming Extendable should also be pretty easy if you're good at search-and-replace.

Gankra commented 9 years ago

Adding repeat should also be pretty quick.

alexcrichton commented 9 years ago

@gamazeps feel like tackling Extendable => Extend and making sure everyone implements it to start out?

gamazeps commented 9 years ago

Awesome :)

gamazeps commented 9 years ago

@alexcrichton Done :) (compiling for now) Except for Slice where I don't see where I would implement it :/

Gankra commented 9 years ago

Slices can't be extended, so that's fine.

gamazeps commented 9 years ago

Cool, I'll implement FromIterator to the objects who need it then :)

aturon commented 9 years ago

Just wanted to say, I'm ecstatic to see all this activity already! You folks are awesome.

Gankra commented 9 years ago

We seem to be stepping on eachothers toes a lot here, I think we briefly need a bit more coordination while the most eggregious breaks occur.

@alexcrichton @gamazeps @aturon When all your currently outstanding PRs are good to go, can Alex do a high-priority rollup to merge it all in at once? Then can we pause all of this work while I do the layout refactoring? After that, we should be able to work more independently.

gamazeps commented 9 years ago

Sounds good to me. I should be able to finish (with @alexcrichton, remarks) my PR tonight (utc).

alexcrichton commented 9 years ago

@Gankro seems ok to me.

Gankra commented 9 years ago

Hmm... now that I'm implementing this stuff, it's become clear to me that the RFC is clear on methods that collections should be defined, but a bit ambiguous as to what should actually be removed. I know in our design of the RFC we wanted to axe some of the DList methods, and the RFC explicitly calls them out, but nothing gets concretely stated.

@aturon did you write any final decisions on that sort of thing? Perhaps you should handle a separate "deprecate misc" pass?

Gankra commented 9 years ago

Tackling conventions now.

First Pass Errata:

Gankra commented 9 years ago

Errata Continued:

aturon commented 9 years ago

Hmm... now that I'm implementing this stuff, it's become clear to me that the RFC is clear on methods that collections should be defined, but a bit ambiguous as to what should actually be removed. I know in our design of the RFC we wanted to axe some of the DList methods, and the RFC explicitly calls them out, but nothing gets concretely stated.

@aturon did you write any final decisions on that sort of thing? Perhaps you should handle a separate "deprecate misc" pass?

You're right, the RFC is not completely exhaustive on this stuff. That's intentional: RFCs are generally needed for global conventions issues (like method names common across many collections), but API stabilization can tackle individual choices for specific modules.

I'm happy to divide up the deprecation work in whatever way make life easiest for you. If there are clearcut cases based on our earlier spreadsheet/discussions, feel free to deprecate in your PR and I'll review. Regardless, we'll be making a quick stabilization pass over anything that's left as #[experimental] when the dust settles, and I can deprecate at that point as well.

aturon commented 9 years ago

@Gankro

  • SmallIntMap is an implicit growth structure
  • Bitv is an explicit growth structure. It actually just straight up panics if you get/set oob. We should reflect upon if these are the semantics we want. As always, Bitv tests my sanity.

I'll make a PR against the RFC for some of the factual errors (like growth categorization). Keep the corrections coming!

Ideally, we would introduce overloads of things like += and then add these for all of the sets.

It's fine for collections to implement only the pieces of the iterator trio that make sense for them; the important thing is that the methods, when available, have consistent naming and semantics.

What do you mean by "unspecified"? Remember that the RFC focuses only on major conventions issues, so naming/semantics for APIs specific to a particular data structure are not laid out (and are part of API stabilization).

Yes, though this hasn't been consistently implemented yet. The basic convention is: functions that work with an index should panic on out of bounds errors, with the sole exception of get/get_mut (the non-panicking versions of index and index_mut).

aturon commented 9 years ago

@Gankro

  • front/back/front_mut/back_mut are left unspecificed for deques. Presumably we want these, still. May also want some version of these for Vec.

Yes, those should be left for deques.

And Vec (well, now, slices) has these methods as the (poorly named!) head and last functions. We're planing to finalize the names for Vec/slices during another API stabilization pass, so don't worry about them for now.

  • Are reserve and reserve_exact allocating additional space with respect to the current capacity, or the current len? My gut feeling is that current len is almost always desired.

Ooh, good question. I share your instinct here, though it means that:

my_vec.reserve(1);
my_vec.reserve(1);

is not equivalent to my_vec.reserve(2). In particular, the second line would have no effect.

It seems to be a question of "reserve space for X more items" or "reserve X more than previously reserved".

In any case, any option is potentially confusing, but I think you're right that basing on the current len is almost always the semantics you actually want.

Gankra commented 9 years ago

Ah ok, I thought we were trying to nail down most of the methods. If we're only interested in "common" methods, that's fine.

Gankra commented 9 years ago

LruCache has a weird API. It's mostly just a map, but it will drop elements as it pleases. API-wise the only weird thing is put, which is basically insert, but it doesn't yield any information about old values. If it did, then it would still be a bit odd because there are two possible things it could return. It could return the old value for the key, or it could return the element that was dropped when capacity was exceeded. The latter is probably completely uninteresting, though. I'm not clear that the former is necessarily interesting, either.

We could just rename it to insert but with no return value.

Gankra commented 9 years ago

Note: need to rework capacity semantics of VecMap and BitvSet. They seem to be logically distinct capacity classes.

aturon commented 9 years ago

@Gankro re: LruCache, I see that somehow it was completely left out of the method breakdown! I'd expect its API to match e.g. HashMap, keeping in mind that eventually they should both satisfy a Map trait.

michaelsproul commented 9 years ago

I'll have a go at adding ByNeed and Predicate.

aturon commented 9 years ago

@michaelsproul We might want to hold off on that, because unfortunately the technique won't work when we transition to unboxed closures until we have some additional language features. I suspect that portion of the RFC won't be able to land until after 1.0, sadly.

michaelsproul commented 9 years ago

@aturon: Ah, ok. Never mind then.

Gankra commented 9 years ago

:scream:

gamazeps commented 9 years ago

Is there still some stuff to do ?

Gankra commented 9 years ago

@gamazeps: yep! I'm tackling the low-hanging fruit on conventions stuff right now, but a lot of collections are just straight-up missing functionality. I'm going to fill up the collections with "implement this" FIXMEs so people can tackle this individually. Adding stuff isn't a back-compat hazard, so that should be fine!

As for things you can tackle:

TreeMap, BTreeMap, and TrieMap should all implement Bounded Iterators as described here. If you're interested, you could work on that.

I specifically wrote BTreeMap with this design in mind, so the iterators should Just Work if you initialize the two deques to the correct search paths. So that one shouldn't be too bad to get your feet wet with (the logic for making those deques won't be trivial, though). Note: you'll need to change size_hint to be non-exact.

BitV also needs to have its APIs refactored to match closer to Vec's semantics, although the details of this aren't concrete.

gamazeps commented 9 years ago

@Gankro Awesome, I'll start with that (Bounded Iterators) :)

Gankra commented 9 years ago

18605 gets most of the actual conventions junk out of the way. As part of it, I added FIXME's for non-trivial work that needs to be done:

snip see below

Gankra commented 9 years ago

If/When this PR lands, I'd like to release the conventions tasks back to the pool. School work has ramped back up, and these can be worked on more piecemeal now. We also need to discuss the details.

Note that I didn't really tackle String conventions at all.

aturon commented 9 years ago

String has already gone through API stabilization, so we should be OK there. (Note that str slices still need work, but we'll be doing that shortly.)

Gankra commented 9 years ago

Updated FIXME's

C:\g\rust [collect-fruit]> git grep 'FIXME(conventions)'

src/libcollections/binary_heap.rs:// FIXME(conventions): implement into_iter
src/libcollections/bit.rs:// FIXME(conventions): look, we just need to refactor this whole thing. Inside and out.
src/libcollections/btree/map.rs:// FIXME(conventions): implement bounded iterators
src/libcollections/btree/set.rs:// FIXME(conventions): implement bounded iterators
src/libcollections/btree/set.rs:// FIXME(conventions): implement BitOr, BitAnd, BitXor, and Sub
src/libcollections/enum_set.rs:// FIXME(conventions): implement BitXor
src/libcollections/enum_set.rs:// FIXME(conventions): implement len
src/libcollections/ring_buf.rs:// FIXME(conventions): implement shrink_to_fit. Awkward with the current design, but it should
src/libcollections/ring_buf.rs:// FIXME(conventions): implement into_iter
src/libcollections/str.rs:// FIXME(conventions): ensure bit/char conventions are followed by str's API
src/libcollections/tree/map.rs:// FIXME(conventions): implement bounded iterators
src/libcollections/tree/map.rs:// FIXME(conventions): replace rev_iter(_mut) by making iter(_mut) DoubleEnded
src/libcollections/tree/set.rs:// FIXME(conventions): implement bounded iterators
src/libcollections/tree/set.rs:// FIXME(conventions): implement BitOr, BitAnd, BitXor, and Sub
src/libcollections/tree/set.rs:// FIXME(conventions): replace rev_iter(_mut) by making iter(_mut) DoubleEnded
src/libcollections/trie/map.rs:// FIXME(conventions): implement bounded iterators
src/libcollections/trie/map.rs:// FIXME(conventions): implement into_iter
src/libcollections/trie/map.rs:// FIXME(conventions): replace each_reverse by making iter DoubleEnded
src/libcollections/trie/set.rs:// FIXME(conventions): implement bounded iterators
src/libcollections/trie/set.rs:// FIXME(conventions): implement union family of fns
src/libcollections/trie/set.rs:// FIXME(conventions): implement BitOr, BitAnd, BitXor, and Sub
src/libcollections/trie/set.rs:// FIXME(conventions): replace each_reverse by making iter DoubleEnded
src/libcollections/trie/set.rs:// FIXME(conventions): implement iter_mut and into_iter
src/libcollections/vec_map.rs:// FIXME(conventions): capacity management???
src/libstd/collections/hash/map.rs:// FIXME(conventions): update capacity management to match other collections (no auto-shrink)
src/libstd/collections/hash/map.rs:// FIXME(conventions): axe find_copy/get_copy in favour of Option.cloned (also implement that)
src/libstd/collections/hash/set.rs:// FIXME(conventions): implement BitOr, BitAnd, BitXor, and Sub
src/libstd/collections/hash/set.rs:// FIXME(conventions): update capacity management to match other collections (no auto-shrink)
src/libstd/collections/lru_cache.rs:// FIXME(conventions): implement iterators?
src/libstd/collections/lru_cache.rs:// FIXME(conventions): implement indexing?
Gankra commented 9 years ago

Okay, the biggest refactors are all done. Everyone should be free to work without clobbering each-other now!

aturon commented 9 years ago

Thanks @Gankro! I'm glad to help coordinate ongoing efforts here, please continue using this issue to track who's doing what.

swgillespie commented 9 years ago

Awesome! Is there anyone currently working on the TreeMap/TreeSet TODOs that @Gankro posted? If not I'd like to give it a shot.

gamazeps commented 9 years ago

@swgillespie I'm currently working on the BoundedIetrators, I guess I should be done with them saturday afternoon, and while I'm at it I'll try to tackle the others. But I'm happy to coordinate with you (I'm gamazeps on irc) :)

jbcrail commented 9 years ago

I took a shot at implementing the missing len() method for EnumSet. See #18740.

pczarn commented 9 years ago

Ensure capacity-related methods follow conventions

I'm taking a stab at removing implicit shrinking from hash_map. The only doubtful part of the public API might be the behavior of fn reserve_exact.

Gankra commented 9 years ago

Poking at find_copy/get_copy, ran into a compiler bug. Gonna block that work on the fix.