gracelang / language

Design of the Grace language and its libraries
GNU General Public License v2.0
6 stars 1 forks source link

Collections again #185

Open apblack opened 4 years ago

apblack commented 4 years ago

I just closed #104 because it was old, and the particular issues discussed there have been resolved. I've recently run up against some new ones.

Recently, I was wanting to extract an arbitrary element from a set, and could not see how. So I added anyone to sets. Then, later, I found this specification in type Collection⟦T⟧

first -> T
    // The first element of self; raises BoundsError if there is none.  
    // If self is unordered, then first returns an arbitrary element. 

I had forgotten about this. This question I want to raise is: which is preferable?

  1. Having first on unordered collections return an arbitrary element, and on collections with numeric keys be equivalent to at 1
  2. Having anyone work on any collection, and return an arbitrary element, and have first work only on collections with numeric keys, where is is always equivalent to at 1

I also noticed that find(predicate) ... and includes(predicate), which look for an element that satisfies predicate and either return it (find) or tell us whether or not the search was successful (includes), existed on sets but not other collections. This seems wrong; there is no reason not to implement them on all collections, and indeed the same code would work for all. But I have trouble remembering the distinction between contains and includes (perhaps because Smalltalk uses includes for what Grace calls contains). So I'm suggesting

anySatisfies(predicate: Function1⟦T,Boolean⟧) -> Boolean
    // true if predicate holds for any of the elements of self

allSatisfy(predicate: Function1⟦T,Boolean⟧) -> Boolean
    // true if predicate holds for all of the elements of self

where anySatisfies is the new name for includes, and allSatisfy is a new method.

The tension here is between putting more stuff into collections, and thus making the interface better for sophisticated programmers, but harder for novices to wrap their minds around, or leaving stuff out, and thus providing opportunities for student assignments that implement methods like anySatisfies and includes.

We can have the best of both worlds, by creating a beginning dialect that exposes simpleCollections, for example. But then someone has to maintain them.

kjx commented 4 years ago

leaving stuff out, and this providing opportunities for student assignments that implement methods like anySatisfies and includes.

In a world in which Grace is open source, popular, widely used, then the source code is available anyway, and answers to every assignment question will be on stackoverflow

harder for novices to wrap their minds around

to me this is the most important reason not to add things in.

Although, again, perhaps it matters less than we think, if documentation can present the core methods first, without requiring novices to learn the entire lot at once. If we had autocompletion (static types? statistics?) then we could always offer the core methods first. Still: we have dialects, be a shame not to use them.

KimBruce commented 4 years ago

Is there any reason why we don't just use iterators for all this? Each collection class has an iterator and there is a filter method as well. Having first for an unordered collection seems wrong (even though it is easily implemented from the iterator)!

anySatisfies is just filter with a check for empty. allSatisfies can also easily be implemented with iterators.

One thing we should worry about is libraries that are too complicated for novices. I found in teaching Java-based data structures that it was nearly impossible to avoid talking about wild card types (though I hated them!). It should be easy to write a simple, easy-to-maintain dialect that restricts what can be invoked from collections and hence solves that problem.

apblack commented 4 years ago

As for first not being appropriate for unordered collections, I agree: it seems wrong.

And yes, all of these things can be implemented with iterations. Indeed, they are so implemented. What it comes down to is: should we have just one collection module, or two, and if two, should they be “collections” and “beginningCollections”, or “advancedCollections” and “collections”. And which should be part of “standard”?

kjx commented 4 years ago

Is 'any' or 'random' an option rather than 'first'?
The real problem with 'first' isn't 'first' - it's that it implies there should be a 'second'.

kjx commented 4 years ago

There's a difference I think between implementing things with iterators (C++ STL perhaps), vs making iterator-like things a first-class part of the library design (e.g. Java Streams). The random example here would give a stream that sampled a collection presumably without replacement.

kjx commented 4 years ago

I'm not sure it matters whether we have one or two or even three collection modules. Either or both can be part of the standard.

Given an Advanced Collections library; it should just be a Small Matter of Programming to use it to implement a much simpler beginner's library - wrappers are probably easiest.

Given a basic collections library, it will be relatively easy to implement new kinds of collections; implementing more behaviour / algorithms into every existing collection will be tricker unless that's done as external methods, although with somewhat sneaky modularisation or inheritance / traits even internal extensions shouldn't be too hard. (Either the collections inherit from parallel empty traits in a "collections-extension-hook" module that can be replaced by programmers, or an extended collections module can import the individual leaf collections and extend each one. Supporting something like family polymorphism gets more of this for free, but will most likely break the manifest rule - although again, it may be possible with careful design of replaceable modules).

What's most important is that we actually have a collections library (yay Andrew);
that it's clearly described; ideally portable;
and that the interface to the underlying VM is also clearly described, portable, and ideally minimalist. Kernan has an interface to a primitive mutable collection (think Java array) via a "platform/memory" module which is a factory for the primitive collection objects that understand only size, at(_), at(_)put(_) - user level collections are implemented in terms of those primitive objects...

(this is another reason why I don't much like the [1,2,3] notation because that's where the circularity comes in --- unless again [1,2,3] is always re-written to something like literalHook.newSequence(1,2,3) (doesn't have to be that verbose) where literalHook is presumably a small module with factories for sequences (and probably ranges too) but which can be replaced where necessary so that [1,2,3] notation can work with any particular library. Or something.

apblack commented 3 years ago

Nice observation, @kjx, that if we once have a complete collections bundle, then creating a restricted bundle is a simple application of excludes. This is much better than two independent source pools, or extension methods.