Refine Collection Interfaces

gracelang / language

Design of the Grace language and its libraries

GNU General Public License v2.0

6 stars 1 forks source link

Refine Collection Interfaces #104

Closed apblack closed 4 years ago

apblack commented 8 years ago

I've had about three go-arounds working on the collection interfaces, but I think the they would benefit from some collective discussion (excuse pun). There seem to be too many Interfaces for novices, and there are some weirdnesses.

Iterables are things that you can get (internal and external) iterators on. They may not be ordered, e.g. sets are iterable as well as lineups. As a consequence, you can ask for aSet.first as well as aLineup.first. (The method first is implemented in the iterable trait as self.iterator.next plus a bound check.) Question: is it weird that first works on sets and is non-deterministic?
Enumberables are distinguished from Iterables in that their order is determined. Is this splitting hairs?
size is required of Iterables, but can raise an exception; sizeIfUnknown(_) can be used to get the size and to provide a default action if it is not known. The alternative would be to distinguish in the type between Boundedinterables and UnboundedIterables,which seems like too many types.
An alternative might be to have InOrderIterator as distinct from, and in addition to, the plain Iterator.
Should we offer a Smalltalk-style SortedColleciton, i.e., one where the order is determined by a comparison function, not by order of insertion.
Fixed sets? Quite useful and efficient.
There should be a method for removing all of the contents of a list, set or dictionary. But what to call it? makeEmpty seems very mechanistic, but empty could be confused with isEmpty. removeAll with no arguments would work, after the compiler is able to distinguish it from removeAll(_), which takes an iterable and removes them all.

milosonator commented 8 years ago

My two cents:

Yes and no. Sets don't have a first because their members are unordered. There is no order so there can be no element that is first.

But one can iterate through a set. And when one does so, there is certainly a first element in the iteration.

Sets should ideally only support set-like operators, like contains and methods for combining sets. But this could make them less useful.

In a program, a set that one can't iterate over is pretty much useless. In math one can say "forall a such that a is in the set ...", but that's not realistic in a programming language.

Could be useful, but very confusing for novices. Enumerable and Iterable sounds like it is the same thing.

Is there some way that it can be avoided that an Iterable has an unknown size? Sounds like the null problem all over again.

I don't see how, while being reasonably efficient. Infinite collectiosn are OK; they can just ahve size infinity. The tricky case is filtered collections, where you can't know the size without generating all of the elements. Suppose that you are filtering an input stream! For many purposes, though, all that's needed is an upper bound on the size; we could add thAt method, I suppose.

Could be annoying for things that expect any Iterator.

.

.

Java uses the term clear so that's what I'm used to. Clear and concise (no pun intended)

Good suggestion; thanks

kjx commented 8 years ago

collection interfaces, but I think the they would benefit from some collective discussion (excuse pun). There seem to be too many Interfaces for novices, and there are some weirdnesses.

Yep. This would be worth doing. I've thought for a while we should at least look at the latest Java design (post-streams) and see what they do: I know they do something about knowing the size, for example.

removeAll with no arguments would work, after the compiler is able to distinguish it from removeAll(),_

well yeah, but we said we didn't want to overload like that, even thought we could.

KimBruce commented 8 years ago

Where can I find the current interface for collections? The current implementation on http://web.cecs.pdx.edu/~grace/minigrace/exp/ does not seem to correspond to the file collectionsPrelude.grace. E.g., the only constructor of a list I've been able to get working is list([1,2]) and emptyList which of course is different from the expected list.withAll(...) or list.empty.

As noted in issue 165 in minigrace, programs also do not recognize methods on lineups.

Can anyone help?

KimBruce commented 8 years ago

OK, I just found the new constructors for the collections library in StandardPrelude (though I still don't understand why those in collectionsPrelude don't work). The only place I'm finding any description of lineup is in gracelib.js (and of course it is still not working in my programs).

I'm trying to update the documentation for these libraries (and provide feedback).

KimBruce commented 8 years ago

I don't much care for the "withAll" factory methods taking an Iterable as a parameter rather than a Collection. This is purely a matter of naming because of pedagogical issues. Explaining Iterable as a name is much harder than Collection. Causing more problems is that fact that a line-up is (apparently) an Iterable, but not a Collection (it is missing the asList, asSequence, and asSet methods). Thus I can't just lie to students and say it takes type Collection.

Could we rename Iterable as Collection and then change the old Collection to ConvertibleCollection or something else?

Actually, why do we need asList, asSet, and asSequence? Each of those collections has a constructor that takes an Iterable and constructs an appropriate collection.

I.e., rather than writing s.asList, write list(s). Then replace the name Iterable by Collection and eliminate the old Collection type.

apblack commented 8 years ago

The specification of the collections interfaces is in standard.md in this repository. That's the document that you should be updating, @KimBruce. @kjx converted your old "Basic types" document form LaTeX.

foo.asList, foo.asSet, etc. may be a bad idea. The same effect can be achieved by writing list (foo), set (foo). The advantage of the parameterless methods is that one can write a chain, for example

foo.filter { ... }.asList.sort.asSequence

which is much clearer than

sequence ( list ( foo.filter { ... }).sort )

when the order of the operations is hopelessly confused. F# has a chaining operator which lets one write pipelines without the parens and makes this nice again, but since method names aren't objects in Grace, I don't see how to steal that idea.

The downside of .asSet, .asList, etc is that it makes set and list special. When set was an object, we could implement .as(set) instead, but now set is a method, we can't.

Should we go back to making them objects? I want to know this soon, because I want to clean up some of the extra levels of indirection in collectionPrelude.

The reason that minigrace's standard library is split between collectionPrelude and StandardPrelude is purely historical: Tim asked me to do that, and I didn't know enough at the time to argue. These two modules are mutually dependent, since StandardPrelude has to expose what's in collectionPrelude, but collectionPrelude, for example, defines types with &, which is defined in StandardPrelude. So the whole thing works only because of some horrible special-casing in minigrace; mutually dependent modules are not otherwise allowed. But the organization of minigrace's library modules is not something that we should be discussing here --- it's a minigrace issue.

apblack commented 8 years ago

Having used list, set, etc as methods for a while, I don’t like it as much as making them objects.

One arrangement that gives us the best of both worlds is to introduce a binary operator, say >>, on collections, which “pipes” the collection on its left into the constructor object on its right. So the normal form for defining constants would be

[1, 2, 3, 4] >> list
[“a”, “b”, “c”] >> set

etc. This would replace the current asList, asSet, etc methods, which are bad because they especially privilege the “built-in” kinds of collections. The implementation of >> in lineups would, naturally, just be

method >> (cons:CollectionFactory) {
    cons.withAll (self)
}

But now each collection can implement >> as it sees fit, and the right-hand-side of the >> operator can be any factory of the client programmer’s choice.

It also removes an unnecessary choice. If we do what Kim has asked for, and meter Iterable and Collection, then either neither of them has asList, which would eliminate the ability to write pipelines in the natural order, or they both do, in which case the programmer has to choose between [1, 2, 3].asList and list [1, 2, 3] which do the same thing.

James will probably point out that we can take this further and make things like map, filter and sort objects rather than methods, and write things like

[2, 5, 9, 6, 7, 2, 4, 4] >> set >> sort >> filter {x -> x.isEven} >> map { x -> x/2 }       // returns a collection [1, 2, 3]

which I believe needs no parenthesis.

F# uses the “pipelining” operator, written |>, to do more or less what I’m suggesting for >>. (They use >> for function composition, where f1 >> f2 applies f1 and then f2). I don’t know if this should influence our choice of operator symbol.

kjx commented 8 years ago

[1, 2, 3, 4] >> list
[“a”, “b”, “c”] >> set

This at least has the advantage of clarifying the distinction between list(1,2,3) and list [1, 2, 3] although I'm still not at all keen on lineups (which really privilege one particular collection). I always figured lineups' interface would grow to be Collection - as Kim put it earlier

Causing more problems is that fact that a line-up is (apparently) an Iterable, but not a Collection (it is missing the asList, asSequence, and asSet methods).

Well we knew that would happen. Hasn't amg got a pragma to stuff more methods into lineups?

James will probably point out that we can take this further and make things like map, filter and sort objects rather than methods, and write things like

[2, 5, 9, 6, 7, 2, 4, 4] >> set >> sort >> filter {x -> x.isEven} >> map { x -> x/2 } // returns a collection [1, 2, 3]

No he won't and he hates that idea. Far too monadic (why does >> feel like >>=)? Plus it really confuses objects and methods: we'd have to make separate sort and filter and whatever objects that reify the messages somehow. How much does that actually give us over:

set (2, 5, 9, 6, 7, 2, 4, 4).sort.filter {x -> x.isEven}.map { x -> x/2 }

F# uses the “pipelining” operator, !>

If we're going to do this, then let's do it properly, with some kind of pipeline and/or proper reified requests rather than making proxy objects that stand for method names but will pollute the namespace.

Ambient talk does this - https://soft.vub.ac.be/amop/at/tutorial/multiparadigm#first-class_messages

Newspeak spec talks about :| (but is unimplemented)

label: ”foo” :| color: Color red :| font: #Courier

which otherwise might have to be written as

((label: ”foo”) color: Color red) font: #Courier

I think we need to simplify what we have, rather than further complicating it, if we can. I think someone should look hard at the Java8 design before we do much more, perhaps Scala too. If we want objects for collections, I quite like this (slightly evil, inspired by Self) design that crosses its fingers and hopes for escape analysis and redundant object optimisations:

set // creates an empty set set.empty // creates an empty set, and then creates another empty set set.withAll(c) // create an empty set and then creates a new set with the elements of collection c set(1,2,3) // creates a set with elements 1, 2, 3

Basically empty and withAll would be like Self's copyRemoveAll and copyContaining but with shorter names, treating the empty set created by set as a prototype, so unifying the Collection and CollectionFactory interface. But that may be a step too far for Andrew (or indeed anyone). We could even make an as method like Andrew's >> above

method as(cons) {cons.withAll(self)}

to write the pipe line as:

seq(2, 5, 9, 6, 7, 2, 4, 4).as(set).sort.filter {x -> x.isEven}.map { x -> x/2 } or

[2, 5, 9, 6, 7, 2, 4, 4].as(set).sort.filter {x -> x.isEven}.map { x -> x/2 }

if needs must

kjx commented 8 years ago

Going back to "pseudo-classes" but keeping them in an imported module:

import "grace/collections" as col

col.set // pseudo class that represents sets
col.set.empty 
col.set.with (1,2,3) 
col.set.with [1,2 ,3]

for pipelines we also support:

[1, 2, 3].into(col.set)

apblack commented 5 years ago

Following our discussion last week, I moved the type parameters from list, set, etc. to empty, with, etc. Consequently, we now have

type CollectionFactory = interface {
    empty⟦T⟧ -> Collection⟦T⟧
        // an empty collection
    with⟦T⟧ (element:T) -> Collection⟦T⟧
        // a collection containing a single element
    withAll⟦T⟧ (elements:Collection⟦T⟧) -> Collection⟦T⟧
        // a collection containing elements
    <<⟦T⟧ (source:Collection⟦T⟧) -> Collection⟦T⟧
}

rather than the old

type CollectionFactory⟦T⟧ = interface {
    empty -> Collection⟦T⟧
        // an empty collection
    with(element:T) -> Collection⟦T⟧
        // a collection containing a single element
    withAll(elements:Collection⟦T⟧) -> Collection⟦T⟧
        // a collection containing elements
    << (source:Collection⟦T⟧) -> Collection⟦T⟧
}

This motivation for this, IIRC, was to make list, set, etc singleton objects rather than methods. This would mean that someone could write a module called twoThreeTree and give it the CollectionFactory interface.

One consequence of this is that when we pipe into a factory

def words = ["one", "two", "three", "four"] >> list

for example, if we want to specify types, we now have to put the type arguments on the >> operation.

def words = ["one", "two", "three", "four"] >>⟦String⟧ list

whereas we used to put them on the object:

def words = ["one", "two", "three", "four"] >> list⟦String⟧

I don't like this very much, but I guess it's OK.

Where it doesn't seem to work at all is illustrated in minigrace/newModules/pipeTools.grace.

def sort = object {
    method <<⟦T⟧ (source:Collection⟦T⟧) {
        // returns a sorted version of source
        list.withAll(source).sort
    }
}

class sortBy⟦T⟧(ordering:Function2⟦T, T, Number⟧) {
    method << (source:Collection⟦T⟧) {
        // returns a sorted version of source
        list.withAll(source).sortBy(ordering)
    }
}

The first object, sort, looks OK. One could use it like this:

words >>⟦String⟧ sort

where words is a sequence; the result is a sorted list. (>>(other) is defined on all collections as requesting other << self.) It seems ugly to have to provide the type argument on the operator, but the real issue is one of consistency. When piping into an existing collection, >> does not accept type parameter, because the types of both the receiver and argument are already determined.

More problems comes with the second class above: sortBy. Suppose that we want to sort in some other order, and thus supply an ordering function to sortBy. sortBy has to take type argument, so that the type of the block ordering can be checked. This means that sortBy has to be a class, not an object, so that it can have a type parameter. Worse, it also means that << can't have a type parameter in this case, because we have already parameterized the class, and we have to insist that the class and the operator are on the same type.

My inclination is to abandon the idea of moving the type parameters to the collection creation classes, and move them back to CollectionFactory. But I don't now recall if there was another reason to move them, other than to make list, set, etc., singletons

apblack commented 5 years ago

A possible alternative is to put type parameters on neither >>(_) nor on list, while leaving the ⟦T⟧ on <<⟦T⟧(_). From where then would the argument come? From the receiver (an instantiated collection), where the method definition would look like:

    method  >>(sink) {
        sink <<⟦ElementType⟧ (self)
    }

where ElementType is the type parameter of the collection that receives the >>(_) message.

kjx commented 5 years ago

I don't have a good enough grasp of the current collections design to really understand this - although I maintain fantasies of eventually porting most of it to Moth or Kernan...

how does this "alternative" work with lineups? where do they get the type parameter from? what would clients write:

def words = ["one", "two", "three", "four"] >> list.empty⟦String⟧

(hmm, I find it odd for list.empty to be mutable... how about)

def words = ["one", "two", "three", "four"] >> list.new⟦String⟧

which is why I'd prefer to write

list[String]("one", "two", "three", "four")

in the first place (although it takes some getting used to reading plain list as making a new list. Not abbreviating and writing col.list[String]("one, two") seems better somehow.

apblack commented 5 years ago

Good question about ~~lineups~~ SequenceConstructors. We could change the language to allow type arguments on SequenceConstructors,

[1, 2, 3]⟦Number⟧

or to do some type inference. Or, we could leave the language as it is, in which case [1, 2, 3] is a Sequence⟦Unknown⟧ If you want some other flavour of Sequence, you can of course construct it explicitly

sequence <<⟦Number⟧ [1, 2, 3]

[1, 2, 3] >>⟦Number⟧ sequence

if we put the type parameters on the operations, and

sequence⟦Number⟧ << [1, 2, 3]

[1, 2, 3] >> sequence⟦Number⟧

if we put the type parameters on the objects.

So, I think that the answer to your question is that if we put the type parameters on the operations, and a programmer wants to type a sequence explicitly, they would have to write

[1, 2, 3] >> sequence.empty⟦Number⟧

or, equivalently,

sequence.empty⟦Number⟧ ++ [1, 2, 3]

kjx commented 5 years ago

type inference - that way lie dragons:

def x = [a, 2, 3]
def y = [x, [], []]

We said we didn't want to make the operational semantics depend on types. I remain unconvinced that the other constructions are better than:

seq[[Number]](1, 2, 3)

apblack commented 5 years ago

I never said that I knew how to do type inference! But you are right, of course: we can't do type inference in just this one place; if we do it at all, we have to do it everywhere.

When you say that you don't see any alternative better than

seq⟦Number⟧(1, 2, 3)

are you saying

that you want the type parameters on the collection factory objects rather than on the collection construction operations; or
that you want to do away withSequenceConstructors and use special multi-arity methods for the collection constructors; or
that you want to replace the square-bracket syntax for SequenceConstructors with seq(_,…,_), which would mean that we would build a set using something like seq(1, 2, 4, 8) >> set or set(seq(1, 2, 4, 8))
some combination of the above; or
something else?

As a reminder: we tried multi-arity methods like sequence(_,_,...,_) . They are implemented in my beginningStudent dialect, and I tried teaching with them for a quarter. The problems with multi-arity methods are as follows.

Defining them requires a compiler hack (but that's true of if(_)then(_)elseif(_)...else(_) too).
They do not give us a good way of defining an empty collection — we used emptySequence, etc., but then we have two independent and unrelated requests for each kind of collection.
Because of (a), library-defined collection types have to invent a new syntax.