Open pmetras opened 2 years ago
If this is discussed at a sync that I'm not present at before I have a chance to comment more fully, I would like it noted that I am not in favor of this change.
I want to note this note as I probably won't have time to put together a full write up within the next few days.
Apart from the naming suggestion I gave above, I am in favor of this change :+1:
A bit of context for this RFC.
I'm presently learning Pony, reading all documentation and particularly the one of the stdlib. Looking at the Array
class, I was surprised that the API was unveiling array iterators implementations, while traditional API design promotes encapsulation. Looking at other classes in the stdlib, I saw that there were other classes that make public their internal implementation while others did not.
$ find . -type f -name '*.pony' -exec grep "fun values" {} \; -print
fun values(): Iterator[this->ByteSeq box]
./packages/builtin/std_stream.pony
fun values(): (Iterator[this->A] & Rewindable[this->A]) =>
./packages/builtin/array.pony
fun values(): Iterator[this->A]^
./packages/builtin/read_seq.pony
fun values(): StringBytes^ =>
./packages/builtin/string.pony
fun values(): Iterator[this->A]^
./packages/builtin/seq.pony
fun values(): ListValues[A, this->ListNode[A]]^ =>
./packages/collections/list.pony
fun values(): MapValues[K, V, H, this->HashMap[K, V, H]]^ =>
./packages/collections/map.pony
fun values(): Iterator[val->A]^ =>
fun values(): Iterator[val->A]^ =>
./packages/collections/persistent/list.pony
fun values(): Iterator[A]^ =>
./packages/collections/persistent/set.pony
fun values(): SetValues[A, H, this->HashSet[A, H]]^ =>
./packages/collections/set.pony
fun values(): Iterator[this->A]^ =>
./packages/collections/heap.pony
fun values(): Iterator[this->String]^ => strings.values()
./packages/cli/command_spec.pony
I couldn't find the logic but the fact that some results concern interfaces and some concrete classes. When designing classes, exposing internals is often considered bad design. But it could be related to some Pony constraints that I don't know about yet. So I decided to ask the question on Zulip regarding the class Array
. Sean T Allen listed his arguments but I did not understood them as they were neither in favour or against such API design decision. And he decided that he wouldn't answer my questions on the subject but if I open an RFC.
A few days passed and I was still thinking about that API decision, and because I think that there must be some rational in any decision, I thought that it could be because of a technical constraint imposed by the language. So I decided to test it by myself to see if there was something in Pony that prevented it. I did the changes in the Array
class in the compiler and ran all the tests, and they passed without problem. So that was probably about something else I don't understand...
Here is the link to my branch of Pony compiler that implement the changes: https://github.com/pmetras/ponyc/tree/rfc-array
Following Sean T Allen's recommendation that the discussion should continue with a RFC, that's what I did.
One of Sean's argument is that naming things is important: "I personally find naming the thing that implements the interface to be a good thing. I have a thing I can refer to. A name that can be reused."
Thinking about this argument, from a type or object perspectives is not important, in my opinion, when client code call values
or keys
functions. The client only needs to know that she is getting an opaque object of an unknown type to her (that's the private class), that respect the Iterator*
return type of the functions...
I thought that this could be different in the situation of using composition, instead of inheritance, to extend this opaque object. But first, that's the Array
data structure that a client generally wants to extend and not ArrayValues
, and secondly composition extends through redefinition of functions in the new types. So hidden classes still remain hidden. And like I've shown in the RFC example of code using ArrayValues
, client have no reasons to use these types.
All my attempts to find a rational failed. I must say I don't understand Sean's argument. I agree that naming thing for public types are important, but I don't see why you need to name things that clients of your class don't need to know (private implementation classes). That's against information hiding/encapsulation. I don't think that Pony syntax or stdlib promotes not enforcing this design principle.
And I forgot to say that Array
is only one situation where this occurs. If one would want to simplify the stdlib API, this change could be applied in other places like List
, Map
, Set
, etc.
I don't have a strong opinion on this RFC, but if Array
already returned an Iterator
interface, I'd probably vote against an RFC that wanted to change them to return concrete types as it does now.
My only concern about this RFC is adding the overhead of virtual calls for these iterators. I assume iterating over arrays is very common, so these specific classes are used a lot, even if users don't name them.
@ergl - I had the same concern about virtual calls, though I think in practice most iterator-using use cases are direct enough (e.g. passing straight into a for
loop) that LLVM "should" be able to inline enough of the code to devirtualize the calls by seeing the concrete type descriptor being written in the same context where the otherwise-virtual calls are being made.
This is a hypothesis we should probably test prior to accepting the RFC. I can do some testing of that if it looks like the RFC has enough support to be accepted.
@SeanTAllen mentioned in the sync call that he'd like to see one of two things happen with this RFC:
builtin
and not set a precedent that should apply to other packages in the standard libraryHe wants to hone in on one of those two directions so that he can better focus his energy in stating his objections, either to the general case (:one:) or to the builtin
-specific reasoning (:two:)
I think this a reasonable request, and agree.
My initial concern was only to get an answer to the question in the title of this RFP that I could understand. It could have been "It's that way because I decided that it is", without any later discussion... But I was intrigued by the fact that the design decision is against the Sanctus encapsulation/information hiding principle of computer science. And I decided to open this RFC to get a definite answer.
Now, if I had to select between direction 1️⃣ or 2️⃣, I think that applying a design principle only on part of a library would be a bad decision by creating an exception (the less exceptions, the better the language). The remaining paths are either correct everything ( 2️⃣) or do nothing (cancel and close the RFC).
Personally, I think that 2️⃣ is the best move, but I don't know yet what could be the impact on existing code or other consequences of such change. But I'm ready to investigate the impacts and update the text of the RFC in that direction if others think it's worth my effort.
@pmetras - I'm confused by your use of numbering - can you please check the numbers in the above comment I gave to make sure they match the use of your numbers - I think you meant :one: when you said :two: - can you confirm?
We discussed this on today's sync call, and Sean suggested you go ahead and try to update the RFC to make your arguments in favor of the approach for either :one: or :two: - that's not a guarantee the RFC will be accepted but it's to ensure he (and others) fully understand the points being argued so that he (and others) can voice any relevant objections clearly.
I'm confused by your use of numbering - can you please check the numbers in the above comment I gave to make sure they match the use of your numbers - I think you meant when you said - can you confirm?
Effectively, I mixed the numbers. I meant [one], that is applying the design change to all stdlib.
This is very hard to read in GitHub due to lack of line breaks. Can you add hard line breaks around column 80 or so to make it easier to read in the GitHub web UI?
I think this RFC doesn't consider the cost of breaking user code enough. Even if I was to be convinced of the value of this change, I think we need to give more weight to considering that this will break people's code and a good justification beyond "design principles" needs to be given for breaking code.
Regardless of how I feel about this specific RFC, I would make the same point for any RFC. I would like to see the breaking change taking more seriously and consideration given to justifying that breaking change.
RFCs set precedent for future changes and the breaking change in here feels somewhat cavalierly presented.
@pmetras thank you for the linebreaks. its much easier to read now.
@SeanTAllen asked me to weigh in on this RFC to talk a bit about my reasoning, since he knows I support this RFC.
I believe that minimizing API surface is an important component in an overall strategy of stabilizing an API surface.
To be clear about my terms, and to further explain:
When I talk about "stabilizing an API surface", I mean the process of carefully designing an API surface in such a way that the need for future breaking changes is prevented to the greatest extent that is reasonable. That design process includes being clear and explicit about which API guarantees you are making to the API consumers.
It also includes carefully considering which API guarantees are actually helpful to the API consumers, so that guarantees which are not helpful to API consumers can be excluded. And that's what I mean when I talk about "minimizing an API surface". An ideal stable API will minimize the set of guarantees it makes to include only the most helpful set (above some particular threshold of "helpfulness"), because every guarantee made by a stable API is a fetter that may weigh you down during the process of introducing future improvements.
To quote Antoine de Saint-Exupéry:
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
And that's the simplest reason why I am in favor of this RFC. Currently the API makes a guarantee that the return value of each of these functions is a specific concrete class with a specific public name. That particular guarantee does not actually serve any practical needs of users (except, arguably, the virtual calls caveat that was mentioned above). If it is not helpful to users in practice, it can be removed. And if it can be removed, then it should be removed, in the name of taking another step toward stabilizing the Pony standard library's API.
That's my basic argument.
It's worth mentioning that if this change was being debated in reverse (the current standard library had minimal interfaces for the return values, and somebody proposed to make them public concrete classes instead, because they found some tangible benefit in doing so), then it would not be a breaking change at all, because the new proposed type would be a subtype of the existing return types.
It'll always be easier to get more specific with these return types than more general. So we should default to offering the most general return type that is still useful to users, and ratchet down to more specific subtypes as needed (as tangible use cases for more specific return types are discovered). Ratcheting down will always be possible without a breaking change, but ratcheting up will be a breaking change each time. It's best to make such "ratchet up" breaking changes like this RFC before we reach Pony 1.0.0 as part of a general effort to make Pony and its standard library stable.
@jemc and I discussed this at length on Friday and came up with a version of this RFC that we can both agree upon. One of us will be adding information about what agreed upon as that we could both support.
Things Joe and I discussed and agreed upon (@jemc please add any additional points I missed adding):
We discussed why we consider this a change to ONLY collections iterators and why it wouldn't set a precedent outside of the case we considered for collections. The reason for this is that looking at the existing collections and collections we forsee coming, we believe that the iterators for all of the collections should always remain in lockstep for the interface they provide. For example, rewind
doesn't exist on all the current iterators for collections but really, we think it should. A shared interface being returned as with this RFC would assure they stay in sync as intended.
Further, we believe that all collection iterators should have `rewind as a method as part of this RFC.
We also believe from the above that the collections iterator shouldn't be called "RewindableIterator" or use "rewind" in the name as we would add new collection iterator methods to said interface in the future and then it would have a poor name. That is, the interface in question is an interface that describes a collections iterator.
Based on how structural typing works in Pony, we do not need more general interfaces like rewind
or RewindableIterator
that exist in the standard library as users can add them on their own and in fact should. If a user only needs the next() and has_next() methods from an interface that includes next(), has_next(), and rewind(), they should create their own interface that is limited to those two methods IF they want to be protected from any breaking change that might come from a change in the standard library.
We doubt that this change will break much user code, however, we don't really have much evidence either for or against that position.
We doubt that this change will require a lot of work for someone to update to fix.
We are most worried about breakage in terms this breaking 3rd party libraries that might still work prior to this change but are otherwise not being maintained. Given our dearth of 3rd party libraries, we consider this an important point, but again, not particularly likely.
We don't think that this change will result a large immediate win for user.
We do think that this will make the standard library easier to maintain and will make it easier to present users with a consistent interface to collections iterators.
Given all of the above points, we are in favor of this RFC if it was to be rewritten to take all of the above points and justifications into account thereby codifying for future decision making, why this RFC was accepted and what impact if any it has on precedence for making future decisions.
The only thought I've had so far for the name of the iterator that collections iterators would conform to is CollectionIterator
.
I have not yet thought of a better name than that.
Sorry for the delay in responding, I'm quite busy these days. I'll try to complete the open points before next week-end.
I've seen that you want to limit the changes to the collections
module, and that's the place where the Iterator
interface is used. And the code changes are effectively limited to that module.
But I'm questioning about the use of interfaces in Pony when compared with other languages (where I'm more accustomed to call them traits). For me, a Pony interface is used to present a type view (I don't have a better term) of an object to a function that expects that this objects implements some functions. The receiving function can be assured, from that parameter object that implements the type view, that it'll be able to call the functions defined by the interface. That's a contract between the receiving function and the parameter. The receiving function is not concerned by other type views (other Pony interfaces) that the parameter would implement, nor is it concerned if the return types from the interface functions are subtypes of the types as defined in the interface. These are minimal requirements for the receiving function when using interfaces.
So I was surprised when I saw that Pony developers wanted to make the return types explicit, as they were subtypes of the minimum types, as this was particularly flagrant with the Array
iterators functions. For this reason, I want to audit the whole stdlib source and check if this development pattern is used elsewhere with other interfaces. This is not really part of this RFC and won't be included, but for trying to understand what could justify it. Why would one need to unveil implementation classes, as it it still a question without explanations in my mind?
A way to solve the CollectionIterator
vs RewindableIterator
naming issue is to reconsider what I suggested in a previous discussion with @jemc but that we put aside: using functional traits composition and create interface Rewindable
. But at that time I did an error in the way I wrote it, as it was dependant on Iterator
.
Instead, write:
interface Iterator[A]
"""Something we can iterate on"""
fun ref has_next(): Bool
fun ref next(): A ?
interface Rewindable[A]
"""Something that can be rewound"""
fun ref rewind(): A
Now you can create rewindable iterators for Array
and non-rewindable ones for other collections with something like the following untested code:
type RewindableIterator[A] is (Iterator[A] & Rewindable[Iterator[A]])
class _ArrayValues[A: A, B: Array[A] #read] is RewindableIterator[B->A] ref
That way, one can really build functional interfaces/traits from composition of basic interfaces/traits.
We dont want to introduce a number of not needed interfaces into builtin @pmetras. We want a single interface for collectins iterators because we expect them to change in lockstep.
Yes, but you can't have a single interface to describe two different functional behaviours: rewindable vs non-rewindable iterators.
The way I explained is the way to limit the number of specialized interfaces. Imagine that later there's a new kind of iterator where you can ask for the number of remaining elements in the iteration. The way to solve it is to create a RemainingCountable
interface an use composition to match with the different types, not creating all the variations RemainingCountableIterator
, RemainingCountableRewindableIterator
, RewindableIterator
, etc.
We want 1 interface for all collections because we want them to move in lockstep. All collections iterators should be rewindable.
When discussed during the sync call on Tuesday, consensus amongst attendees was that agreement with the points and suggestions put forth in my summation comment of the "Sean and Joe believe comment".
Attendees were @sgebbie, @kulibali, @redvers, @ergl, @jemc, and @SeanTAllen. I don't believe all attendees voiced an opinion but those who did supported the plan as laid out in the aforementioned comment.
Audio of the call for context is available here.
Sorry, I've not much time to work on it in the coming days. I'm not sure I'll be able to complete it. Though here are some thoughts I had while browsing the stdlib source. I've numbered them for easier reference, but the sort order is not important.
First, here is how I understand the change in lockstep thing as the given explanations are quite terse. In a collection
data structure, you want that all iterator functions that exist (usually keys
, values
or pairs
but some collections have more of them or less) return a CollectionIterator
and not an Iterator
. If a new function is added later in CollectionIterator
, for instance infinite
that returns if the iterator is bound or infinite, the goal is for all the collections to benefit from this addition.
Like I wrote, changing the contract of an interface by adding new function will break client code (i.e. all clients using CollectionIterator
). So usually this is something that is not done nor desired by stdlib authors. I don't understand what justifies this change in lockstep.
What would be the rule for a developer to decide using Iterator
vs CollectionIterator
? Because she will see signatures for values
, for instance, that are sometimes CollectionIterator
(in collections
or builtin
) and other times Iterator
(elsewhere in stdlib). It'll be more complex both for the API stdlib user but also the stdlib author to decide which type to use without a simple rule.
Though Range
and Reverse
are part of collections
, are they Iterator
or CollectionIterator
? If they are CollectionIterator
, their rewind
function signature must be changed. What will be the impact on existing client code?
Array
is not part of collections
, but does it need to be considered as a collection? It means that builtin
will also knows about CollectionIterator
, and probably Pony compiler...
Do we need to consider the builtin interfaces Seq
and ReadSeq
as Iterator
or CollectionIterator
? If you consider that a sequence is a collection of items, like Array
, how do you consider StringBytes
or StringRunes
because one would want to be able to rewind iteration on these types?
And what about ByteSeqIter
as used in streams, where we can't rewind when the stream has been flushed?
Many iterators in collection are built with the following destructive pattern:
create
: Create a data structure, frequently Array
where all the collection items are put.has_next
: Test if the data structure is empty.next
: Remove an element from the data structure and return itTo include rewind
in these iterators, as far as I've understood from a previous comment from Sean, you don't want to rebuild the data structure but use existing content, meaning that the content will have to stay in the data structure. It requires to rewrite the whole iterator code. What is the impact on data structure elements capabilities? Do we need to declare them tag
or val
?
Even if some of you have a definite opinion on the best solution to implement this RFC, I'm still thinking on the various options, balancing impact on existing and future code.
Like I wrote, changing the contract of an interface by adding new function will break client code (i.e. all clients using CollectionIterator). So usually this is something that is not done nor desired by stdlib authors. I don't understand what justifies this change in lockstep.
Adding a method to an interface will only break implementers and the intention is that the only implementers are in the standard library. Any user code that accepts a CollectionIterator
will not break.
Array is not part of collections, but does it need to be considered as a collection? It means that builtin will also knows about CollectionIterator, and probably Pony compiler...
For the sake of this conversation when Joe and I say "all collections", that includes Array
. A full enumeration of exactly what uses CollectionIterator and what doesn't should be included in the RFC. Anything that is unclear should be in unresolved to highlight that it would need to be addressed.
I might not have stated explicitly and I apologize for that CollectionIterator
would be part of the builtin package.
Many iterators in collection are built with the following destructive pattern:
create: Create a data structure, frequently Array where all the collection items are put. has_next: Test if the data structure is empty. next: Remove an element from the data structure and return it
Can you provide an example of this pattern? I didn't see it upon a quick review.
Do we need to consider the builtin interfaces Seq and ReadSeq as Iterator or CollectionIterator?
Seq and ReadSeq are interesting and limited cases as they only provide access to values()
as it is the bare minimum interface. I would imagine they to be Iterator
. They are interfaces not concrete collections like we would be applying CollectionIterator
to. ReadSeq
and Seq
are used outside of the standard library and the intent of CollectionIterator
is that it is "standard library only"
Though Range and Reverse are part of collections, are they Iterator or CollectionIterator? If they are CollectionIterator, their rewind function signature must be changed. What will be the impact on existing client code?
I believe those are CollectionIterator but that would require more thought.
The only change I see to the signature would be "should this return a this
like ArrayValues does. Given that can do method chaining with ".>", it seems like rewind
on ArrayValues is using the pre-method chaining approach to returning this
which wouldn't be needed. I think the key here is to decide on the appropriate signature for rewind and implementing accordingly.
If some existing rewinds changed to have a return type, that isn't going to break code, but if we remove the return value (say from ArrayValues) then it would be a breaking change.
Looking at ArrayValues, that looks like we missed changing the return value when we introduced method chaining. I personally would be in favor of changing the ArrayValues return type and accepting the breakage from that given that we already have accepted breakage to make this change.
- Adding a method to an interface will only break implementers and the intention is that the only implementers are in the standard library. Any user code that accepts a CollectionIterator will not break.
Any existing client code that presently accept a *Key
or *Value
or *Pairs
iterator will have to be recompiled to accept an CollectionIterator
instead, but this can be done usually without code change.
But if I implemented a BitSet
data structure or other type of collection, that is not part of the stdlib, with a values
function, I'll have to adapt my code if I want to offer a collection-compatible interface to clients, meaning that I'll have to implement the new rewind
function.
- Can you provide an example of this pattern? I didn't see it upon a quick review.
Here are examples I've found tonight doing a search on shift
:
No one explained the advantages or disadvantages of creating a new type CollectionIterator
instead of updating the existing interface Iterator
. If CollectionIterator
were private to the collection
package, I could understand that its definition evolves like the classes it operates on and that changes to that type of iterator would have limited impact (I think that's what Sean T. Allen calls lockstep), but it does not seem to be what's desired. I don't see a major advantage in creating a new type of iterator for collections only, when both iterators are builtin. Iterator
could be updated to define rewind
with a default implementation, and neither client's nor implementer's code won't be impacted at all...
Please explain why adding CollectionIterator
is a better solution than updating Iterator
so these arguments can be added to the RFC.
Also, the iterator contract does not specify if a collection can be updated while iterated. This is a frequent source of bugs in other languages, even sequential ones. I haven't thought how to deal with this situation with Pony and if capabilities can help there. But suppose that we allow some collections to be updated by other actors that we had a new function updated
to notify an iterator when the iterated collection has been changed (new elements added, removed or updated) and let the iterator implementer decide what to do. In which iterator interface would you add this updated
function?
If it were added by combining different types, for instance like (Iterator[A] & Updatable[A])
, implementors would have the flexibility to define precise types, compared with the suggested CollectionIterator
interface.
Here, Updatable
is something like
interface Updatable[A]
fun updated(): A
"""Notifies that the type `A` has been updated and returns the new content."""
Please explain why adding CollectionIterator
is a better solution than combining types so these arguments can be added to the RFC.
@pmetras This part:
I don't see a major advantage in creating a new type of iterator for collections only, when both iterators are builtin.
Iterator
could be updated to definerewind
with a default implementation, and neither client's nor implementer's code won't be impacted at all...
would not work for iterators that define a streaming sequence. You could, potentially, define an iterator over an infinite stream of events that can't possibly be rewinded, unless you kept all previous elements in memory. We don't want to constraint future uses of the Iterator
interface so that all implementations have to be rewindable.
@ergl , perhaps I did not express correctly my concern. The important word is for collection only. I have previously suggested naming it RewindableIterator
or better, in my opinion, having a Rewindable
interface.
If Iterator
is defined as an interface with a default implementation of rewind
that raises an error or does nothing or create a new iterator (behaviour to be defined later), this type could be used even by streams or infinite sequenses that can't be rewound. For instance:
interface Iterator[A}
fun has_next(): Bool
fun next(): A ?
fun rewind() ? => error
"""By default, iterators can't be reset"""
I don't think that such interface is the best solution, but at least it prevents having two types with one that is difficult to limit to collections usage.
@pmetras This RFC should not modify Iterator[A]. No one is suggesting anything that would result in a change to Iterator
.
I've added the rendered RFC at https://pmetras.github.io/rfcs/ and updated the text.
Now, I've just to find some time to see how to implement it and measure the impact.
@pmetras I can only speak for myself, but I am not going to accept an RFC which breaks from our RFC procedure. There is no need to add HTML commits as GitHub will render the RFC's Markdown directly. You can link to that rendering like I did in #190
@rhagenson , please explain how to have it rendered and how to get a rendered URL. I wasn't able to find it automatically rendered as I wanted to be able to read it and I would have save a few hours today. So I used pandoc and followed Github documentation...
@pmetras Thank you! Will you update your first comment with the link to the GitHub rendered Markdown?
I've repeated numerous times that I am against setting precedent based on a design principle. I didn't read the entire new version of this RFC. I read up to the 3rd mention of this being done because of a design principle and stopped.
I'm against this RFC and at this point, I'm not going to invest more time in it. I can't support the motivation given, do not support it being used to set precedent going forward and have stated that more than once. I have many other Pony related things to do and so I am voicing my opposition to this RFC. I would welcome an RFC that meets the reasoning stated here: https://github.com/ponylang/rfcs/pull/193#issuecomment-1021509909.
As is, it seems clear to me this RFC is not going to be written taking that motivation into account.
I won't update this text anymore. It seems my English vocabulary and wording are not adequate to describe the objective of the RFC. Anybody who wants to support this proposal can take over and update this text.
@pmetras Thank you for the work you have put into this RFC.
I am in support of the core of the RFC so I am still hoping to see it through to the end. However, I am already working on another RFC (#200) which I expect will draw most of my attention. Is there anybody else in the Pony community with the bandwidth needed to take over this RFC?
Proposal to make
Array
iterator classesArrayKeys
,ArrayValues
andArrayPairs
private. InsteadArray
functionkeys
,values
andpairs
will return generalIterator
interface like other stdlib classes that implement theses functions. Hiding the implementation of these iterators follow good design principles.A new interface
Rewindable
has been added for when an iterator can be rewinded.These changes have been implemented in my repository of the Pony compiler and all tests pass.
EDIT: Rendered RFC: https://github.com/pmetras/rfcs/blob/rfc-array/text/0000-array-private-classes.md