Open TysonAndre opened 4 years ago
Hi @TysonAndre :wave:
My long-term intention has been to not merge this extension into php-src. I would like to see it become available as a default extension at the distribution level. Unfortunately I have no influence or understanding of that process. Having an independent release and development cycle is a good thing, in my opinion.
If those plans change, I would like to hold off until a 2.0 release - I've learnt a lot over the last 4 years and would like to revisit some of the design decisions I made then, such as a significant reduction of the interfaces or perhaps more interfaces with greater specificity. Functions like map
, filter
, reduce
can be delegated to other libraries that operate on iterable
instead of having these as first-class members of the interface. There is a 2.0 branch with some ideas but I haven't looked at that in a while.
I have been working on a research project to design persistent data structures for immutability, so there is a lot of work that I have set for myself for this project over the next 6 months or so. I have no intention to push for distribution changes in the short-term but I am open to the suggestion.
Thanks for the response
My long-term intention has been to not merge this extension into php-src. I would like to see it become available as a default extension at the distribution level. Unfortunately I have no influence or understanding of that process. Having an independent release and development cycle is a good thing, in my opinion.
Do you mean OS distribution level (Windows, Ubuntu, CentOS, HomeBrew for mac, etc.?)
I agree on the benefits of flexibility and being able to port improvements back to older php versions but adoption rates would be lower
Functions like map, filter, reduce can be delegated to other libraries that operate on iterable instead of having these as first-class members of the interface.
So for vector<T>::filter()
vs map<K,T>::filter
would the plan be to have those libraries check for vector, map, etc, or just any Collection? https://www.php.net/manual/en/ds-vector.filter.php
Do you mean OS distribution level (Windows, Ubuntu, CentOS, HomeBrew for mac, etc.?)
He meant distribution with PHP core (on all platforms where PHP is available).
So for vector
::filter() vs map<K,T>::filter would the plan be to have those libraries check for vector, map, etc, or just any Collection
Most likely he means libraries such as https://github.com/nikic/iter.
Do you mean OS distribution level (Windows, Ubuntu, CentOS, HomeBrew for mac, etc.?)
He meant distribution with PHP core (on all platforms where PHP is available)
Whichever is more viable - simply not merged into core, but distributed and enabled by default alongside it.
Most likely he means libraries such as https://github.com/nikic/iter.
Correct. You could create a new instance of anything that can be constructed from an iterable, so something like Vector::from(filter($vector, $callback))
or Set::from(keys($map))
could be lazy, evaluating the iterator only as required. This is by no means a real example, my thoughts are only that operating on iterators primarily has great benefits.
Unfortunately, performance takes a hit because internal iteration is much, much faster (no need to pass through the Iterator methods). I hope to support both lazy collections and internal iteration, but I haven't drafted any plans for that yet.
I'm going to close this discussion because I would rather start from scratch with a much smaller footprint and a focus on persistence.
I just discovered this thread from a message by @TysonAndre on internals today: https://externals.io/message/112639
@rtheunissen, I would kindly suggest to think twice about 2 of your decisions above:
My long-term intention has been to not merge this extension into php-src.
I understand some of the benefits of such a decision; however, as @TysonAndre said, adoption will be much slower, and people typically will not use data structures in their library or open-source app, because portability will suffer a lot.
On the contrary, if data structures come into core, there will be a strong announcement effect, and people will surely start moving some of their arrays to specific data structures. Libraries will start using them internally, because they'll know that they can rely on it being always available.
I do understand that you want to improve your first implementation (which is already excellent, by my standards!), but building on your past "mistakes", can't you build v2 with integration into core in mind? Even if that's not in the next minor release, but having a long-term goal of stabilization of the API and integration into core would be great.
Functions like map, filter, reduce can be delegated to other libraries that operate on iterable instead of having these as first-class members of the interface.
Again, I understand the rationale behind this decision, like reducing duplication and keeping only the core functionality in DS. However, sometimes you have to take into consideration ease of use vs purity of the code.
Ease of use / DX / readability: it seems more logical to me to do:
$map->filter(fn(...) => ...);
Rather than:
Some\filter($map, fn(...) => ...);
Speed: as you said, internal iteration is faster. And speed is one of the selling points of DS vs arrays.
Static analysis: I love the fact that Map::filter()
can be strictly typed as returning Map<TKey, TValue>
in Psalm, for example. If you rely on a generic filter()
function, I'm not sure such strict typing will be easy or even possible.
Thank you for your work on DS anyway, I already use the extension in my closed-source project, in particular Map
. I would love to use data structures in my open-source projects, one day! :crossed_fingers:
I totally agree with @BenMorel and I believe this issue should be revisited since a new RFC came up: https://wiki.php.net/rfc/vector
There's a specific session on it that mentions this extension. Maybe if @rtheunissen reached out to @TysonAndre they could work on this matter together so PHP would have data structures in the core (which I think is really important) and the new improvements that would come in a v2 are implemented.
Here is the discussion thread: https://externals.io/message/116048
I totally agree with @BenMorel and I believe this issue should be revisited since a new RFC came up:
@BenMorel puts the reasons why I think php should have better quality datastructures always-on in core pretty well. (always on rather than in an external extension, even if some of the numerous projects that package php and distribute it also enable some version of php-ds (docker images, Linux distributions, third party repos for Linux distributions offering alternate binary builds, automation tools such as chef/puppet, homebrew, Windows PHP team, etc.,)))
NOTE: I'm currently working on https://wiki.php.net/rfc/deque instead after the RFC discussion on Vector
I mentioned that the vector rfc was on hold on a different mailing list thread on Deque
, and updated the RFC after that comment
There's a specific session on it that mentions this extension. Maybe if @rtheunissen reached out to TysonAndre they could work on this matter together so PHP would have data structures in the core (which I think is really important) and the new improvements that would come in a v2 are implemented.
I still had a few more minor changes planned for the Deque
RFC to various documentation pages and implementation (etc.) before reaching out again in this issue's discussion pages. However, since it's been brought up, I'll ask again since this is the first time general-use collection datastructures (other than WeakMap, to some extent) had an RFC created for PHP, since php 5.3:
From the response to my original question - https://github.com/php-ds/ext-ds/issues/156#issue-691549685 - I've been assuming that they'd be opposed to using any of their work for a competing RFC, especially if future proposals would be significantly different from their plans for v2 (e.g. Vector::filter, map, etc). So I felt like reimplementing data structures independently in my spare time using only alternative sources (such as php-src's SplFixedArray, ArrayObject, SplObjectStorage, array
, and my other experience in php/PECL internals and C/C++ development) was the only way I was comfortable with to respect the significant amount of work the php-ds developers have put into this over the years (especially if what php internals members wanted and had gotten approved through the RFC process (e.g. functions Vector::map
, hypothetically) had significantly differed from the philosophy design goals of php-ds.)
We would all agree php's core (always-on) functionality is significantly lacking in efficient datastructures and omits other useful types of datastructures, but we disagree on the best way to distribute efficient datastructures to end users. I find it strange and frustrating for work in php's core itself on general-use datastructures to have stalled for a long time with no-one (in php's core team) working on addressing this in a way that ensures portability and ease of use.
This excellent PECL and the work done on it by volunteer maintainers and contributors has helped improve and optimize (or improve the readability of) the code of many php end users and application/library, but I personally passionately feel a mature, widely used programming language should have implementations of many of these collections always available within its own standard library (especially when native C implementations are much more efficient than userland PHP), rather than have users rely on installing an optional external PECL with a separate release cycle and a small number of volunteer maintainers with limited time. So I've started to work on that in my own free time. (I also believe in general that PECL authors should be free to choose their distribution plans without feeling pressured to work on something if they're opposed to it (or if they didn't have the sometimes significant amounts of time and energy to spend on the RFC discussion, RFC document, or the implementation)) (As @BenMorel said, adoption would be much slower due to portability concerns or potential lack of long-term backwards compatibility guarantees, etc., in major version releases)
In the event that @rtheunissen had changed their mind (on merging php-ds into core) and had plans of their own to create a competing RFC, I would be willing to postpone the Deque
RFC for a few months or provide feedback for an RFC if they'd requested it. (I'm sure many others would be willing to help with a php-ds RFC, implementation, etc. if the project lead was planning on it.)
(Right now, it'll take at least 3 weeks to start a vote on the Deque
RFC due to needing time to create a straw poll on wiki.php.net and gather feedback on this proposal's namespace/name choice, since there have been changes to namespacing policy since 5.3 and no precedents set yet for collections)
Hi everyone, I am happy to see this discussion and I thank you all for taking part. My reservation to merge ds into core has always been because I wanted to make sure we get it right before we do that and the intention behind the mythical v2 was to achieve that, based on learnings from v1 and feedback from the community. I have no personal attachment to this project, I only want what is best for PHP and the community.
I would love to see a dedicated, super-lean vec data structure in core that has native iteration and all the other same internal benefits as arrays. In my opinion, the API should be very minimal and potentially compatible with all the non-assoc array functions. An OO interface can easily be designed around that. I'm imagining something similar to Golang's slices.
As for the future of ds itself, I think these can co-exist and ds can remain external. I've been researching and designing immutable data structures over the last 4 years and I still hope to develop a v2 that simplifies the interfaces and introduces immutable structures. Attempting to implement a suite of structures in core or an OO vector would take a lot of work and might be difficult to reach consensus on with the API. I don't think we should attempt to merge ds into core at any time.
I am currently traveling and have not followed this discussion in detail on the mailing list. I'd be happy to assist in any way I can and will catch up as soon as I am home again this week. Feel free to quote this response on the mailing list as well.
Hi everyone, I am happy to see this discussion and I thank you all for taking part. My reservation to merge ds into core has always been because I wanted to make sure we get it right before we do that and the intention behind the mythical v2 was to achieve that, based on learnings from v1 and feedback from the community. I have no personal attachment to this project, I only want what is best for PHP and the community.
That's great, many in the PHP community and internals will be happy to hear that, and may speed up the timeline for the inclusion of some datastructures.
I would love to see a dedicated, super-lean vec data structure in core that has native iteration and all the other same internal benefits as arrays. In my opinion, the API should be very minimal and potentially compatible with all the non-assoc array functions. An OO interface can easily be designed around that. I'm imagining something similar to Golang's slices.
I would also want a native vec
type with gettype($vecValue) === 'vec'
(especially for type safety and runtime error prevention),
but there are several concerns
https://github.com/php/php-src/pull/7491
"Use more compact representation for packed arrays."
was recently proposed for PHP 8.2, and if it (hopefully) works out,
it makes the argument of reduced memory less compelling to voters on RFCs.
Unlike FaceBook's HHVM, PHP does not have a built-in type checker to detect misuses of vec
.
There are several external type checkers for PHP, including Psalm, Phan, and PHPStan, but I expect many projects do not set up external type checkers, and an type checker built into PHP seems ambitious and far away. E.g. the need to follow PHP's release cycle would also limit improvements made to the type checker in a patch version, and there is a steep learning curve for contributors to type checkers. (I am a lead of Phan)
Unlike FaceBook's HHVM, PHP isn't managed by a single organization, so ambitious changes like this
that may indirectly lead to needing to upgrade (or migrate away from outdated libraries) may be more controversial.
(e.g. asort
, references, etc.)
Slices have some parts (that make them hard to pick up for beginning programmers) that make me hesitant to propose slices myself, such as users not expecting slice modifications to modify other slices of different ranges, or the original array.
It took me a long time to fully get used to the implications of Golang's append()
semantics
and instinctively tell when they were misused.
As for the future of ds itself, I think these can co-exist and ds can remain external. I've been researching and designing immutable data structures over the last 4 years and I still hope to develop a v2 that simplifies the interfaces and introduces immutable structures. Attempting to implement a suite of structures in core or an OO vector would take a lot of work and might be difficult to reach consensus on with the API. I don't think we should attempt to merge ds into core at any time.
I am currently traveling and have not followed this discussion in detail on the mailing list. I'd be happy to assist in any way I can and will catch up as soon as I am home again this week. Feel free to quote this response on the mailing list as well.
Some questions for when you get back:
What are your thoughts on naming patterns for new collections into PHP.
I feel like the distinct design/implementation choices (especially final
), improved functionality, and potential for confusion makes the Spl
prefix a mistake,
and had planned to put together existing and new responses for pros and cons of other naming suggestions https://externals.io/message/116112
(e.g. \Deque
, \Collections\Deque
(plural Collections
), SplDeque
, etc)
This current lack of precedent for naming is something I currently perceive as making the inclusion of new classes (especially new collections) harder than it was in the past, due to also needing to pass a 2/3 vote. Recently, the RFC https://wiki.php.net/rfc/namespaces_in_bundled_extensions passed and offered some naming guidance.
Are there any groups of functionality/interfaces for php-ds that you feel are finished design and ready for proposal into php-ds.
E.g. PHP's lack of a native Map
and Set
is something I've personally felt is a major omission, and assume would be a good starting point.
If you're planning on assisting with adding improved datastructures to core, which options would you find acceptable? I'd like to make sure we don't have different things in mind (and know which options are unacceptable) before mailing internals. E.g.
Many others on the php internals project have advocated adding php-ds or functionality of php-ds to core, so I expect there should be plenty of people (some with more RFC/internals experience than me) who are willing to help.
@TysonAndre Could you add \Ds\Vector to the benchmark? Considering the extension is mentioned in the RFC I think it would make sense to have it in the comparison as well. I'm quite surprised that arrays are so fast. Also I noticed that in the benchmark for the new Vector class you're using array access instead of methods. I wonder if methods would be faster.
Could you add \Ds\Vector to the benchmark? Considering the extension is mentioned in the RFC I think it would make sense to have it in the comparison as well. I'm quite surprised that arrays are so fast. Also I noticed that in the benchmark for the new Vector class you're using array access instead of methods. I wonder if methods would be faster.
Method calls are significantly slower than the array operator shorthands, unsurprisingly, it's widely known php method/function calls are slow and high overhead (compared to other programming languages) despite optimization efforts and investigations.
arrays are highly optimized in PHP due to the heavy optimization focus from high frequency of use and the fact opcache can make many inferences about them to avoid dead code, unnecessary cleanup, etc.
https://externals.io/message/116048#116077 and the limitations of that benchmark is mentioned in my response to that in https://externals.io/message/116048#116080 for the thread on Vector
vs Ds\Vector
, still, I'm working on the Deque
RFC right now. (notably for Vector
, different starting sizes of 8 vs 10 result in different memory uses and different numbers of resizings. See https://www.php.net/manual/en/class.ds-vector.php#ds-vector.synopsis)
This benchmark is focusing on the improvements that new datastructures in core would have over the options already available in core, for use cases that would benefit from having this functionality in core (e.g. required for portability reasons). Mentioning an option that is not in core in benchmarking would be mixed messaging if people skimmed over the RFC and failed to realize \Ds\Vector
was not part of PHP - it would make sense for advertising a new PECL.
Set
or Map
functionality, it would also be mixed messaging to compare their version of *Set
against Ds\Set
- the memory and cpu characteristics from what the php-ds maintainers proposed for always-on inclusion in core would essentially be the same for cpu and memory usage of *Set
and Ds\Set
, and not make sense for the php-ds maintainers to include in such an RFC. The performance of the best alternatives already in core would make more sense for them as wellphp-ds is mentioned in the RFC because the question of why we should work on this functionality in core comes up frequently in proposals to improve datastructures in php itself and needed to be answered for people asking those questions. As the maintainer of php-ds said, improvements to php's core can coexist with PECL libraries outside of core.
If there is a competing RFC, I will update the RFC with those benchmarks.
My position currently is that I think we should start from scratch and borrow whatever is good from php-ds that we want to keep to implement something natively in core. In the 4 or 5 years since I wrote this extension I've been studying persistent data structures in-depth and there are a lot of decisions that I made then that I would do differently now. I would like to be a part of the design and implementation of the data structures themselves, but I do not have the understanding or capacity to be involved in work relating to the engine or its integration. It seems unlikely that a 2.0 of this extension will come about, I'm not convinced that a complete rework would be a good investment of our time.
Would anyone be interested in a call sometime to discuss some hopes and dreams for all of this?
I didn't find any similar questions after a quick search on the repo's issue tracker, and couldn't be certain from linked discussions from the blog post.
This was suggested on the php-internals mailing list a few years ago, but never made it past the idea phase (e.g. a comment wondering if new classes should be enhanced with a wider method set for functional programming, but that seems to have been largely added for many classes? https://externals.io/message/93301#93347 ).
(e.g. creating an RFC proposing moving DS\Vector to \Vector or \PHP\Vector in php 8.1, or an RFC proposing moving the extension as-is into php-src/ext/ds keeping the
\DS
namespace and moving development to php-src)@author
tags, phpinfo() attribution, input on initial RFCs, etc.