php-ds / ext-ds

An extension providing efficient data structures for PHP 7
https://medium.com/p/9dda7af674cd
MIT License
2.11k stars 95 forks source link

What are the maintainers' opinions on using php-ds as a basis for new data structures in php-src(core)? #156

Open TysonAndre opened 4 years ago

TysonAndre commented 4 years ago

I didn't find any similar questions after a quick search on the repo's issue tracker, and couldn't be certain from linked discussions from the blog post.

This was suggested on the php-internals mailing list a few years ago, but never made it past the idea phase (e.g. a comment wondering if new classes should be enhanced with a wider method set for functional programming, but that seems to have been largely added for many classes? https://externals.io/message/93301#93347 ).

(e.g. creating an RFC proposing moving DS\Vector to \Vector or \PHP\Vector in php 8.1, or an RFC proposing moving the extension as-is into php-src/ext/ds keeping the \DS namespace and moving development to php-src)

  1. Do you have objections to those ideas, different plans, etc?
  2. If you're in favor, what would you prefer for attribution (e.g. commit history for PRs having Co-Authored-By, new files/classes having @author tags, phpinfo() attribution, input on initial RFCs, etc.
rtheunissen commented 4 years ago

Hi @TysonAndre :wave:

My long-term intention has been to not merge this extension into php-src. I would like to see it become available as a default extension at the distribution level. Unfortunately I have no influence or understanding of that process. Having an independent release and development cycle is a good thing, in my opinion.

If those plans change, I would like to hold off until a 2.0 release - I've learnt a lot over the last 4 years and would like to revisit some of the design decisions I made then, such as a significant reduction of the interfaces or perhaps more interfaces with greater specificity. Functions like map, filter, reduce can be delegated to other libraries that operate on iterable instead of having these as first-class members of the interface. There is a 2.0 branch with some ideas but I haven't looked at that in a while.

I have been working on a research project to design persistent data structures for immutability, so there is a lot of work that I have set for myself for this project over the next 6 months or so. I have no intention to push for distribution changes in the short-term but I am open to the suggestion.

TysonAndre commented 4 years ago

Thanks for the response

My long-term intention has been to not merge this extension into php-src. I would like to see it become available as a default extension at the distribution level. Unfortunately I have no influence or understanding of that process. Having an independent release and development cycle is a good thing, in my opinion.

Do you mean OS distribution level (Windows, Ubuntu, CentOS, HomeBrew for mac, etc.?)

I agree on the benefits of flexibility and being able to port improvements back to older php versions but adoption rates would be lower

Functions like map, filter, reduce can be delegated to other libraries that operate on iterable instead of having these as first-class members of the interface.

So for vector<T>::filter() vs map<K,T>::filter would the plan be to have those libraries check for vector, map, etc, or just any Collection? https://www.php.net/manual/en/ds-vector.filter.php

enumag commented 4 years ago

Do you mean OS distribution level (Windows, Ubuntu, CentOS, HomeBrew for mac, etc.?)

He meant distribution with PHP core (on all platforms where PHP is available).

So for vector::filter() vs map<K,T>::filter would the plan be to have those libraries check for vector, map, etc, or just any Collection

Most likely he means libraries such as https://github.com/nikic/iter.

rtheunissen commented 4 years ago

Do you mean OS distribution level (Windows, Ubuntu, CentOS, HomeBrew for mac, etc.?)

He meant distribution with PHP core (on all platforms where PHP is available)

Whichever is more viable - simply not merged into core, but distributed and enabled by default alongside it.

Most likely he means libraries such as https://github.com/nikic/iter.

Correct. You could create a new instance of anything that can be constructed from an iterable, so something like Vector::from(filter($vector, $callback)) or Set::from(keys($map)) could be lazy, evaluating the iterator only as required. This is by no means a real example, my thoughts are only that operating on iterators primarily has great benefits.

Unfortunately, performance takes a hit because internal iteration is much, much faster (no need to pass through the Iterator methods). I hope to support both lazy collections and internal iteration, but I haven't drafted any plans for that yet.

rtheunissen commented 4 years ago

I'm going to close this discussion because I would rather start from scratch with a much smaller footprint and a focus on persistence.

BenMorel commented 3 years ago

I just discovered this thread from a message by @TysonAndre on internals today: https://externals.io/message/112639

@rtheunissen, I would kindly suggest to think twice about 2 of your decisions above:

My long-term intention has been to not merge this extension into php-src.

I understand some of the benefits of such a decision; however, as @TysonAndre said, adoption will be much slower, and people typically will not use data structures in their library or open-source app, because portability will suffer a lot.

On the contrary, if data structures come into core, there will be a strong announcement effect, and people will surely start moving some of their arrays to specific data structures. Libraries will start using them internally, because they'll know that they can rely on it being always available.

I do understand that you want to improve your first implementation (which is already excellent, by my standards!), but building on your past "mistakes", can't you build v2 with integration into core in mind? Even if that's not in the next minor release, but having a long-term goal of stabilization of the API and integration into core would be great.

Functions like map, filter, reduce can be delegated to other libraries that operate on iterable instead of having these as first-class members of the interface.

Again, I understand the rationale behind this decision, like reducing duplication and keeping only the core functionality in DS. However, sometimes you have to take into consideration ease of use vs purity of the code.

  1. Ease of use / DX / readability: it seems more logical to me to do:

    $map->filter(fn(...) => ...);

    Rather than:

    Some\filter($map, fn(...) => ...);
  2. Speed: as you said, internal iteration is faster. And speed is one of the selling points of DS vs arrays.

  3. Static analysis: I love the fact that Map::filter() can be strictly typed as returning Map<TKey, TValue> in Psalm, for example. If you rely on a generic filter() function, I'm not sure such strict typing will be easy or even possible.

Thank you for your work on DS anyway, I already use the extension in my closed-source project, in particular Map. I would love to use data structures in my open-source projects, one day! :crossed_fingers:

CViniciusSDias commented 3 years ago

I totally agree with @BenMorel and I believe this issue should be revisited since a new RFC came up: https://wiki.php.net/rfc/vector

There's a specific session on it that mentions this extension. Maybe if @rtheunissen reached out to @TysonAndre they could work on this matter together so PHP would have data structures in the core (which I think is really important) and the new improvements that would come in a v2 are implemented.

Here is the discussion thread: https://externals.io/message/116048

TysonAndre commented 3 years ago

I totally agree with @BenMorel and I believe this issue should be revisited since a new RFC came up:

@BenMorel puts the reasons why I think php should have better quality datastructures always-on in core pretty well. (always on rather than in an external extension, even if some of the numerous projects that package php and distribute it also enable some version of php-ds (docker images, Linux distributions, third party repos for Linux distributions offering alternate binary builds, automation tools such as chef/puppet, homebrew, Windows PHP team, etc.,)))

NOTE: I'm currently working on https://wiki.php.net/rfc/deque instead after the RFC discussion on Vector

I mentioned that the vector rfc was on hold on a different mailing list thread on Deque, and updated the RFC after that comment

There's a specific session on it that mentions this extension. Maybe if @rtheunissen reached out to TysonAndre they could work on this matter together so PHP would have data structures in the core (which I think is really important) and the new improvements that would come in a v2 are implemented.

I still had a few more minor changes planned for the Deque RFC to various documentation pages and implementation (etc.) before reaching out again in this issue's discussion pages. However, since it's been brought up, I'll ask again since this is the first time general-use collection datastructures (other than WeakMap, to some extent) had an RFC created for PHP, since php 5.3:

  1. Are your plans for distributing php-ds still the same? My impression from reading the earlier responses to the issue and others were that @rtheunissen was unlikely to reconsider their long term plan on distributing the extension separately from php, with a much smaller feature set, and didn't seem likely to reconsider that, especially until v2. E.g. https://github.com/php-ds/ext-ds/issues/92#issuecomment-422443225 , which I missed when asking earlier.
  2. If you are aiming for finishing up v2 before reconsidering plans, I checked issue trackers and the documentation for 2.0 but still had questions on what was remaining: What are the current goals for v2 (Simplicity? Were immutable data structures part of that plan)? Which pages/documents are up to date or out of date (e.g. is there a roadmap/TODO list)?

From the response to my original question - https://github.com/php-ds/ext-ds/issues/156#issue-691549685 - I've been assuming that they'd be opposed to using any of their work for a competing RFC, especially if future proposals would be significantly different from their plans for v2 (e.g. Vector::filter, map, etc). So I felt like reimplementing data structures independently in my spare time using only alternative sources (such as php-src's SplFixedArray, ArrayObject, SplObjectStorage, array, and my other experience in php/PECL internals and C/C++ development) was the only way I was comfortable with to respect the significant amount of work the php-ds developers have put into this over the years (especially if what php internals members wanted and had gotten approved through the RFC process (e.g. functions Vector::map, hypothetically) had significantly differed from the philosophy design goals of php-ds.)


In the event that @rtheunissen had changed their mind (on merging php-ds into core) and had plans of their own to create a competing RFC, I would be willing to postpone the Deque RFC for a few months or provide feedback for an RFC if they'd requested it. (I'm sure many others would be willing to help with a php-ds RFC, implementation, etc. if the project lead was planning on it.)

(Right now, it'll take at least 3 weeks to start a vote on the Deque RFC due to needing time to create a straw poll on wiki.php.net and gather feedback on this proposal's namespace/name choice, since there have been changes to namespacing policy since 5.3 and no precedents set yet for collections)

rtheunissen commented 3 years ago

Hi everyone, I am happy to see this discussion and I thank you all for taking part. My reservation to merge ds into core has always been because I wanted to make sure we get it right before we do that and the intention behind the mythical v2 was to achieve that, based on learnings from v1 and feedback from the community. I have no personal attachment to this project, I only want what is best for PHP and the community.

I would love to see a dedicated, super-lean vec data structure in core that has native iteration and all the other same internal benefits as arrays. In my opinion, the API should be very minimal and potentially compatible with all the non-assoc array functions. An OO interface can easily be designed around that. I'm imagining something similar to Golang's slices.

As for the future of ds itself, I think these can co-exist and ds can remain external. I've been researching and designing immutable data structures over the last 4 years and I still hope to develop a v2 that simplifies the interfaces and introduces immutable structures. Attempting to implement a suite of structures in core or an OO vector would take a lot of work and might be difficult to reach consensus on with the API. I don't think we should attempt to merge ds into core at any time.

I am currently traveling and have not followed this discussion in detail on the mailing list. I'd be happy to assist in any way I can and will catch up as soon as I am home again this week. Feel free to quote this response on the mailing list as well.

TysonAndre commented 3 years ago

Hi everyone, I am happy to see this discussion and I thank you all for taking part. My reservation to merge ds into core has always been because I wanted to make sure we get it right before we do that and the intention behind the mythical v2 was to achieve that, based on learnings from v1 and feedback from the community. I have no personal attachment to this project, I only want what is best for PHP and the community.

That's great, many in the PHP community and internals will be happy to hear that, and may speed up the timeline for the inclusion of some datastructures.

I would love to see a dedicated, super-lean vec data structure in core that has native iteration and all the other same internal benefits as arrays. In my opinion, the API should be very minimal and potentially compatible with all the non-assoc array functions. An OO interface can easily be designed around that. I'm imagining something similar to Golang's slices.

I would also want a native vec type with gettype($vecValue) === 'vec' (especially for type safety and runtime error prevention), but there are several concerns

As for the future of ds itself, I think these can co-exist and ds can remain external. I've been researching and designing immutable data structures over the last 4 years and I still hope to develop a v2 that simplifies the interfaces and introduces immutable structures. Attempting to implement a suite of structures in core or an OO vector would take a lot of work and might be difficult to reach consensus on with the API. I don't think we should attempt to merge ds into core at any time.

I am currently traveling and have not followed this discussion in detail on the mailing list. I'd be happy to assist in any way I can and will catch up as soon as I am home again this week. Feel free to quote this response on the mailing list as well.

Some questions for when you get back:

  1. What are your thoughts on naming patterns for new collections into PHP. I feel like the distinct design/implementation choices (especially final), improved functionality, and potential for confusion makes the Spl prefix a mistake, and had planned to put together existing and new responses for pros and cons of other naming suggestions https://externals.io/message/116112 (e.g. \Deque, \Collections\Deque (plural Collections), SplDeque, etc)

    This current lack of precedent for naming is something I currently perceive as making the inclusion of new classes (especially new collections) harder than it was in the past, due to also needing to pass a 2/3 vote. Recently, the RFC https://wiki.php.net/rfc/namespaces_in_bundled_extensions passed and offered some naming guidance.

  2. Are there any groups of functionality/interfaces for php-ds that you feel are finished design and ready for proposal into php-ds.

    E.g. PHP's lack of a native Map and Set is something I've personally felt is a major omission, and assume would be a good starting point.

  3. If you're planning on assisting with adding improved datastructures to core, which options would you find acceptable? I'd like to make sure we don't have different things in mind (and know which options are unacceptable) before mailing internals. E.g.

    1. Delegating the work of implementing the remainder of an API's functionality and writing and promoting any RFC to add datastructures?
    2. Co-writing/Leading an RFC
    3. Delegating the work of deciding what functionality to add or remove in an RFC to someone else with experience adding RFCs, e.g. past contributors to the php-ds project.
    4. Other models

    Many others on the php internals project have advocated adding php-ds or functionality of php-ds to core, so I expect there should be plenty of people (some with more RFC/internals experience than me) who are willing to help.

enumag commented 3 years ago

@TysonAndre Could you add \Ds\Vector to the benchmark? Considering the extension is mentioned in the RFC I think it would make sense to have it in the comparison as well. I'm quite surprised that arrays are so fast. Also I noticed that in the benchmark for the new Vector class you're using array access instead of methods. I wonder if methods would be faster.

TysonAndre commented 3 years ago

Could you add \Ds\Vector to the benchmark? Considering the extension is mentioned in the RFC I think it would make sense to have it in the comparison as well. I'm quite surprised that arrays are so fast. Also I noticed that in the benchmark for the new Vector class you're using array access instead of methods. I wonder if methods would be faster.

Method calls are significantly slower than the array operator shorthands, unsurprisingly, it's widely known php method/function calls are slow and high overhead (compared to other programming languages) despite optimization efforts and investigations.

arrays are highly optimized in PHP due to the heavy optimization focus from high frequency of use and the fact opcache can make many inferences about them to avoid dead code, unnecessary cleanup, etc.

https://externals.io/message/116048#116077 and the limitations of that benchmark is mentioned in my response to that in https://externals.io/message/116048#116080 for the thread on Vector vs Ds\Vector, still, I'm working on the Deque RFC right now. (notably for Vector, different starting sizes of 8 vs 10 result in different memory uses and different numbers of resizings. See https://www.php.net/manual/en/class.ds-vector.php#ds-vector.synopsis)

This benchmark is focusing on the improvements that new datastructures in core would have over the options already available in core, for use cases that would benefit from having this functionality in core (e.g. required for portability reasons). Mentioning an option that is not in core in benchmarking would be mixed messaging if people skimmed over the RFC and failed to realize \Ds\Vector was not part of PHP - it would make sense for advertising a new PECL.

php-ds is mentioned in the RFC because the question of why we should work on this functionality in core comes up frequently in proposals to improve datastructures in php itself and needed to be answered for people asking those questions. As the maintainer of php-ds said, improvements to php's core can coexist with PECL libraries outside of core.

If there is a competing RFC, I will update the RFC with those benchmarks.

rtheunissen commented 2 years ago

My position currently is that I think we should start from scratch and borrow whatever is good from php-ds that we want to keep to implement something natively in core. In the 4 or 5 years since I wrote this extension I've been studying persistent data structures in-depth and there are a lot of decisions that I made then that I would do differently now. I would like to be a part of the design and implementation of the data structures themselves, but I do not have the understanding or capacity to be involved in work relating to the engine or its integration. It seems unlikely that a 2.0 of this extension will come about, I'm not convinced that a complete rework would be a good investment of our time.

Would anyone be interested in a call sometime to discuss some hopes and dreams for all of this?