php-ds / ext-ds

An extension providing efficient data structures for PHP 7
https://medium.com/p/9dda7af674cd
MIT License
2.11k stars 95 forks source link

Specify capacity at construction of Vector #152

Closed still-dreaming-1 closed 4 years ago

still-dreaming-1 commented 4 years ago

If efficiency is one of the main points of this extension, and the main point of the Vector class is for a value’s index to be a direct mapping to its index in the buffer, wouldn't having to increase the capacity at some point after construction be a potentially expensive operation? I guess if you immediately call allocate() after constructing it, there is nothing that needs to be copied. But still, why waste the initial allocation operation if it just has to be done over again, potentially even from scratch if there is not enough memory next to the initial allocation? Construction time seems like the most useful time to allow the programmer to specify the capacity and allocate a custom amount.

It could be a second parameter passed into the constructor. A PHP programmer could explicitly pass $values as null if they want, or pass something in there, and then a second parameter would be $capacity.

This would be particularly helpful when you already know how large the capacity will end up being, when you don't actually need it to grow or know that part of the code well enough to know it won't grow. It is also possible you know exactly how large it needs to be initially to store all the values you are about to put in it, and it is not super likely to grow after that. In that case, you may want to allocate it to that size, or something a little larger to give it a little room to grow.

rtheunissen commented 4 years ago

I've been drafting an Allocation type that is immutable and typed, to simulate a fixed array. I'll expedite that a bit, I have some time now.

still-dreaming-1 commented 4 years ago

Awesome. I still think it would be helpful for the Vector constructor to allow the programmer to specify a starting capacity. There are also situations where you don't know the end capacity, but you know it is likely to be much higher than 8. I guess I just feel if it is possible to set the capacity in a custom way, this is an incomplete feature without it also being a constructor option.

I'm curious what you mean about the Allocation being being "typed". At work we rely heavily on Psalm as a static analysis tool for the types in our code, and as a way to add a much more sophisticated type system to PHP (at static analysis time anyway). I was thinking of contributing to the polyfill project for this extension by adding Psalm typing to the docblocks. Basically it would give the programmer the option to declare in code comments what types a particular data structure object will contain, and then Psalm can verify those types were honored/used correctly. This would go a long way to prevent people that rely on Psalm from having to either wrap these classes or create a Psalm plugin to get better typing.

rtheunissen commented 4 years ago

That is an excellent idea, I will work on that if you don't beat me to it. By typed I was thinking you pass a type enum in the constructor that defaults to any/mixed, but can be any of the internal types. If the type is primitive we don't need to allocate a 16 byte reference-counted container for every value

still-dreaming-1 commented 4 years ago

I see, only allocating the memory needed for a particular type sounds great.

still-dreaming-1 commented 4 years ago

It turns out what I said about adding types to the polyfill may be unnecessary. I'm not sure what I'm doing differently, but Psalm is now allowing me to add types to Vector objects I create, and enforcing those types. For example if Psalm were to analyze this code, it would report an issue when trying to push something that is not an int:

/**
 * @var \Ds\Vector<int>
 */
$vec = new \Ds\Vector();
$vec->push('hello');

I'm not sure why or how this works, but it does.. Anyway this is getting off topic from the issue posted here, but I wanted to let you know this already works without any changes necessary.

rtheunissen commented 4 years ago

I'm going to close this for now. With allocations incoming I'm considering completely removing allocate/capacity functionality - doesn't seem worth the added complexity at the interface level. Something to consider for v2.