mccolljr / segvec

SegVec data structure for rust. Similar to Vec, but allocates memory in chunks of increasing size.
MIT License
35 stars 5 forks source link

Memconfig #21

Closed cehteh closed 1 year ago

cehteh commented 1 year ago

This should conclude the current work on 'memconfig'.

cehteh commented 1 year ago

On 2023-06-12 10:31, Jacob Ryan McCollum wrote:

@mccolljr commented on this pull request.

One question I wanted to get your thoughts on?

@@ -34,7 +30,7 @@ pub trait MemConfig {
}

/// Linear growth, all segments have the same (FACTOR) length. -pub struct Linear; +pub struct Linear;

My gut feeling is that this is large for a default value, was there any specific reason for this selection?

Linear really benefits from bigger blocks. I wanted to make a default here that gives the best performance. Still things may vary widely depending on size_of::, number of elements, access patterns and so on. Smaller numbers of elements already well covered by the Exponential policy, while that may become pretty wasteful when one needs huge vectors: Exponential wastes at average 25%, when you come to the point where you have to allocate vectors that are many gigabytes sized then this may lead to oom sooner than expected (even worse with std Vec).

Also look at: https://git.pipapo.org/cehteh/cowstr/src/branch/master/src/rcstring.rs#L623

such an allocation policy might be benefitful for SegVec eventually.

-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/21#pullrequestreview-1475464292 You are receiving this because you authored the thread.

Message ID: @.***>

cehteh commented 1 year ago

On 2023-06-12 10:31, Jacob Ryan McCollum wrote:

@mccolljr commented on this pull request.

One question I wanted to get your thoughts on?

@@ -34,7 +30,7 @@ pub trait MemConfig { }

/// Linear growth, all segments have the same (FACTOR) length. -pub struct Linear; +pub struct Linear;

My gut feeling is that this is large for a default value, was there any specific reason for this selection?

lets found that on some quick benchmark:

push values/Linear<1> time: [66.381 ms 66.469 ms 66.562 ms] push values/Linear<2> time: [29.054 ms 29.256 ms 29.492 ms] push values/Linear<4> time: [16.457 ms 16.492 ms 16.529 ms] push values/Linear<8> time: [10.500 ms 10.522 ms 10.545 ms] push values/Linear<16> time: [7.1926 ms 7.2140 ms 7.2362 ms] push values/Linear<32> time: [6.1482 ms 6.1730 ms 6.1913 ms] push values/Linear<64> time: [4.7184 ms 4.7729 ms 4.8357 ms] push values/Linear<128> time: [4.8811 ms 4.8821 ms 4.8832 ms] push values/Linear<256> time: [4.6083 ms 4.6115 ms 4.6159 ms] push values/Linear<512> time: [3.8971 ms 3.9029 ms 3.9102 ms] push values/Linear<1024>time: [3.8384 ms 3.8393 ms 3.8403 ms]

Of course values will likely vary a lot for bigger T's than i32 the bench uses

-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/21#pullrequestreview-1475464292 You are receiving this because you authored the thread.

Message ID: @.***>

mccolljr commented 1 year ago

I'm not surprised that insertion is more efficient the larger you make the linear segment size, as it reduces the number of allocations needed to insert N elements. I guess my question regarding the default FACTOR for the Linear memory config type is... is a default necessary? It seems to me that the optimal value for any given use case is going to depend heavily on the specifics of that use case. It seems difficult to choose a "sane" default that is widely applicable, so maybe it is better to abstain from suggesting a default at all?

cehteh commented 1 year ago

On 2023-06-12 12:20, Jacob Ryan McCollum wrote:

I'm not surprised that insertion is more efficient the larger you make the linear segment size, as it reduces the number of allocations needed to insert N elements. I guess my question regarding the default FACTOR for the Linear memory config type is... is a default necessary? It seems to me that the optimal value for any given use case is going to depend heavily on the specifics of that use case. It seems difficult to choose a "sane" default that is widely applicable, so maybe it is better to abstain from suggesting a default at all?

Feel free to remove the defaults. I was not 100% sure about having defaults either.

-- Reply to this email directly or view it on GitHub: https://github.com/mccolljr/segvec/pull/21#issuecomment-1587938990 You are receiving this because you authored the thread.

Message ID: @.***>

mccolljr commented 1 year ago

@cehteh OK. I don't want to hold up merging this over default values so I will merge as-is, but I will probably change or remove the defaults before publishing a new released version