haskell-perf / sequences

Benchmarks for sequence data structures: lists, vectors, etc.
88 stars 11 forks source link

append scales O(n) as expected? #7

Closed jwaldmann closed 7 years ago

jwaldmann commented 7 years ago

Hi Chris -

for Data.Sequence, O(n) is definitely not expected: https://hackage.haskell.org/package/containers-0.5.10.1/docs/Data-Sequence.html#v:-62--60-

have you seen my remark at https://github.com/haskell-perf/sequences/issues/5#issuecomment-285898740 about possibly too much forcing?

chrisdone commented 7 years ago

Determining what is the right level of forcing is likely to be difficult.

If you look at the implementation of Data.Sequence's (><) append operation, it's immediately lazy:

https://hackage.haskell.org/package/containers-0.5.10.1/docs/src/Data-Sequence-Internal.html#line-1340 Nevermind, it's a newtype not a data type.

I believe its internal appendTree0 and consTree are strict:

https://hackage.haskell.org/package/containers-0.5.10.1/docs/src/Data-Sequence-Internal.html#line-1346

So perhaps the approach for Seq might be a weak-head force?

jwaldmann commented 7 years ago

"immediately lazy" - I don't see it. Seq is a newtype around Fingertree, and Fingertree has some strict components.

I agree it's difficult, and perhaps best discussed with the author(s) of that library.

chrisdone commented 7 years ago

@jwaldmann Good catch, I hadn't noticed it was a newtype.

chrisdone commented 7 years ago

Their own benchmarks do a full force (nf): https://github.com/haskell/containers/blob/master/benchmarks/Sequence.hs

The benchmark is to split and append: https://github.com/haskell/containers/blob/master/benchmarks/Sequence.hs#L181

chrisdone commented 7 years ago

With a whnf . length it seems to produce fairer results:

Name 10 100 1000 10000
Data.List 112.4 ns 894.8 ns 9.074 μs 92.44 μs
Data.Vector 580.2 ns 631.8 ns 2.134 μs 26.00 μs
Data.Vector.Unboxed 55.48 ns 111.1 ns 0.806 μs 9.009 μs
Data.Sequence 23.28 ns 23.19 ns 0.023 μs 0.024 μs

But I think constant append seems wrong. So it's probably too lazy.

treeowl commented 7 years ago

For very recent containers, seq (forcing to WHNF) should be sufficient to test <> performance. Earlier versions suspended an unreasonably large amount of work, and therefore require some forcing to time properly. rnf is far too aggressive, however. Your best approximation for old versions would likely be to extract an element at the join point, but I don't know if it's worth it.

chrisdone commented 7 years ago

In that case I can probably just update the benchmark to use a newer LTS and do a seq. I'll try this when I get home. Thanks!

On 13 Mar 2017 12:46, "David Feuer" notifications@github.com wrote:

For very recent containers, seq (forcing to WHNF) should be sufficient to test <> performance. Earlier versions suspended an unreasonably large amount of work, and require some forcing to time properly. rnf is far too aggressive, however. Your best approximation for old versions would likely be to extract an element at the join point, but I don't know if it's worth it.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell-perf/sequences/issues/7#issuecomment-286097104, or mute the thread https://github.com/notifications/unsubscribe-auth/AAArC1XMxPY2aL3y8C-EIJZyS6IzBsNLks5rlTqZgaJpZM4MalMB .

chrisdone commented 7 years ago

@treeowl which version should I use?

treeowl commented 7 years ago

0.5.8.2 or later.

On Mon, Mar 13, 2017 at 9:50 AM, Chris Done notifications@github.com wrote:

@treeowl https://github.com/treeowl which version should I use?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haskell-perf/sequences/issues/7#issuecomment-286112119, or mute the thread https://github.com/notifications/unsubscribe-auth/ABzi_TpnIeI_m_8hsLLcy92Ysi3EtB7Lks5rlUm0gaJpZM4MalMB .

chrisdone commented 7 years ago

Upgraded to containers-0.5.10.1. Here are the new results for (<>):

Name 10 100 1000 10000
Data.List 199.1 ns 1514 ns 15.91 μs 256.7 μs
Data.Vector 500.7 ns 649.2 ns 2.204 μs 21.45 μs
Data.Vector.Unboxed 55.60 ns 113.3 ns 0.827 μs 9.474 μs
Data.Vector.Storable 43.61 ns 120.2 ns 0.824 μs 9.320 μs
Data.Sequence 79.84 ns 175.5 ns 0.301 μs 0.396 μs

Up to 1000000 for Data.Sequence:

Name 10 100 1000 10000 100000 1000000
Data.Sequence 75.59 ns 170.0 ns 326.8 ns 434.7 ns 509.5 ns 622.3 ns

That actually seems more reasonable. The lists and vectors are O(n) (it's just that vectors have smaller constant factors), and the sequence timings seem to be O(log n).

screenshot 2017-03-13 15 34 18

Very good! 👏

Thanks for your help @treeowl. We might bother you about other operations later! ;)

treeowl commented 7 years ago

For the record, the old versions would perform a constant amount of work, suspending a logarithmic amount. The new versions perform a logarithmic amount of work immediately, without (I believe) suspending any work.

treeowl commented 7 years ago

One that's going to be tricky to think about is splitAt. That performs Theta(log d) work immediately, and also suspends O(log d) work. It does enough to avoid thunk accumulation, but doesn't take the time to fully force the spine. So WHNF will be a reasonable thing to measure, but doesn't capture the full cost in all cases.

On Mon, Mar 13, 2017 at 11:22 AM, Chris Done notifications@github.com wrote:

Upgraded to containers-0.5.10.1. Here are the new results for (<>): Name 10 100 1000 10000 Data.List 199.1 ns 1514 ns 15.91 μs 256.7 μs Data.Vector 500.7 ns 649.2 ns 2.204 μs 21.45 μs Data.Vector.Unboxed 55.60 ns 113.3 ns 0.827 μs 9.474 μs Data.Vector.Storable 43.61 ns 120.2 ns 0.824 μs 9.320 μs Data.Sequence 79.84 ns 175.5 ns 0.301 μs 0.396 μs

Up to 1000000 for Data.Sequence: Name 10 100 1000 10000 100000 1000000 Data.Sequence 75.59 ns 170.0 ns 326.8 ns 434.7 ns 509.5 ns 622.3 ns

That actually seems more reasonable. The lists and vectors are O(n) (it's just that vectors have smaller constant factors), and the sequence timings are logarithmic. Very good! 👏

Thanks for your help @treeowl https://github.com/treeowl. We might bother you about other operations later! ;)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haskell-perf/sequences/issues/7#issuecomment-286140486, or mute the thread https://github.com/notifications/unsubscribe-auth/ABzi_VtqubBagO7p6XbSo_ZmhUUyAuJ-ks5rlV9JgaJpZM4MalMB .