Closed jamespayor closed 1 month ago
splitL, splitR :: (a -> Bool) -> Seq a -> (Seq a, Seq a)
https://hackage.haskell.org/package/containers-0.6.5.1/docs/Data-Sequence.html#v:spanl ?
searchL, searchR :: (Int -> a -> Int -> Bool) -> Seq a -> Maybe (Seq a, a, Seq a)
in the API's naming style, this would be span(l/r)WithIndex
, which does not exist right now.
Some functions have _WithIndex
variants (e.g., map
), some do not (e.g., filter
?)
Those do linear scans. At the time I lodged this issue, I was interested in binary search for a Seq whose elements are in some kind of sorted order.
I see. So, something like https://hackage.haskell.org/package/containers-0.6.5.1/docs/Data-Set-Internal.html#v:spanAntitone
Seq whose elements are in some kind of sorted order
then why not use Data.Set (can you give an example where that does not work)
Data.Fingertree .. constants way worse than Seq
Wait, what? Data.Sequence implementation is a finger tree.
[EDIT] The only difference (from Data.Fingertree to Data.Sequence) is that the polymorphic measure component is specialized to Int
(size)? An ideal compiler should be able to generate efficient code, and not require manual specialisation? Well, then maybe this can serve as a GHC test case.
Fwiw I'm no longer interested in pushing this issue! You can close it if it seems an unwelcome addition to the API, or unwelcome work.
Aside re Fingertree vs Seq constants: when I benchmarked back in the day there was a large difference. I bet the specialization and unpacking of the measure Int
counts for a lot in code size and indirection, but I'm not sure this explains what I saw, or if the benchmarks would still see a large discrepancy.
Data.Sequence
as a data structure supports binary search, but it looks like no methods are currently exposed for doing so.(Well, not exactly binary search, but you know what I mean -
search
primitives that takeO(log n)
time given a monotone predicate.)
Is there a source for this?
To me it seems infeasible to fit an O(log n) search on the structure.
(I know I can go and use
Data.FingerTree
for this sort of thing, but its constants are way worse thanSeq
, and I nevertheless do keep wanting to be able to searchSeq
s.)
Note that FingerTree also does not allow binary search on the elements. A FingerTree v a
allows binary search by v
, not a
, which is a crucial difference.
A Seq is a specialized FingerTree where v
is set to an Int
which is the size/count.
So the equivalent of
split :: Measured v a => (v -> Bool) -> FingerTree v a -> (FingerTree v a, FingerTree v a)
is
split :: (Int -> Bool) -> Seq a -> (Seq a, Seq a)
This does not exist today, but an equivalent effect can be achieved:
The Int -> Bool
is only applied to [0 .. size-1]
, so the split point can be located independently, without using the Seq.
Once it is located, we can use the existing splitAt :: Int -> Seq a -> (Seq a, Seq a)
to get the desired result.
Coming back to searching by a
, Seq supports indexing (in O(log n)), so it is straightforward to apply a binary search on top of that. This results in an O(log^2 n) search. I don't see a way to do better than this.
Just for fun: does the big-O change any if we use splits instead of lookups? That gets something like
log n + (log n - 1) + (log n - 2) + (log n - 3) + ... + 1 (Total of about log n terms)
I don't know what that is off the top of my head.
That would also be O(log^2 n).
1 + 2 + ... + m = m (m+1) / 2 = O(m^2).
That would also be O(log^2 n).
1 + 2 + ... + m = m (m+1) / 2 = O(m^2).
I believe you, but I don't understand your explanation. We have Theta(log^2 n) - Theta(log^n), which is certainly O(log^2 n), but that doesn't explain why the bound is tight.
I believe you, but I don't understand your explanation.
Ah, I just set m = log n.
log n + (log n - 1) + (log n - 2) + ... + 1 = 1 + 2 + ... + log n = (log n) (log n + 1) / 2 = Theta(log^2 n)
More directly, you could also apply the Master theorem to T(n) = T(n/2) + O(log n) which yields the same result.
Ah, I see now. You reversed the order of the sum. Thanks.
Data.Sequence
as a data structure supports binary search, but it looks like no methods are currently exposed for doing so.(Well, not exactly binary search, but you know what I mean -
search
primitives that takeO(log n)
time given a monotone predicate.)In particular, I'd love to have something like this in the API:
... or perhaps something more like:
An obvious use case is for querying sorted data, stored in a
Seq
. But there are also more cunning things you can do, whenever the accumulated data is monotone.(I know I can go and use
Data.FingerTree
for this sort of thing, but its constants are way worse thanSeq
, and I nevertheless do keep wanting to be able to searchSeq
s.)