hrldcpr / pcollections

A Persistent Java Collections Library
Other
765 stars 79 forks source link

sublist produces stack-overflow on large lists #74

Closed Groostav closed 5 years ago

Groostav commented 5 years ago

when calling sublist for reasonably large list, result is stack-overflow. The implementation is recursive with a stack-depth that O(n) on the number of elements => large lists produce stack-overflow errors.

to repro:

  1. create an empty TreePVector
  2. add an element to it 20,000 times
  3. call sublist on the second half

expected: fast operation on shared subcomponents produces a wrapper on new list actual: stack-overflow on recursion of sublist:

   java.lang.StackOverflowError
     at kotlin.shaded.org.pcollections.IntTree.changeKeysBelow(IntTree.java:190)
     at kotlin.shaded.org.pcollections.IntTree.changeKeysBelow(IntTree.java:190)
     at kotlin.shaded.org.pcollections.IntTree.changeKeysBelow(IntTree.java:190)
     at kotlin.shaded.org.pcollections.IntTree.changeKeysBelow(IntTree.java:190)
     at kotlin.shaded.org.pcollections.IntTree.changeKeysBelow(IntTree.java:190)
     at kotlin.shaded.org.pcollections.IntTree.changeKeysBelow(IntTree.java:190)
     at kotlin.shaded.org.pcollections.IntTree.changeKeysBelow(IntTree.java:190)
     at kotlin.shaded.org.pcollections.IntTree.changeKeysBelow(IntTree.java:190)
     at kotlin.shaded.org.pcollections.IntTree.changeKeysBelow(IntTree.java:190)
     at kotlin.shaded.org.pcollections.IntTree.changeKeysAbove(IntTree.java:160)
     at kotlin.shaded.org.pcollections.IntTreePMap.withKeysChangedAbove(IntTreePMap.java:85)
     at kotlin.shaded.org.pcollections.TreePVector.minus(TreePVector.java:120)
     at kotlin.shaded.org.pcollections.TreePVector.subList(TreePVector.java:96)
     at kotlin.shaded.org.pcollections.TreePVector.subList(TreePVector.java:96)
     at kotlin.shaded.org.pcollections.TreePVector.subList(TreePVector.java:96)
     at kotlin.shaded.org.pcollections.TreePVector.subList(TreePVector.java:96)
     at kotlin.shaded.org.pcollections.TreePVector.subList(TreePVector.java:96)
     at kotlin.shaded.org.pcollections.TreePVector.subList(TreePVector.java:96)
     at kotlin.shaded.org.pcollections.TreePVector.subList(TreePVector.java:96)
     at kotlin.shaded.org.pcollections.TreePVector.subList(TreePVector.java:96)
     at kotlin.shaded.org.pcollections.TreePVector.subList(TreePVector.java:96)
        ...

kotlin SSCCE:


    @Test fun `when using a TreePVector should not explode`(){
        var list = TreePVector.empty<Int>()

        for(i in 0 until 20_000){
            list = list.plus(i)
        }

        val res = list.subList(10_000, 20_000)

        require(res.size == 10_000) { "result not correctly shaped" }
    }
hrldcpr commented 5 years ago

Thanks for the report! I wish Java had tail call optimization, oh well.

Should be an easy fix to convert the recursion to a loop—feel free to submit a pull request but if not I'll try to get to it soon

hrldcpr commented 5 years ago

Pushed a fix! Also fixed ConsPStack.subList. I'll do a version bump and release soon.

I think there are probably a few other places where we should convert recursive implementations to iterative ones, I'll make a separate issue about that

hrldcpr commented 5 years ago

v3.0.4 is released with the fix, thanks again for the bug report

Groostav commented 5 years ago

@hrldcpr would it be possible to do an optimization to reduce gc thrashing by navigating down to the required node before returning a new PTreeVector?

hrldcpr commented 5 years ago

Yeah sure, good idea! f58c25f makes this change. v3.0.5 is being released to include it. There will still be numerous calls to new IntTree but that's harder to avoid.

(Also created #76 to track making this optimization elsewhere)