Quadratic time complexity of group

unbounded commented 12 months ago

The ⊕ group operator runs in O(n²) time. From the implementation I would expect it to be O(n) when indices are unique.

Tested on the current main branch:

$ cat main.ua
;⊕∘.⇡ 100
;⊕∘.⇡ 1000
;⊕∘.⇡ 10000
;⊕∘.⇡ 100000

$ uiua run --time-instrs  | grep ⊕
  ⏲  0.02ms - ⊕
  ⏲  0.21ms - ⊕
  ⏲  14.25ms - ⊕
  ⏲  1320.36ms - ⊕

For comparison, putting everything into the same index scales fine:

$ cat main.ua
;⊕∘.↯ 1000 0
;⊕∘.↯ 10000 0
;⊕∘.↯ 100000 0
$ uiua run --time-instrs  | grep ⊕
  ⏲  0.01ms - ⊕
  ⏲  0.00ms - ⊕
  ⏲  0.00ms - ⊕

kaikalii commented 12 months ago

Here is the code for group. I can't figure out where it would be quadratic.

wirelyre commented 11 months ago

They're both quadratic, it's just that for ;⊕∘.⇡ 30000 the slow part is timed as ⊕, but for ;⊕∘.↯ 30000 0 it isn't.

I bet something is copying instead of reusing a CoW when building a large value. Maybe an accidental refcount increase, maybe something else. (When matching shapes / fills?)

My profile trace shows that these functions are hot, descending the call stack, for those two executions:

Primitive::run
loops::group
loops::collapse_groups

# only in ;⊕∘.⇡ 30000
    Value::from_row_values
    Value::append

# only in ;⊕∘.↯ 30000 0
    Value::group_groups
    Array::from_row_arrays_infallible

Array::append
<CowSlice as Extend>::extend
CowSlice::modify
<EcoVec as Extend>::extend

(Don't take that as gospel, I wrote it out by hand.)

Notably Value::group_groups is hot when collecting a lot of 0s into a single uiua::Array<_>, and Value::from_row_values is hot when collecting a lot of unique rows into the output uiua::Value.

I might be able to help more but frankly I think I need to leave the rest to someone who knows the interpreter.

This might also be a bug in the time reporting and I have no idea where to even start with that.

kaikalii commented 11 months ago

Wow, okay I found it. It didn't have to do with extra copies or allocation. It had to do with iterating over the elements of an array's rows. The way I was selecting out the data that corresponded to the elements of a row was deeply inefficient. This fix should improve not just the performance of group, but a bunch of other stuff as well! Thanks for finding the problem!

uiua-lang / uiua

Quadratic time complexity of group #296