Closed unbounded closed 11 months ago
Here is the code for group
.
I can't figure out where it would be quadratic.
They're both quadratic, it's just that for ;⊕∘.⇡ 30000
the slow part is timed as ⊕
, but for ;⊕∘.↯ 30000 0
it isn't.
I bet something is copying instead of reusing a CoW when building a large value. Maybe an accidental refcount increase, maybe something else. (When matching shapes / fills?)
My profile trace shows that these functions are hot, descending the call stack, for those two executions:
Primitive::run
loops::group
loops::collapse_groups
# only in ;⊕∘.⇡ 30000
Value::from_row_values
Value::append
# only in ;⊕∘.↯ 30000 0
Value::group_groups
Array::from_row_arrays_infallible
Array::append
<CowSlice as Extend>::extend
CowSlice::modify
<EcoVec as Extend>::extend
(Don't take that as gospel, I wrote it out by hand.)
Notably Value::group_groups
is hot when collecting a lot of 0s into a single uiua::Array<_>
, and Value::from_row_values
is hot when collecting a lot of unique rows into the output uiua::Value
.
I might be able to help more but frankly I think I need to leave the rest to someone who knows the interpreter.
This might also be a bug in the time reporting and I have no idea where to even start with that.
Wow, okay I found it.
It didn't have to do with extra copies or allocation.
It had to do with iterating over the elements of an array's rows.
The way I was selecting out the data that corresponded to the elements of a row was deeply inefficient.
This fix should improve not just the performance of group
, but a bunch of other stuff as well!
Thanks for finding the problem!
The
⊕ group
operator runs in O(n²) time. From the implementation I would expect it to be O(n) when indices are unique.Tested on the current main branch:
For comparison, putting everything into the same index scales fine: