willow-ahrens / Finch.jl

Sparse tensors in Julia and more! Datastructure-driven array programming language.
http://willowahrens.io/Finch.jl/
MIT License

`PlusOneVector` performance regression #565

Closed · mtsokol closed this issue 1 month ago

mtsokol commented 1 month ago

Hi @willow-ahrens,

Here's my attempt to reproduce the `PlusOneVector` performance regression. I run two short examples below: one with native Julia `Vector`s and one with `PlusOneVector`. I would expect both to run with the same performance, but the second one is roughly 10x slower:
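
For context, `PlusOneVector` wraps a zero-based buffer (e.g. indices handed over from the Python `finch-tensor` bindings) and presents it as one-based by adding 1 on each access. A minimal sketch of the idea (illustration only, not Finch's actual definition):

# Minimal sketch of the PlusOneVector idea (not Finch's actual definition):
# present a zero-based index buffer as one-based data by adding 1 on each read.
struct MyPlusOneVector{T, V <: AbstractVector{T}} <: AbstractVector{T}
    data::V  # zero-based storage
end

Base.size(v::MyPlusOneVector) = size(v.data)
Base.getindex(v::MyPlusOneVector, i::Int) = v.data[i] + 1
Base.setindex!(v::MyPlusOneVector, x, i::Int) = (v.data[i] = x - 1; x)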

using Finch
using BenchmarkTools

LEN = 10000;

a = Tensor(rand(LEN, LEN - 10) * 10);
b = Tensor(rand(LEN - 10, LEN) * 10);
s = fsprand(LEN, LEN, 0.0001);  # sparse LEN x LEN tensor with density 1e-4

a_l = lazy(swizzle(a, 1, 2));
b_l = lazy(swizzle(b, 1, 2));
s_l = lazy(swizzle(s, 1, 2));
# SDDMM-style plan: out[i, j] = sum_k s[i, j] * a[i, k] * b[k, j]
plan = sum(s_l[:, :, nothing] .* a_l[:, nothing, :] .* permutedims(b_l, (2, 1))[nothing, :, :], dims=3);

compute(plan);  # warm up / compile before benchmarking

julia> @benchmark compute(plan)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 9.678 s (0.00% GC) to evaluate,
 with a memory estimate of 2.90 MiB, over 599 allocations.
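
For reference, the plan above expresses the contraction out[i, j] = sum_k s[i, j] * a[i, k] * b[k, j] (an SDDMM-style kernel). A plain dense-Julia sketch of the same computation, for illustration only (far too slow to actually run at LEN = 10000):

# Dense reference for the contraction the plan expresses (illustration only;
# shapes match the script above: s is LEN x LEN, a is LEN x (LEN-10),
# b is (LEN-10) x LEN).
function sddmm_reference(s, a, b)
    out = zeros(size(s, 1), size(s, 2))
    for j in axes(s, 2), i in axes(s, 1)
        acc = 0.0
        for k in axes(a, 2)
            acc += s[i, j] * a[i, k] * b[k, j]
        end
        out[i, j] = acc
    end
    return out
end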

#  =======================  PlusOneVector version  =======================

# Rebuild the COO coordinate table with PlusOneVector wrappers around
# zero-based copies of the original one-based coordinates.
new_tbl = (PlusOneVector(s.lvl.tbl[1] .- 1), PlusOneVector(s.lvl.tbl[2] .- 1));
new_lvl = SparseCOOLevel(s.lvl.lvl, s.lvl.shape, s.lvl.ptr, new_tbl);
s_new = Tensor(new_lvl);
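
# Sanity check (my addition, not part of the original timing run): the wrapped
# table should reproduce the original one-based coordinates elementwise.
@assert all(new_tbl[1] .== s.lvl.tbl[1]);
@assert all(new_tbl[2] .== s.lvl.tbl[2]);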

# Same plan as above, but with the PlusOneVector-backed tensor.
a_l = lazy(swizzle(a, 1, 2));
b_l = lazy(swizzle(b, 1, 2));
s_l = lazy(swizzle(s_new, 1, 2));
plan = sum(s_l[:, :, nothing] .* a_l[:, nothing, :] .* permutedims(b_l, (2, 1))[nothing, :, :], dims=3);

compute(plan);  # warm up / compile before benchmarking

julia> @benchmark compute(plan)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 98.084 s (7.89% GC) to evaluate,
 with a memory estimate of 40.44 GiB, over 2317014172 allocations.
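
To help narrow this down, a micro-benchmark along these lines (my sketch, reusing the same `PlusOneVector` constructor as above) could show whether plain element access already explains the gap, or whether the regression comes from the generated kernel:

using Finch, BenchmarkTools

v = collect(1:10_000);
pv = PlusOneVector(v .- 1);  # same logical contents as v, stored zero-based

@benchmark sum($v)   # plain Vector baseline
@benchmark sum($pv)  # PlusOneVector access path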