Open jtackm opened 12 months ago
As noted in the README, AxisKeys
uses whatever search method your axiskeys arrays provide (as findfirst
method). Regular arrays do linear search in their findfirst
, that's why it is slow for many indices. AxisKeys
is composable though, so you can use any array type with faster search:
julia> using UniqueVectors
julia> A = KeyedArray(zeros(100000, 100), sid=UniqueVector(["S$i" for i in 1:100000]), oid=["O$i" for i in 1:100])
julia> @btime A[Key(sub_sids), :]
4.374 ms (25 allocations: 7.78 MiB)
I guess it could do indexin
automatically in these cases, but for now it always does findfirst
.
Oh wow, that's a great feature! This capability didn't become apparent to me when reading the README, but maybe that's just me? I had tried with Set
at one point (as a blind guess) in hope of faster lookups. Anyways, thanks a lot for the pointer.
Thanks for this great package! I find myself regularly slicing large KeyedArray matrices with large vectors of string keys (about 10% of the matrix). This is unfortunately currently slow:
On my real data it can take minutes, which is why I regularly find myself using an
indexin
workaround:The slow method becomes much faster when slicing the matrix at the beginning (
sub_sids = axiskeys(A, 1)[1:10000]
, 255.530 ms) and much slower when slicing at the end (sub_sids = axiskeys(A, 1)[end-10000:end]
, 18.904 s). The fast method is faster in all scenarios: 12.335 ms (slice beginning), 15.037 ms (slice end).Only when performing small slices (100 elements) at the beginning can I see advantages of the default method (36.438 μs vs 77.185 μs). I may be missing something, but could the current method perhaps make use of
indexin
internally?