astaric closed 5 years ago
ds.scan() does quite a bit more than just scan the main matrix. For example, it also slices through all the attributes and graphs, reorders the result according to the key attribute, and supports scanning through only a subset of the rows/columns. That said, a 3x slowdown seems like a lot. If you can figure out why it's slower, please send a pull request!
One issue might be the reordering of each view, which is done even if no key was provided and is then unnecessary. Similarly, even if no selection (items) was requested, the code still performs the selection on every slice. These two things could be optimized for the common case of scanning through the whole file with no selection and no reordering, at the expense of making the code a bit messier.
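To make the idea concrete, here is a rough sketch of what such a fast path could look like. It is not loompy's actual code: scan_rows is a hypothetical helper that works on a plain NumPy array, and its parameters (batch_size, items, key) only mirror the concepts discussed above. The point is that selection and reordering are applied only when they were requested, so the common case is a plain slice per batch.

```python
import numpy as np


def scan_rows(matrix, batch_size, items=None, key=None):
    """Yield (start, block) pairs covering the rows of `matrix` in batches.

    Hypothetical stand-in for a scan routine, not loompy's implementation.
    items: optional 1-D integer array of row indices to keep.
    key:   optional 1-D array (one value per column) used to reorder columns.
    """
    n_rows = matrix.shape[0]
    # Compute the column ordering once, and only if a key was given.
    order = np.argsort(key, kind="stable") if key is not None else None

    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        block = matrix[start:stop]  # plain slice, the only work in the common case

        if items is not None:
            # Keep only the requested rows that fall inside this batch,
            # converted to batch-local indices.
            local = items[(items >= start) & (items < stop)] - start
            block = block[local]

        if order is not None:
            # Reorder columns only when a key was actually provided.
            block = block[:, order]

        yield start, block
```

With neither items nor key given, each iteration does nothing beyond slicing, which is essentially what the manual partitioning in the original report does.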
After the change in #90, scan takes
8.68 s ± 340 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Fixed by PR #90
I had to process all rows in a loom file (108999 rows, 11930 columns) without loading the whole file into memory.
I timed the execution of the following code:
and got
while doing the partitioning manually and accessing the layer directly using slices:
takes