Open dcherian opened 1 year ago
Maybe it would be enough to refactor Aligner.reindex_all()
like this:
def reindex_all(self) -> None:
results = []
for obj, matching_indexes in zip(
self.objects, self.objects_matching_indexes
):
if any(self.reindex[k] for k in matching_indexes):
results.append(self._reindex_one(obj, matching_indexes))
else:
results.append(obj.copy(deep=self.copy))
self.results = tuple(results)
Maybe it would be enough to refactor Aligner.reindex_all() like this
It depends on the profiling results below align
in the call stack: if most of the time is spent in Aligner.normalize_index
or Aligner.find_matching_*
, optimizing this case would require a bit more work. @dcherian could you paste another screenshot with the root set at that level, please?
Does this help? find_matching_indexes
is only 5% of that time
Yes, thanks. dataset.copy()
is 50% of the align time. Looks like the code snippet in my previous comment won't improve the performance issue much or at all? Should we rather look into if/how to optimize copy
instead?
I guess that in this case we could skip copying objects that don't need reindexing, but is it always safe or desirable? Do we have an option for this (or skip alignment altogether)?
What happened?
Here's a simple rescaling calculation:
The profile for the last line shows 30% (!!!) time spent in
align
(reallyreindex_like
) except there's nothing to reindex when only scalars are involved!This is a small example inspired by a ML pipeline where this normalization is happening very many times in a tight loop.
cc @benbovy
What did you expect to happen?
A fast path for when no reindexing needs to happen.