One design question that was still left open is what to do with Series, for which we currently have a `SingleBlockManager`. The two obvious options I can think of:

1. Add a `SingleArrayManager` counterpart of the current `SingleBlockManager`.
2. Get rid of the manager for Series altogether, and let a Series hold its array and index directly.

A while ago I thought the second option could be an attractive simplification (because in the end, a Series "just" consists of an array and an index, so why would it need a manager?). But that was probably a bit naive ;) The Manager still does quite a few things, and moreover, doing a SingleArrayManager keeps the changes more limited (we can still see later whether getting rid of Single(Block/Array)Manager is an option we want to explore, independently of the BlockManager vs ArrayManager debate). And for implementing certain features consistently between Series and DataFrame, having both backed by a manager is probably useful.
Now, for the actual `SingleArrayManager`, some thoughts:

For Series we currently have `SingleBlockManager`, which is actually a subclass of `BlockManager`. But many methods of the BlockManager are not written to work with a SingleBlockManager, which means the SingleBlockManager carries quite a few methods that are never used / would error if used, which I don't find a very nice design pattern. An alternative for `SingleArrayManager` could be to not subclass `ArrayManager` itself (but only a base Manager class). We could of course still have a mixin to share those parts that can be shared (see the sketch below).

I am currently testing out the approach of a separate SingleArrayManager class to see what is needed to implement it fully.
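To make that alternative concrete, here is a rough sketch of the class layout (names and attributes are illustrative only, not the actual pandas internals): the shared logic lives in a base class, and `SingleArrayManager` sits next to `ArrayManager` rather than inheriting methods it can never use.

```python
# Illustrative sketch only -- not the actual pandas internals API.
from typing import List

import numpy as np


class BaseManager:
    """Methods shared by the DataFrame and Series array-based managers."""

    arrays: List[np.ndarray]
    _axes: list

    @property
    def shape(self):
        return tuple(len(ax) for ax in self._axes)


class ArrayManager(BaseManager):
    """Backs a DataFrame: one 1D array per column."""

    def __init__(self, arrays, axes):
        self.arrays = arrays   # list of 1D arrays, one per column
        self._axes = axes      # [index, columns]


class SingleArrayManager(BaseManager):
    """Backs a Series: a single 1D array plus an index."""

    def __init__(self, array, index):
        self.arrays = [array]  # keep the list form so shared code paths work
        self._axes = [index]

    @property
    def array(self):
        return self.arrays[0]
```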
cc @jreback @jbrockmendel @TomAugspurger
Posting here some results from parts of the benchmark suite. I always ran the benchmark on a commit where I changed the default to ArrayManager (HEAD) vs the previous commit with the normal default of BlockManager (HEAD~1). So using the `diff` formatting to get some color: green is slower and red is faster for the ArrayManager.
The "am-benchmarks" branch I am using is currently master + the 2 perf-related PRs I opened today + the commit to change the default.
First batch of results are benchmarks related to reshaping / take / reindex:
join_merge
$ asv continuous -f 1.01 -b join_merge HEAD~1 HEAD
- The slowdowns are mainly in the `concat(.., axis=0)` cases. Many of the join/merge/merge_asof cases show some speedup.
- The `concat(.., axis=1)` benchmarks give a huge difference (factor "0.00" for after), but this is largely explained by `copy` (the BlockManager-based method copies in this case of no reindexing, while the current ArrayManager version doesn't do that (yet)). If I add a copy, it becomes a more modest speedup of 2-3x (instead of >100x).
- The `concat(.., axis=0)` case that shows a 2x slowdown with ArrayManager is the expected case, I think. That specific benchmark uses a DataFrame with a single float dtype and 200 columns (a single block) and concats it 20 times, so concatenating 20 2D arrays is always going to be faster than concatenating 200 times 20 1D arrays (the benchmark case has 200 columns, and does `pd.concat([df] * 20, axis=0)`). From profiling the operation, almost all time is spent in the actual numpy concatenation routine (for both cases), so I don't expect there is much room for improvement here.

reindex
$ asv continuous -f 1.01 -b reindex HEAD~1 HEAD
There is one case (`frame_methods.Reindex.time_reindex_axis0`) with a slowdown factor of 1.8. But since this is reindexing a wide (1000 columns) and homogeneously dtyped (1 block) DataFrame, I think this is a "worst case", and so an acceptable slowdown IMO.

reshape
$ asv continuous -f 1.01 -b reshape HEAD~1 HEAD
- Not everything in `reshape.py` shows a difference (eg it includes cases like `pd.cut`, so that's expected), and for `unstack` it shows somewhat varying results: some cases are faster, some are slower (it depends on the number of columns, the dtypes, whether the BlockManager version could take the fast implementation or not, etc).
- The case with the biggest slowdown for unstack (`reshape.Unstack` with int dtype) creates a DataFrame with a shape of 100 rows × 100000 columns (so a very wide DataFrame, and thus a slowdown is expected / acceptable IMO). For the "full product" case (no missing values get introduced), the ArrayManager is "only" 5 times slower compared to reshaping the single block. When missing values get introduced, the difference gets bigger, but that is potentially something that can be further improved.
- `reshape.SimpleReshape.time_unstack` is similar, but there the slowdown is much smaller (x1.29), because it produces a less wide dataframe (100 rows × 400 columns) and uses floats (in which case we don't need to check for upcasting if nulls get introduced).
- The `stack` case can probably be optimized, as currently it goes through a conversion of the DataFrame to a numpy array (which is more expensive for a non-single-block ArrayManager).

Are there any consistent benchmark results that don't make sense to you? (xref #40066)
> If I add a copy, it becomes a more modest speedup of 2-3x (instead of >100x).

can you mark the ones that are significantly affected by this?

Big picture, can you give an update on what is left to implement (e.g. I'm guessing JSON and PyTables)?
> Are there any consistent benchmark results that don't make sense to you?
I didn't see anything particularly strange, up to now. (In the next batch about reductions, I noticed some strange speed-ups that probably point to an issue in the BlockManager implementation, but I will comment on that in detail in the next post, probably tomorrow.)
> If I add a copy, it becomes a more modest speedup of 2-3x (instead of >100x).
>
> can you mark the ones that are significantly affected by this?
Only the ones doing a simple `concat(.., axis=1)`, so the 4 `ConcatDataFrames` benchmarks (the 4 that show the biggest speedup, so at the bottom of the list). Other benchmarks like `merge` that also have a speedup don't have that issue, since those already did a reindexing (and thus a copy) anyway.
But note that, if we decide to go with copy-on-write, we don't need to add a copy to preserve behaviour.
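(As a quick way to see whether a given `concat(.., axis=1)` result copied its inputs, one can check memory sharing directly; the snippet below is only illustrative, and the outcome depends on the manager and pandas version.)

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": np.arange(5.0)})
df2 = pd.DataFrame({"b": np.arange(5.0)})

res = pd.concat([df1, df2], axis=1)
# True means the result reuses df1's data (no copy was made for this column)
print(np.shares_memory(res["a"].to_numpy(), df1["a"].to_numpy()))
```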
> Big picture, can you give an update on what is left to implement (e.g. I'm guessing JSON and PyTables)?
Yes, with the several indexing-related PRs I opened today, we are running most of the tests. The biggest chunks of skipped tests are related to JSON and PyTables IO. Related to IO, there is also Parquet (but that depends on downstream libraries) and pickling.
Next to that, there are still various smaller corner cases to fix (the `TODO(ArrayManager)` skipped tests), but in the grand scheme of things, the ones that are left are relatively minor, I think. I also need to get back to `concat(.., axis=0)` with reindexing (#39612), and there are still some usages of `apply_with_block` that ideally would be cleaned up (although that doesn't prevent ArrayManager from being usable).
For performance related issues, I mainly need to get back to the element-wise ops I started with a while ago (#39772, and related PRs).
Some more benchmark results:
groupby
$ asv continuous -f 1.01 -b groupby HEAD~1 HEAD
A few things are significantly slower:
- The apply benchmarks (`groupby.Apply.time_scalar_function_single_col(4)` and `groupby.Apply.time_scalar_function_multi_col(4)`) are 2-3x slower. This would be fixed by https://github.com/pandas-dev/pandas/pull/40171 (supporting ArrayManager in libreduction for dataframes), but that's pending a decision on getting rid of libreduction altogether or not.
- The `groupby.GroupManyLabels.time_sum(1000)` case is 16x slower. This is a case of a small but wide dataframe (shape of 1000x1000), and it's the per-column overhead of checking dtypes etc in `BaseGrouper._cython_operation`. https://github.com/pandas-dev/pandas/pull/40317 already improves this a little bit, but there is certainly still room to further cut down this overhead by using more specialized dtype checks.

What's faster:
- `groupby.Apply.time_copy_overhead_single_col(4)` is considerably faster (4x), but that might actually be an (existing) bug / inconsistency between libreduction fast_apply and the fallback python apply. With ArrayManager (taking the python fallback), it seems to infer it's doing a transform (same indexed result) and thus doesn't reorder the result / doesn't prepend the index with the groupby keys. Will open an issue about this -> https://github.com/pandas-dev/pandas/issues/40446
- The `GroupByMethods` benchmarks are slightly faster with ArrayManager, which from a quick look seems to stem from a smaller overhead in `_get_numeric_data`, but I think this overhead for BlockManager is also only visible because of the relatively small size of the data.

Reductions (stat_ops)
$ asv continuous -f 1.01 -b stat_ops HEAD~1 HEAD
`DataFrame.corrwith(..., method="pearson", axis=1)` is a lot slower. But in general, this is a tiny but wide dataframe, where the fixed overhead is significant. If I increase the size of the dataset, both have more or less the same performance.
Reductions with `axis=1` (so taking eg the mean of each row, instead of each column) are slower, as can be expected (since this needs to go through a transpose / conversion to a numpy array).
The `FrameMultiIndexOps` are 1.2-1.8x slower. These are basically `groupby` benchmarks, and thus show that the built-in groupby aggregations are slightly slower when done column-wise (but in the future we might be able to optimize this, if we can go all-in on 1D arrays in the cython code, xref #39861).
The `FrameOps` with float dtypes are a bit faster with ArrayManager. From a quick look at the profiles, it seems that the handling of `min_count` is more efficient for 1D arrays (which have scalars as result) than for 2D arrays (probably something that can be optimized).
The `FrameOps` with int dtypes are quite a bit faster with ArrayManager, but that seems to be caused by a suboptimal memory layout in the BlockManager case (making it slower than it could be), so it might be an artefact of the setup code of the benchmark case.
Element-wise ops (arithmetic)
$ asv continuous -f 1.01 -b arithmetic HEAD~1 HEAD
The above is with a branch that combines several WIP changes (https://github.com/pandas-dev/pandas/compare/master...jorisvandenbossche:ops-refactor-combined?expand=1; for some there is already an open PR, eg #40445, #40444, #40396, #39772, for others I still need to open a PR because they depend on other changes).
In general, the element-wise ops are probably the set of operations that can see the biggest impact from performing column-by-column instead of on a single block.
A few first notes:
- `dot()` is slower, but since that's basically a 2D array operation, that's to be expected.
- Some comparison operations are slower (the `IntFrameWithScalar.time_frame_op_with_scalar` benchmarks in the output above with lt/gt/eq/ne/... ops). That's something I need to investigate a bit more in detail, because this slowdown doesn't seem to come from per-column overhead, but actually from numexpr being slower.
- Many of the `FrameWithFrameWide` benchmarks actually don't show up in the above output (meaning that they didn't give a significant difference), and the ones that do show up with the biggest slowdown are the cases with shape (1000, 10000), so wide dataframes. For wide dataframes, some slowdown will be inevitable (the question is of course how much we find acceptable).

UPDATE 2021-04-01: Using latest master + https://github.com/pandas-dev/pandas/pull/40482
$ asv continuous -f 1.01 -b arithmetic HEAD~1 HEAD
Two notes about a few benchmarks that are (unexpectedly) a lot faster with ArrayManager:
- The `IntFrameWithScalar` benchmark is more than 10x faster for the `pow` operations in certain cases. I could reproduce this outside of ASV, and it turns out this is due to numexpr being slower: because of the size of the dataframe, the column-wise op (ArrayManager) doesn't use numexpr, while the block-wise op (BlockManager) does. In an environment without numexpr, both were more or less the same.
- The `IntFrameWithScalar` benchmark with the `floordiv` operation is faster with ArrayManager. This turns out to be because, when using numexpr, we incorrectly end up falling back to `_masked_arith_op` (which is slower), because the operation is not supported by numexpr.

Over the last weeks I have been updating the status of this project (and fixing some regressions), and rerunning the benchmarks. This (long) post gives an overview of the current ASV benchmarks with the ArrayManager.
Technical notes: I always ran the benchmark on a commit where I changed the default to ArrayManager (HEAD) vs the previous commit with the normal default of BlockManager (HEAD~1) -> `asv continuous -f 1.0 HEAD~1 HEAD`. So using the `diff` formatting to get some color: green is slower and red is faster for the ArrayManager.
The "am-benchmarks" branch I am using is master (of yesterday morning) + a few open close-to-mergeable (all-green) ArrayManager related PRs (#41104 (fillna) , #44736 (interpolate), #44791 (eval)) + the commit to change the default.
I am going to split the results here by topic / file (each time with a small discussion, repeating some stuff from above), but the results of the full run are also included at the bottom.
ToNumpy
$ asv continuous -f 1.01 -b ToNumpy HEAD~1 HEAD
The mixed-dtype case (`time_values_mixed_tall`) doesn't show up here, so for that case there is no significant difference (in this case a new array needs to be created for both ArrayManager and BlockManager).

reshape
$ asv continuous -f 1.01 -b reshape HEAD~1 HEAD
The big slowdown is in transposing a (10, 1000) shaped DataFrame with all datetimetz columns. Most of the time here is spent in boxing/unboxing Timestamp scalars (because for datetimetz we use object dtype array). I think it would be relatively straightforward to have a specialized method that avoids going through scalars but uses a native numpy dtype for the transpose operation.
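A minimal sketch of that idea (the helper below is made up for illustration, assumes all columns share the same tz-aware dtype, and is not the pandas implementation):

```python
import numpy as np
import pandas as pd


def transpose_datetimetz(df: pd.DataFrame) -> pd.DataFrame:
    # assumes every column has the same tz-aware datetime64 dtype
    tz = df.dtypes.iloc[0].tz

    # Series.values converts tz-aware data to naive datetime64[ns] (UTC),
    # so the transpose itself stays a plain 2D numpy operation (no Timestamp boxing)
    values = np.column_stack([col.values for _, col in df.items()])
    transposed = values.T  # shape (n_columns, n_rows)

    # rebuild tz-aware columns from the transposed naive UTC values
    data = {
        label: pd.DatetimeIndex(transposed[:, j]).tz_localize("UTC").tz_convert(tz)
        for j, label in enumerate(df.index)
    }
    return pd.DataFrame(data, index=df.columns)
```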
In general, `transpose` is an operation that will of course always do worse when using a columnar layout in an otherwise single-block case.
The case with the biggest slowdown for unstack (`reshape.Unstack` with int dtype) creates a DataFrame with a shape of 100 rows × 100000 columns (so a very wide DataFrame, and thus a slowdown is expected / acceptable IMO). For the "full product" case (no missing values get introduced), the ArrayManager is "only" 5 times slower compared to reshaping the single block.
Many of the reshape benchmarks also don't show up above, meaning there is no significant difference in performance for those.
(The speedup in `reshape.Unstack.time_without_last_row('category')` can be ignored, as that has been optimized for BlockManager on master since then (https://github.com/pandas-dev/pandas/pull/44758).)
reindex
$ asv continuous -f 1.01 -b reindex HEAD~1 HEAD
There is one case (`frame_methods.Reindex.time_reindex_axis0`) with a slowdown factor of ~2. But since this is reindexing a wide (1000 columns) and homogeneously dtyped (1 block) DataFrame, I think this is a "worst case", and so an acceptable slowdown IMO.

other frame_methods (leaving out ToNumpy and reindex from above)
$ asv continuous -f 1.01 -b frame_methods HEAD~1 HEAD
The quantile with axis=1 is much slower (similar to reductions over the rows instead of the columns in general, see below in the "Reductions" section). In this case it will probably be more efficient to calculate the quantiles on the 2D array instead of column-by-column (since we already transpose the DataFrame in case of axis=1); a small sketch of that idea follows below. This is also what happens for the other row-wise reductions, which only show a 1-5x slowdown.
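A small illustration of that idea (plain numpy calls for comparison, not the pandas code path):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 50))

# column-by-column style: one quantile call per row of the transposed data
per_row_loop = pd.Series(
    [np.nanquantile(row, 0.5) for row in df.to_numpy()], index=df.index
)

# single call on the 2D array, reducing along axis=1
vectorized = pd.Series(np.nanquantile(df.to_numpy(), 0.5, axis=1), index=df.index)

assert np.allclose(per_row_loop, vectorized)
```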
`XS.time_frame_xs(0)` selects a row as a Series from a homogeneously dtyped DataFrame with 10,000 columns. So constructing the array for the Series instead of selecting from the 2D block array is of course more expensive.
There are some slowdowns in `isnull`-related benchmarks: 1) those are using square DataFrames with shape (1000, 1000) (take a 10000x100 one instead, and there is hardly a difference anymore), and 2) some overhead can easily be avoided (see eg https://github.com/jorisvandenbossche/pandas/commit/2921441359b9ae09ad040e201f9d262ad630414b, which uses isna_array instead of isna in the ArrayManager method, and optimizes some dtype checks).
The `MaskBool` benchmarks still go through BlockManager.apply, causing some overhead (this is a TODO in ArrayManager).
join_merge
$ asv continuous -f 1.01 -b join_merge HEAD~1 HEAD
- The slowdowns are mainly in the `concat(.., axis=0)` cases.
- The `concat(.., axis=0)` case that shows a 2x slowdown with ArrayManager is the expected case, I think. That specific benchmark uses a DataFrame with a single float dtype and 200 columns (a single block) and concats it 20 times, so concatenating 20 2D arrays is always going to be faster than concatenating 200 times 20 1D arrays (the benchmark case has 200 columns, and does `pd.concat([df] * 20, axis=0)`). From profiling the operation, almost all time is spent in the actual numpy concatenation routine (for both cases), so I don't expect there is much room for improvement here.

groupby
$ asv continuous -f 1.01 -b groupby HEAD~1 HEAD
The `groupby.GroupManyLabels.time_sum(1000)` case is ~13x slower. This is a case of a small but wide dataframe (shape of 1000x1000), and it's the per-column overhead of checking dtypes etc in `BaseGrouper._cython_operation` and `_call_cython_op`. https://github.com/pandas-dev/pandas/pull/44738 shows that this overhead can be reduced a bit.
Also worth noting that if you go from (1000, 1000) to (10,000, 100), the overhead reduces to a 3x slowdown (on current master), and with another 10x more rows, to a 2x slowdown.
I can't reproduce the differences in `GroupByMethods.time_dtype_as_group` in an interactive python session with `%timeit`. For example, the groupby `cumsum()` with 5 columns would show a 3x slowdown, but interactively the two are within 10% of each other. I didn't yet figure out why we see those slowdowns (for some cases, speedups for others) through ASV.
(While looking into this, I also noticed that the actual `group_cumsum` cython algo is only 1-2% of the overall time (for both ArrayManager and BlockManager), so those GroupByMethods benchmarks, due to the size of the dataframe / number of groups, are mostly benchmarking the factorize step of groupby, which is the same for each of the groupby methods.)
Note that, in general, there can be some room for a small speed-up if we simplify our groupby algos for 1D arrays; see the discussion in https://github.com/pandas-dev/pandas/pull/39861.
Reductions (stat_ops)
$ asv continuous -f 1.01 -b stat_ops HEAD~1 HEAD
`DataFrame.corrwith(..., method="pearson", axis=1)` is a lot slower. But in general, this is a tiny but wide dataframe, where the fixed overhead is significant. If I increase the size of the dataset, both have more or less the same performance.
Reductions with `axis=1` (so taking eg the mean of each row, instead of each column) are slower, as can be expected (since this needs to go through a transpose / conversion to a numpy array).
For the rest, column-wise reductions are more or less the same for ArrayManager vs BlockManager on those benchmarks (some a bit faster, some a bit slower)
Element-wise ops (arithmetic)
$ asv continuous -f 1.01 -b arithmetic HEAD~1 HEAD
In general, the element-wise ops are probably the set of operations that see the biggest impact from performing column-by-column instead of on a single block:
`dot()` is slower, but since that's basically a 2D array operation, that's to be expected.
The biggest slowdown is seen in the `FrameWithFrameWide` class. However, many of the cases actually don't show up in the above output (meaning that they didn't give a significant difference), and the ones that do show up with the biggest slowdown are the cases with shape (1000, 10000), so very wide dataframes. For wide dataframes, some slowdown will be inevitable (the question is of course how much we find acceptable).
Similarly, the next set of benchmarks with the biggest slowdown is the `MixedFrameWithSeriesAxis` class, which uses a DataFrame of shape (1000, 1000), i.e. also a wide DataFrame. When changing the shape to (10,000, 100), the difference already reduces from a 7x slowdown to 2x for the slowest case.
For the wide dataframes, the main issue is the per-column overhead (checking dtypes, checking which array op should be used, ...). I think there is quite a bit of room for improvement to reduce this per-column overhead for those operations, to specifically improve the wide dataframe case. Some examples:
- Currently, the functions in `array_ops.py` involve a lot of array type checking (`should_extension_dispatch`), dtype checking, validating of the right operand, etc, that can be optimized or moved upwards (see the sketch after this list). For example, in the `FrameWithFrameWide` case of summing two dataframes (`df + df`), the actual summing (the numpy operation) takes only around 20% of the overall time, while `should_extension_dispatch` and `dispatch_fill_zeros` together already take more time than that (both functions are called in the `arithmetic_op` array op function, where `should_extension_dispatch` is mainly slow because it uses an ABC check, and `dispatch_fill_zeros` is actually a no-op for addition, so should never be called).
- In the step before the array op (`_dispatch_frame_op`), we could avoid broadcasting Series to DataFrame as explored in https://github.com/pandas-dev/pandas/pull/40482, which improves the `time_frame_op_with_series_axis0/1` cases (eg `frame + series`) quite a bit.
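As a toy illustration of the kind of hoisting meant above (stand-in functions only, not the actual pandas ops code): do the dispatch/dtype check once per operation instead of once per column.

```python
import operator

import numpy as np


def expensive_dispatch_check(arr) -> bool:
    # stand-in for should_extension_dispatch / dtype validation done per column
    return not isinstance(arr, np.ndarray)


def op_per_column(left_cols, right_cols, op=operator.add):
    # current column-wise style: checks repeated inside the loop over columns
    out = []
    for left, right in zip(left_cols, right_cols):
        if expensive_dispatch_check(left) or expensive_dispatch_check(right):
            raise NotImplementedError("extension arrays not handled in this toy")
        out.append(op(left, right))
    return out


def op_per_column_hoisted(left_cols, right_cols, op=operator.add):
    # hoisted: if everything is a plain numpy array, check once and run a tight loop
    if any(expensive_dispatch_check(a) for a in (*left_cols, *right_cols)):
        raise NotImplementedError("extension arrays not handled in this toy")
    return [op(left, right) for left, right in zip(left_cols, right_cols)]
```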
One note about `IntFrameWithScalar.time_frame_op_with_scalar`: those use a DataFrame of such a shape that the BlockManager version uses numexpr and the ArrayManager version doesn't (because the minimum number of elements has to be reached per column here, and not per block). It seems that the cases that are ~2x slower for ArrayManager are mostly comparison operations, and apparently numexpr is quite a bit faster for those ops compared to numpy, so the difference in those benchmarks is mostly showing the effect of numexpr. It might be worth investigating whether we should reduce the minimum number of elements specifically for those operations, but it's also worth noting that when using a larger dataframe, the ArrayManager case will also start using numexpr.
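To see how much of such a difference comes from numexpr itself, one can time the same comparison with the `compute.use_numexpr` option toggled (whether numexpr actually kicks in also depends on the internal element-count threshold); a rough sketch:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(50_000, 20)))

pd.set_option("compute.use_numexpr", True)
t_numexpr = timeit.timeit(lambda: df < 50, number=100)

pd.set_option("compute.use_numexpr", False)
t_numpy = timeit.timeit(lambda: df < 50, number=100)

print(f"with numexpr option: {t_numexpr:.3f}s, without: {t_numpy:.3f}s")
```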
The remainder
$ asv continuous -f 1.01 HEAD~1 HEAD
(with the above entries removed)

The `pd.eval` benchmarks using the numexpr engine show a 5 to 10x slowdown. This seems to be due to the fact that our numexpr engine works on the full dataframe as one array (so effectively calling `df.values` for something like `pd.eval("df1 + df2")`). It is this `df.values` call that causes the slowdown, and we will probably need to refactor the numexpr engine to also be able to work column-by-column when dealing with dataframes.
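For reference, the two engines can be compared directly (small usage example; the results are equal, only the evaluation path differs, and the numexpr engine needs numexpr installed):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(1000, 10))
df2 = pd.DataFrame(np.random.randn(1000, 10))

res_numexpr = pd.eval("df1 + df2", engine="numexpr")  # goes through df.values
res_python = pd.eval("df1 + df2", engine="python")    # regular pandas ops

pd.testing.assert_frame_equal(res_numexpr, res_python)
```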
The `rolling.TableMethod.time_apply('table')` benchmark shows a 6x slowdown, but uses a DataFrame of shape (10, 1000), so a very wide dataframe. When changing this to (1000, 10), there is no difference between the two.
The `SelectDtypes.time_select_dtype_...` benchmarks show a ~2x slowdown, because now the dtype of every column instead of every block needs to be checked (i.e. more dtype checks). But for a DataFrame of 50 columns, we are still talking about times around 50µs, so not something to worry about, I think.
All results combined
$ asv continuous -f 1.01 HEAD~1 HEAD
Related to the discussion in https://github.com/pandas-dev/pandas/issues/10556, and following up on the mailing list discussion "A case for a simplified (non-consolidating) BlockManager with 1D blocks" (archive).
Initial proof of concept for a non-consolidating "ArrayManager" (storing the columns of a DataFrame as a list of 1D arrays instead of blocks) is merged in https://github.com/pandas-dev/pandas/pull/36010.
This issue is meant to track the required follow-up work items to get this to a more feature-complete implementation.
Functionality: get all tests passing
- `quantile` / `describe` related (ArrayManager.quantile is not yet implemented) -> https://github.com/pandas-dev/pandas/pull/40189
- `equals` related (ArrayManager.equals is not yet implemented) -> #39721
- `groupby` related tests (there are still a few parts of groupby that directly use the blocks) -> #39885, #40050
- `concat` related (`internals/concat.py` only deals with the simple case when no reindexing is needed for ArrayManager at the moment; the full functionality (similar to what `concatenate_block_managers` / the `JoinUnits` now cover) still needs to be implemented) -> https://github.com/pandas-dev/pandas/pull/39612
- Indexing related (`setitem`, `iset`, `insert` are not yet fully implemented for all corner cases + get indexing tests passing)
- Methods still using the block-wise fallback: `replace`, `where`, `interpolate`, `shift`, `diff`, `downcast`, `putmask`, ... (those could all be refactored one at a time)
- Tests currently skipped with `@td.skip_array_manager_invalid_test`

Design questions:
Performance