pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Potential regression induced by PR #56037 #57366

Open rhshadrach opened 4 months ago

rhshadrach commented 4 months ago

PR #56037 may have induced a performance regression. If it was a necessary behavior change, this may have been expected and everything is okay.

Please check the links below. If any ASVs are parameterized, the combinations of parameters that a regression has been detected for appear as subbullets.

Subsequent benchmarks may have skipped some commits. The link below lists the commits that are between the two benchmark runs where the regression was identified.

https://github.com/pandas-dev/pandas/compare/05c32ba18f88921b78dc5984c70956247497ab4c...d9f70b397a010754ae41e7d201bba05834294559

cc @jbrockmendel

rtlee9 commented 4 months ago

My local asv benchmarking shows the regression was introduced in d9f70b3 itself rather than in the commits before it:

(venv) ➜  asv_bench git:(d9f70b397a) asv continuous -f 1.1 -E virtualenv HEAD~ HEAD -b inference.ToDatetimeFromIntsFloats.time_
Couldn't load asv.plugins._mamba_helpers because
No module named 'libmambapy'
· Creating environments
· Discovering benchmarks
·· Uninstalling from virtualenv-py3.10-Cython3.0.5-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter.
·· Installing d9f70b39 <v2.3.0.dev0~271> into virtualenv-py3.10-Cython3.0.5-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter.
· Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks)
[ 0.00%] · For pandas commit e37ff77b <v2.3.0.dev0~272> (round 1/2):
[ 0.00%] ·· Building for virtualenv-py3.10-Cython3.0.5-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter..
[ 0.00%] ·· Benchmarking virtualenv-py3.10-Cython3.0.5-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter
[ 4.17%] ··· Running (inference.ToDatetimeFromIntsFloats.time_nanosec_float64--)......
[25.00%] · For pandas commit d9f70b39 <v2.3.0.dev0~271> (round 1/2):
[25.00%] ·· Building for virtualenv-py3.10-Cython3.0.5-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter..
[25.00%] ·· Benchmarking virtualenv-py3.10-Cython3.0.5-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter
[29.17%] ··· Running (inference.ToDatetimeFromIntsFloats.time_nanosec_float64--)......
[50.00%] · For pandas commit d9f70b39 <v2.3.0.dev0~271> (round 2/2):
[50.00%] ·· Benchmarking virtualenv-py3.10-Cython3.0.5-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter
[54.17%] ··· inference.ToDatetimeFromIntsFloats.time_nanosec_float64                                                                                 260±3ms
[58.33%] ··· inference.ToDatetimeFromIntsFloats.time_nanosec_int64                                                                               3.16±0.09ms
[62.50%] ··· inference.ToDatetimeFromIntsFloats.time_nanosec_uint64                                                                               3.03±0.2ms
[66.67%] ··· inference.ToDatetimeFromIntsFloats.time_sec_float64                                                                                     262±2ms
[70.83%] ··· inference.ToDatetimeFromIntsFloats.time_sec_int64                                                                                    31.1±0.3ms
[75.00%] ··· inference.ToDatetimeFromIntsFloats.time_sec_uint64                                                                                   30.9±0.2ms
[75.00%] · For pandas commit e37ff77b <v2.3.0.dev0~272> (round 2/2):
[75.00%] ·· Building for virtualenv-py3.10-Cython3.0.5-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter..
[75.00%] ·· Benchmarking virtualenv-py3.10-Cython3.0.5-jinja2-matplotlib-meson-meson-python-numba-numexpr-odfpy-openpyxl-pyarrow-python-build-scipy-sqlalchemy-tables-xlrd-xlsxwriter
[79.17%] ··· inference.ToDatetimeFromIntsFloats.time_nanosec_float64                                                                              6.17±0.3ms
[83.33%] ··· inference.ToDatetimeFromIntsFloats.time_nanosec_int64                                                                               2.93±0.06ms
[87.50%] ··· inference.ToDatetimeFromIntsFloats.time_nanosec_uint64                                                                              2.91±0.06ms
[91.67%] ··· inference.ToDatetimeFromIntsFloats.time_sec_float64                                                                                  5.51±0.3ms
[95.83%] ··· inference.ToDatetimeFromIntsFloats.time_sec_int64                                                                                    31.0±0.3ms
[100.00%] ··· inference.ToDatetimeFromIntsFloats.time_sec_uint64                                                                                  30.9±0.08ms
| Change   | Before [e37ff77b] <v2.3.0.dev0~272>   | After [d9f70b39] <v2.3.0.dev0~271>   |   Ratio | Benchmark (Parameter)                                   |
|----------|---------------------------------------|--------------------------------------|---------|---------------------------------------------------------|
| +        | 5.51±0.3ms                            | 262±2ms                              |   47.56 | inference.ToDatetimeFromIntsFloats.time_sec_float64     |
| +        | 6.17±0.3ms                            | 260±3ms                              |   42.2  | inference.ToDatetimeFromIntsFloats.time_nanosec_float64 |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
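
As a rough sanity check on the table above, the sketch below recomputes the slowdown from the printed timings. It assumes the Ratio column is simply the "after" time divided by the "before" time, and that `-f 1.1` means only changes of at least a factor of 1.1 are reported. The helper names (`parse_asv_time`, `slowdown`) are made up for this illustration and are not part of asv. Because the printed timings are already rounded, the recomputed ratios land close to, but not exactly on, the values asv reports.

```python
import re

# Seconds per unit for the timing strings asv prints (both spellings of micro).
_UNITS = {"ns": 1e-9, "us": 1e-6, "μs": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_asv_time(text: str) -> float:
    """Return the central value of a string such as '262±2ms', in seconds."""
    match = re.fullmatch(r"\s*([0-9.]+)(?:±[0-9.]+)?\s*([^\d\s]+)\s*", text)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised timing: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def slowdown(before: str, after: str) -> float:
    """Ratio of the 'after' timing to the 'before' timing (assumed definition)."""
    return parse_asv_time(after) / parse_asv_time(before)


FACTOR = 1.1  # mirrors the -f 1.1 argument in the command above

# (before [e37ff77b], after [d9f70b39]) pairs copied from the log above.
timings = {
    "time_sec_float64": ("5.51±0.3ms", "262±2ms"),
    "time_nanosec_float64": ("6.17±0.3ms", "260±3ms"),
    "time_sec_int64": ("31.0±0.3ms", "31.1±0.3ms"),
}

for name, (before, after) in timings.items():
    ratio = slowdown(before, after)
    verdict = "regression" if ratio > FACTOR else "within threshold"
    print(f"{name}: {ratio:.2f}x ({verdict})")
```

Only the two float64 benchmarks exceed the threshold, which matches the two rows in the table. The int64 and uint64 timings are essentially unchanged between the two commits.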

rhshadrach commented 4 months ago

Thanks @rtlee9 - isn't that commit associated with the highlighted PR in the OP?

rtlee9 commented 4 months ago

Yeah, I was just confirming it was that commit in particular, since the asv benchmarks had skipped a few commits.

Subsequent benchmarks may have skipped some commits. The link below lists the commits that are between the two benchmark runs where the regression was identified.
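
For a quick check outside asv, something along these lines could be run on builds of each commit. This is only a sketch: the input series and its size are illustrative assumptions rather than the actual benchmark setup, and `best_ms` is a throwaway helper. It times `pd.to_datetime` with `unit="s"` on int64 and float64 epoch seconds, since the log above shows only the float64 paths slowing down.

```python
import timeit

import pandas as pd

# Illustrative input (an assumption, not the benchmark's own setup):
# roughly 600k consecutive epoch timestamps in seconds.
ts_sec_int = pd.Series(range(1521080307, 1521685107), dtype="int64")
ts_sec_float = ts_sec_int.astype("float64")


def best_ms(func, repeat: int = 3) -> float:
    """Best-of-N wall time for a single call, in milliseconds."""
    return min(timeit.repeat(func, number=1, repeat=repeat)) * 1e3


int_ms = best_ms(lambda: pd.to_datetime(ts_sec_int, unit="s"))
float_ms = best_ms(lambda: pd.to_datetime(ts_sec_float, unit="s"))

print(f"pandas {pd.__version__}")
print(f"int64   unit='s': {int_ms:.1f} ms")
print(f"float64 unit='s': {float_ms:.1f} ms")
```

Comparing the float64 figure across the two builds should show whether the gap seen in asv reproduces locally. Absolute numbers will differ by machine.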

rhshadrach commented 4 months ago

Ah - thanks for confirming.