pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Potential performance regression with "API: value_counts to consistently maintain order of input" #59992

Closed DeaMariaLeon closed 2 weeks ago

DeaMariaLeon commented 2 weeks ago

PR #59745 @rhshadrach

[Screenshot 2024-10-07 at 14 39 22]

"groupby.GroupByMethods.time_dtype_as_group (Python) with application='direct', dtype='int16', method='value_counts', ncols=1, param5='cython'": "http://57.128.112.95:5000/compare/benchmarks/06701c3bb7a472ce800063618142c648...06702dd4b5a8721d800002f7c5a729c4", "groupby.GroupByMethods.time_dtype_as_group (Python) with application='direct', dtype='object', method='value_counts', ncols=1, param5='cython'": "http://57.128.112.95:5000/compare/benchmarks/06701c3bd51c799280002f4c4e9c54cb...06702dd4d44d70ec8000dee1970b99d9", "groupby.GroupByMethods.time_dtype_as_group (Python) with application='direct', dtype='float', method='value_counts', ncols=1, param5='cython'": "http://57.128.112.95:5000/compare/benchmarks/06701c3bcbf8727a80003f7c6cb544eb...06702dd4caca74d68000c0ae0fc0c6ac", "groupby.GroupByMethods.time_dtype_as_group (Python) with application='direct', dtype='int', method='value_counts', ncols=1, param5='cython'": "http://57.128.112.95:5000/compare/benchmarks/06701c3ba2e97361800099085aef8ea1...06702dd4a08e78b68000cea55a0cf474", "groupby.GroupByMethods.time_dtype_as_field (Python) with application='direct', dtype='int16', method='value_counts', ncols=1, param5='cython'": "http://57.128.112.95:5000/compare/benchmarks/06701c3b5d607c988000e105e684bd7e...06702dd457547156800015aefb91cd35", "groupby.GroupByMethods.time_dtype_as_field (Python) with application='direct', dtype='uint', method='value_counts', ncols=1, param5='cython'": "http://57.128.112.95:5000/compare/benchmarks/06701c3b8e5371bf800050742a52feb7...06702dd48b7370c38000dd78883d9f72", "groupby.GroupByMethods.time_dtype_as_field (Python) with application='direct', dtype='int', method='value_counts', ncols=1, param5='cython'": "http://57.128.112.95:5000/compare/benchmarks/06701c3b496474398000f10a224d80c7...06702dd442767a948000a4cd3b23e711", "groupby.GroupByMethods.time_dtype_as_field (Python) with application='direct', dtype='object', method='value_counts', ncols=1, param5='cython'": 
"http://57.128.112.95:5000/compare/benchmarks/06701c3b7abe7b688000a3c66aab4c6f...06702dd4770876fa8000c307419bd755", "groupby.GroupByMethods.time_dtype_as_group (Python) with application='direct', dtype='uint', method='value_counts', ncols=1, param5='cython'": "http://57.128.112.95:5000/compare/benchmarks/06701c3be8397aec80002970cb77654e...06702dd4e7ce7ca48000147baece32b2", "groupby.GroupByMethods.time_dtype_as_field (Python) with application='direct', dtype='float', method='value_counts', ncols=1, param5='cython'": "http://57.128.112.95:5000/compare/benchmarks/06701c3b716970cb80006db835563ce8...06702dd46da4778580009d623b9a289a"

rhshadrach commented 2 weeks ago

Thanks @DeaMariaLeon - this PR corrected incorrect behavior and the difference in performance seems very reasonable to me. Closing.
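
For anyone who wants to compare timings locally on two pandas builds without running the full asv suite, a rough sketch along these lines may help. It is not the actual `GroupByMethods` benchmark: the frame size, the column names (`values0`, `key`), the helper names, and the reading of "as group" versus "as field" are all assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(dtype, ngroups=1000, seed=0):
    """Build a small frame whose `key` column has the requested dtype.

    Hypothetical setup: sizes and column names are illustrative only.
    """
    rng = np.random.default_rng(seed)
    size = ngroups * 2
    values = rng.integers(0, ngroups, size=size)
    key = rng.integers(0, size, size=size)
    if dtype == "object":
        # Use Python strings so the column is a genuine object column.
        key = np.array([f"foo{k}" for k in key], dtype=object)
    else:
        key = key.astype(dtype)
    return pd.DataFrame({"values0": values, "key": key})


def time_value_counts(dtype, repeat=3):
    """Return the best-of-`repeat` wall time for two groupby layouts."""
    df = make_frame(dtype)

    def key_as_group():
        # The dtype-bearing column is the grouping key.
        return df.groupby("key")["values0"].value_counts()

    def key_as_field():
        # The dtype-bearing column is the one being counted.
        return df.groupby("values0")["key"].value_counts()

    return {
        "key_as_group": min(timeit.repeat(key_as_group, number=1, repeat=repeat)),
        "key_as_field": min(timeit.repeat(key_as_field, number=1, repeat=repeat)),
    }


DTYPES = ["int16", "int", "uint", "float", "object"]
results = {dtype: time_value_counts(dtype) for dtype in DTYPES}

for dtype, timings in results.items():
    print(
        f"{dtype:>6}: "
        f"key_as_group={timings['key_as_group'] * 1e3:.2f} ms, "
        f"key_as_field={timings['key_as_field'] * 1e3:.2f} ms"
    )
```

Running the same script in an environment built just before the PR and one built just after should give a comparable ratio, although absolute numbers will differ from the benchmark machine.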