rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.72k stars 309 forks source link

Specialize "loops of next" #818

Closed Philippe-Cholet closed 10 months ago

Philippe-Cholet commented 10 months ago

There are some cases where next/next_back is repeatedly called in a loop, where a specialized method would do the same job, except it might be faster if the iterator they adapt has some specialized faster method. It was the case in #816.

cargo bench --bench specializations "(filter_ok|filter_map_ok|unique|unique_by)/next"

unique/next             time:   [18.728 µs 18.779 µs 18.843 µs]
unique/next             time:   [14.166 µs 14.274 µs 14.406 µs]
                        change: [-26.223% -25.048% -23.941%]

unique/next_back        time:   [18.523 µs 18.599 µs 18.698 µs]
unique/next_back        time:   [14.189 µs 14.259 µs 14.344 µs]
                        change: [-25.116% -24.300% -23.568%]

unique_by/next          time:   [20.766 µs 20.798 µs 20.833 µs]
unique_by/next          time:   [19.123 µs 19.174 µs 19.230 µs]
                        change: [-8.9607% -7.5864% -6.1789%]

unique_by/next_back     time:   [20.852 µs 20.915 µs 20.979 µs]
unique_by/next_back     time:   [20.740 µs 20.825 µs 20.919 µs]
                        change: [-1.1273% -0.6647% -0.1885%]

filter_ok/next          time:   [910.82 ns 915.08 ns 920.18 ns]
filter_ok/next          time:   [880.88 ns 883.96 ns 887.81 ns]
                        change: [-5.5278% -4.1578% -2.4871%]

filter_map_ok/next      time:   [863.55 ns 865.77 ns 868.14 ns]
filter_map_ok/next      time:   [849.20 ns 852.26 ns 855.36 ns]
                        change: [-5.0466% -3.7661% -2.6449%]

Those benchmarks are not that impressive, they are based on slices that roughly reimplement the while loop themselves. But if the iterator they adapt have a specialized try_[r]fold on which find/find_map/rfind usually rely, then it should be faster.

The code is shorter in multiple cases and I believe clearer.

The partition function did not have any benchmark, there is now one and there are no difference, both 330µs.