rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.72k stars 309 forks source link

Prefer boxed slices over vectors internally #863

Closed Philippe-Cholet closed 8 months ago

Philippe-Cholet commented 8 months ago

While reading This Week in Rust 531 and specifically Identifying Rust's collect::<Vec<_>>() memory leak footgun, I got the remainder that

The proper data structure for fixed-length lists is Box<[T]> rather than Vec<T>.

So I went through our codebase and found three places to apply this: two are committed and MultiProduct. I thought I would wait #835 to be merged to do it there too but I reached one limitation of edition 2018 (in last specialization) so I'm not.

The other vectors we internally use do not have a fixed length. Such as Combinations that has a growing vector in order for Powerset to work too. (And I obviously do not want to break our iterators generating vectors when it could be boxed slices.)

codecov[bot] commented 8 months ago

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (7a1c22b) 93.48% compared to head (cc0123f) 93.77%. Report is 1 commits behind head on master.

Files Patch % Lines
src/exactly_one_err.rs 0.00% 5 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #863 +/- ## ========================================== + Coverage 93.48% 93.77% +0.28% ========================================== Files 48 48 Lines 6771 6696 -75 ========================================== - Hits 6330 6279 -51 + Misses 441 417 -24 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Philippe-Cholet commented 8 months ago

@phimuemue I noted that collect to Box<[_]> does .collect::<Vec<_>>().into_boxed_slice() in the std so if it reallocates then the std thinks it's the way.

into_boxed_slice does shrink_to_fit so I'm not sure I understand your "space consumption problem".

The only place I've thought of shrink_to_fit was for our pub fn kmerge_by.

phimuemue commented 8 months ago

into_boxed_slice does shrink_to_fit so I'm not sure I understand your "space consumption problem".

I was thinking about the cases where we keep the Vec: Should this shrink_to_fit then? (We can also postpone this.)

Philippe-Cholet commented 8 months ago

@phimuemue I think it has sense when we know it won't ever grow and we don't repeatedly shrink it. There is not that many cases in itertools: kmerge_by definition only I think.