rust-lang / rust-clippy

A bunch of lints to catch common mistakes and improve your Rust code. Book: https://doc.rust-lang.org/clippy/
https://rust-lang.github.io/rust-clippy/
Other
11.46k stars 1.55k forks source link

vec_init_then_push should only trigger if there are no further pushes #7071

Closed jyn514 closed 2 years ago

jyn514 commented 3 years ago

Lint name: vec_init_then_push

I tried this code:

    let mut arg_strings: Vec<Box<str>> = Vec::new();
    arg_strings.push(name.to_owned().into_boxed_str());
    for arg in args {
        arg_strings.push(
            arg.as_ref()
                .to_os_string()
                .into_string()
                .unwrap()
                .into_boxed_str(),
        );
    }

I expected to see this happen: The lint doesn't trigger, because using vec![] and then immediately pushing doesn't improve the code any, it uses two different styles of initializing the vector.

Instead, this happened:

warning: calls to `push` immediately after creation
   --> tests/mock/clitools.rs:613:5
    |
613 | /     let mut arg_strings: Vec<Box<str>> = Vec::new();
614 | |     arg_strings.push(name.to_owned().into_boxed_str());
    | |_______________________________________________________^ help: consider using the `vec![]` macro: `let mut arg_strings: Vec<Box<str>> = vec![..];`

cc https://github.com/rust-lang/rustup/pull/2718, @kinnison

Meta

Jarcho commented 3 years ago

This isn't really a false positive. Could be

    let mut arg_strings: Vec<Box<str>> = vec![name.to_owned().into_boxed_str()];
    for arg in args {
        arg_strings.push(
            arg.as_ref()
                .to_os_string()
                .into_string()
                .unwrap()
                .into_boxed_str(),
        );
    }
kinnison commented 3 years ago

What about if you have code of the form:

let mut foo = Vec::new();
foo.push("bar".to_string());
for x in 0..9 {
    foo.push(format!("baz {}", x));
}
foo.push("wibble".to_string());
.....

The pattern here is clear as pushes, but if you move the first push into a vec![] then the pattern is less clean. Also vec![] essentially just does the new+push pair clippy is wanting to replace, with no performance benefit to be drawn.

The example in the lint shows the removal of mut as a result of doing this work, and realistically I think the lint should only fire if it'd be possible to remove the mut as a result of the refactor, or at least if there are zero push calls as a result.

Jarcho commented 3 years ago

Really Vec::with_capacity should be used in this case from a performance perspective. It would also stop this lint from triggering. That's a whole different discussion though.

In this case I guess if there's only one or two items being pushed before a loop pushing items (or extend, and maybe others) it might be better to leave it as is.

camsteffen commented 3 years ago

I think the original request is reasonable. The lint is really only valuable if you can put all the elements in vec!.

Jarcho commented 3 years ago

From a performance perspective it's always better to use the macro with the current Vec implementation. A default Vec starts with no capacity and increases by powers of two, starting at 1. So even the macro with a single item avoids that first capacity check.

uazu commented 3 years ago

That's not true (that it increases from a starting capacity of 1). I checked the source. It starts with zero, and then jumps to a capacity of 8 for u8, or 4 for anything else under 1KB. I have a situation where I create a Vec, add one item and then add several more in a loop. The clippy suggestion to switch to vec! will produce less-efficient code, since (checking the source) the vec! creates a box (of size one) which is then converted into a Vec of capacity one.

So my original code creates an empty Vec, then allocates space for 4 items, and so on. Switching to vec!, it would allocate space for 1 item, then jump to 4 and so on. So that's one extra unnecessary malloc/free with the clippy suggestion.

I agree with the original request: Using vec! should only be suggested when nothing else will be added to the Vec later on. Or if you see more than 4 pushes immediately after the Vec creation (with current std library implementation), it would also be worthwhile switching to vec! even if there are pushes later on. Otherwise it becomes an annoying clippy lint that has to be disabled. Thanks

uazu commented 3 years ago

[Here's a playground link](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&code=fn%20main()%20%7B%0A%20%20%20%20let%20mut%20v1%3A%20Vec%3Cu64%3E%20%3D%20Vec%3A%3Anew()%3B%0A%20%20%20%20v1.push(1)%3B%0A%20%20%20%20println!(%22v1%20%7B%7D%22%2C%20v1.capacity())%3B%0A%20%20%20%20v1.push(2)%3B%0A%20%20%20%20println!(%22v1%20%7B%7D%22%2C%20v1.capacity())%3B%0A%0A%20%20%20%20let%20mut%20v2%20%3D%20vec!%5B1%5D%3B%0A%20%20%20%20println!(%22v2%20%7B%7D%22%2C%20v2.capacity())%3B%0A%20%20%20%20v2.push(2)%3B%0A%20%20%20%20println!(%22v2%20%7B%7D%22%2C%20v2.capacity())%3B%0A%7D%0A) that demonstrates the problem. Assuming that the push of 2 occurs later, the clippy suggestion costs an extra malloc/free.

camsteffen commented 3 years ago

@uazu Good point, that adds some nuance.

Or if you see more than 4 pushes immediately after the Vec creation (with current std library implementation)

I don't think we should be dependent on the std implementation to that level of detail.

Jarcho commented 3 years ago

Looks like I was looking at old sources.

camsteffen commented 3 years ago

Here's an idea. We can split into two lints.

  1. Vec::new is followed by a constant number of pushes - should use vec!
  2. Vec::new is followed by a dynamic number of pushes - should use Vec::with_capacity
uazu commented 3 years ago

Yes, case 1 seems like a clear win. Also using vec! in this case won't over-allocate (i.e. it will allocate exactly the right number, unlike pushing to a Vec).

Case 2 depends. Sometimes it's not possible to determine the required capacity in advance. Can clippy detect when the capacity could have been determined by the coder? That seems like quite an involved analysis. Sometimes Vec's behaviour is exactly what you want, i.e. optimised for short lists, but can still handle big lists if required, i.e. it scales smoothly.

camsteffen commented 3 years ago

Can clippy detect when the capacity could have been determined by the coder? That seems like quite an involved analysis.

I think it is theoretically always possible to calculate the number of elements pushed within the function before pushing. But oftentimes it is too complex and not worth it. If the Vec is returned or passed mutably to another function, all bets are off. The really simple cases may be better written using collect.

Sometimes Vec's behaviour is exactly what you want

Very true. That means a lot of false positives for case 2. It would definitely not be warn-by-default. But I'm starting to think we should not lint those cases at all - which just goes back to the original request...

rbtcollins commented 2 years ago

I think it is theoretically always possible to calculate the number of elements pushed within the function before pushing.

@camsteffen only if the content is coming from a ExactSizeIterator or similarly capable source. For instance, an iterator over stdin lines() stopping at the first blank line isn't able to estimate the size at all without buffering (presumably in another vector, giving this issue again :P).