rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.18k stars 12.7k forks source link

Potential Compiler Optimization Issue with &Vec<Vec<T>> vs &[Vec<T>]? #124925

Open zhongyi51 opened 5 months ago

zhongyi51 commented 5 months ago

rustc version: 1.78

See Godbolt,l:'5',n:'0',o:'Rust+source+%231',t:'0')),k:45.59356256309758,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:r1780,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'1',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:rust,libs:!(),options:'-C+opt-level%3D2',overrides:!(),selection:(endColumn:41,endLineNumber:6,positionColumn:41,positionLineNumber:6,selectionStartColumn:41,selectionStartLineNumber:6,startColumn:41,startLineNumber:6),source:1),l:'5',n:'0',o:'+rustc+1.78.0+(Editor+%231)',t:'0')),header:(),k:54.406437436902436,l:'4',m:100,n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4) .

In my understanding, the function that accepts &Vec<Vec> should have the same optimization as the function that accepts &[Vec], because &Vec<Vec> can be turned into &[Vec] by simply calling as_slice() or as_mut_slice(). However, I found that &Vec<Vec> may hinder further vectorization in the nested loop, as shown in the above link.

I am wondering whether this is a compiler bug or an intended action?

clubby789 commented 5 months ago

An &Vec<Vec<T>> requires 3 pointer reads to access a T: dereferencing the reference, reading the outer Vecs pointer to get the inner vec, and reading the inner Vecs pointer to access the element &[Vec<T>] only requires 2 reads - reading the vec from the slice, and reading the element from within that vec Calling as_slice on &Vec<...> requires 2 reads (reading the pointer and length fields from behind the reference) so it is not a free conversion https://godbolt.org/z/rYaY4W5sv

In general, you should always accept &[T] rather than &Vec<T> for both performance and ergonomics. There's a Clippy lint explaining the reasoning

zhongyi51 commented 5 months ago

@clubby789 Thank you for your response! Indeed, I am curious about how &Vec<Vec> hinders vectorization in the nested loop, rather than the cost of reading &Vec<Vec>.

Here is an example: [Godbolt](https://gcc.godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:rust,selection:(endColumn:10,endLineNumber:10,positionColumn:10,positionLineNumber:10,selectionStartColumn:10,selectionStartLineNumber:10,startColumn:10,startLineNumber:10),source:'%0A%23%5Bno_mangle%5D%0Aunsafe+fn+per_pos_add_0(a_r:%26Vec%3CVec%3Cf64%3E%3E,b_r:%26mut+Vec%3CVec%3Cf64%3E%3E,size:usize)%7B%0A++++let+a%3Da_r.as_slice()%3B%0A++++let+b%3Db_r.as_mut_slice()%3B%0A++++for+i+in+0..size%7B%0A++++++++for+j+in+0..size%7B%0A++++++++++++let+v%3Da.get_unchecked(i).get_unchecked(j)%3B%0A++++++++++++*b.get_unchecked_mut(i).get_unchecked_mut(j)%2B%3Dv%3B%0A++++++++%7D%0A++++%7D%0A%7D%0A%0A%23%5Bno_mangle%5D%0Aunsafe+fn+per_pos_add_1(a:%26%5BVec%3Cf64%3E%5D,b:%26mut+%5BVec%3Cf64%3E%5D,size:usize)%7B%0A++++for+i+in+0..size%7B%0A++++++++for+j+in+0..size%7B%0A++++++++++++let+v%3Da.get_unchecked(i).get_unchecked(j)%3B%0A++++++++++++*b.get_unchecked_mut(i).get_unchecked_mut(j)%2B%3Dv%3B%0A++++++++%7D%0A++++%7D%0A%7D%0A'),l:'5',n:'0',o:'Rust+source+%231',t:'0')),k:45.59356256309758,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:r1780,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'1',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:rust,libs:!(),options:'-C+opt-level%3D2',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'+rustc+1.78.0+(Editor+%231)',t:'0')),header:(),k:54.406437436902436,l:'4',m:100,n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4). Even if I manually call as_slice() and as_mut_slice() before the nested loop, the lack of vectorization persists.

I just hope this under-optimization is intentional.

scottmcm commented 5 months ago

The OP moved this to IRLO, so cc https://internals.rust-lang.org/t/potential-compiler-optimization-issue-with-vec-const-t-vs-const-t/20865?u=scottmcm.

I am wondering whether this is a compiler bug or an intended action?

How about neither? It's an imperfection in optimizations, but imperfect optimization in non-canonical code isn't really something I'd call a "bug", even if it's not "intended" either.