rust-lang / rustc_codegen_gcc

libgccjit AOT codegen for rustc
Apache License 2.0
906 stars 60 forks source link

Gcc codegen adds additional checks while llvm codegen dosent #386

Closed psvri closed 9 months ago

psvri commented 9 months ago

Hello,

So I have this simple code as shown below and when I look at the codegen by gcc , I see that it adds additional checks while llvm dosent. Godbolt link -> https://godbolt.org/z/hEhaxvjPo

Is this an expected behavior ?

pub struct Vec3 {
    v: [f32; 3]
}

impl Vec3 {
    pub fn length_squared(&self) -> f32 {
        self.v.iter().map(|x| (*x).powf(2.0)).product()
    }
}
antoyo commented 9 months ago

This is not expected. It should be investigated.

psvri commented 9 months ago

Do you have any pointers how to investigate. I dont have much compiler experience, but will try to see if I can fix it.

antoyo commented 9 months ago

I guess a good first step would be to produce a similar example in C and see if both clang and gcc can optimize the code the same way.

antoyo commented 9 months ago

You could also try to rewrite the Rust code to use a loop to see if it makes any difference.

psvri commented 9 months ago

Some more observations

If I access the elements directly, I can see that the output is different. Although gcc generates fma instructions while llvm dosent but they dont have these additional checks. (Godbolt -> https://godbolt.org/z/x5bj146aG)

Similar situation happens if I use a loop (Goldbolt -> https://godbolt.org/z/erPn5ExxT)

I rewrote a similar code in c++ and I see that both gcc and llvm dont have this check (Godbolt -> https://godbolt.org/z/en7x4qMvE)

antoyo commented 9 months ago

Perhaps it would be interesting for the C++ version to try with functions from <algorithm> and closures to make it more similar to the Rust version.

psvri commented 9 months ago

So I rewrote it and still gcc and llvm dont add these check (Godbolt -> https://godbolt.org/z/ehMox84zx)

antoyo commented 9 months ago

Now that we know that gcc can optimize this, it is time to understand why it doesn't do it in this case.

I see two options here. Compare the output of those two options for the Rust program (where the optimization doesn't happen) and the C++ program where the optimization does happen:

  1. Compile the program with this environment variable set: CG_RUSTFLAGS=-Cllvm-args=-fopt-info-all ../cargo.sh build --release (even though it says "llvm-args", those arguments are actually sent to gcc). Maybe -fopt-info-missed would be better.
  2. Compile the program with this environment variable set to see the GIMPLE after each pass: CG_GCCJIT_DUMP_TREE_ALL=1 ../cargo.sh build --release (or maybe it is CG_GCCJIT_DUMP_IPA_ALL=1?, I can't remember exactly which gcc option shows the gimple with the optimization remarks). The output will be in /tmp/libgccjit-*.
psvri commented 9 months ago

Alright, Some more analysis later here is my observation

This is my simplified code. Notice the call to into_iter in the for loop.

pub struct Vec3 {
    v: [f32; 3]
}

impl Vec3 {
    pub fn sum_into_iter(&self) -> f32 {
        let mut x = 0.0;
        for i in self.v.into_iter() {
            x += i;
        }
        x
    }

    pub fn sum(&self) -> f32 {
        let mut x = 0.0;
        for i in self.v {
            x += i;
        }
        x
    }
}

The sum_into_iter gcc codegen has the additional checks while the fn sum dosent have it (godbolt -> https://godbolt.org/z/7TvEEx5Gs). But if I pass edition=2021, then gcc codegen dosent add these checks (goldbolt -> https://godbolt.org/z/Ka5e49dce) . This because the way code was generated for the above between editions , as shown in the compiler message

warning: this method call resolves to `<&[T; N] as IntoIterator>::into_iter` (due to backwards compatibility), but will resolve to <[T; N] as IntoIterator>::into_iter in Rust 2021rustc-cg-gcc (master) #1

So,Based on the above I rewrote the code as below and can confirm that when we iterate over a reference of the array we see these additional checks by gcc (godbolt -> https://godbolt.org/z/GE5aof8x9)

pub struct Vec3 {
    v: [f32; 3]
}

impl Vec3 {
    pub fn sum_into_iter(self) -> f32 {
        let mut x = 0.0;
        for i in <[f32;3] as IntoIterator>::into_iter(self.v) {
            x += i;
        }
        x
    }

    pub fn sum_into_iter_ref(self) -> f32 {
        let mut x = 0.0;
        for i in  <&[f32;3] as IntoIterator>::into_iter(&self.v) {
            x += i;
        }
        x
    }
}

Will try to analyze the gimple output tomorrow if I get time

antoyo commented 9 months ago

Could this be some missing nonnull attributes? Perhaps the compile adds null checks because of that?

antoyo commented 9 months ago

Seems like this was the case. I also needed to compile the sysroot in release mode, but I get the following asm locally:

00000000000137e0 <_RNvMCses7twB3hQql_9test_rustNtB2_4Vec314length_squared>:
   137e0:   c5 fa 10 07             vmovss xmm0,DWORD PTR [rdi]
   137e4:   c5 fa 10 57 04          vmovss xmm2,DWORD PTR [rdi+0x4]
   137e9:   c5 fa 10 4f 08          vmovss xmm1,DWORD PTR [rdi+0x8]
   137ee:   c5 ea 59 d2             vmulss xmm2,xmm2,xmm2
   137f2:   c5 fa 59 c0             vmulss xmm0,xmm0,xmm0
   137f6:   c5 f2 59 c9             vmulss xmm1,xmm1,xmm1
   137fa:   c5 fa 59 c2             vmulss xmm0,xmm0,xmm2
   137fe:   c5 fa 59 c1             vmulss xmm0,xmm0,xmm1
   13802:   c3                      ret
   13803:   66 2e 0f 1f 84 00 00    cs nop WORD PTR [rax+rax*1+0x0]
   1380a:   00 00 00 
   1380d:   0f 1f 00                nop    DWORD PTR [rax]

Can you reproduce locally? If so, we can close this issue.

The version on Compiler Explorer is out of date and I'm not sure if the sysroot is compiled in release mode. I'll check. Edit: it seems it's not compiled in release mode. I mentioned that in an issue and should be fixed eventually.

psvri commented 9 months ago

Hello,

When I check it locally the additional checks are not there. Thanks for the help. You can close the issue.