Tail calls confuse the trace compiler.

vext01 commented 2 years ago

  #include <stdio.h>

  void __attribute__((noinline)) f() {
    fputs("inside f\n", stderr);
    return;
  }

With clang -O3 on amd64, this compiles to:

            ;-- f:
            0x00201b90      488b0d092400.  mov   rcx, qword [obj.stderr]
            0x00201b97      bfa4082000     mov   edi, str.inside_f
            0x00201b9c      be09000000     mov   esi, 9
            0x00201ba1      ba01000000     mov   edx, 1
        ┌─< 0x00201ba6      e915010000     jmp   sym.imp.fwrite

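Here fwrite is jumped to, rather than called (clang has also rewritten the fputs call into an equivalent fwrite call). The compiler does this because f() has nothing left to do after the call, so it's cheaper to reuse f()'s frame (where possible, and here it is). Since jmp pushes no return address, the x86 ret instruction of fwrite() pops the address pushed by f()'s caller, and so effectively returns from both f() and fwrite().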
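To make the register setup in the listing easier to read, here is a source-level sketch of what the optimised f() amounts to. It assumes the System V AMD64 calling convention (first four integer arguments in rdi, rsi, rdx, rcx). The name `f_as_compiled` is made up for illustration, and it returns fwrite's result only so the call can be checked.

```c
#include <stdio.h>

/* Hypothetical name: a source-level equivalent of the optimised f(). */
size_t f_as_compiled(void) {
    /*
     * Register mapping from the disassembly:
     *   rdi = pointer to "inside f\n"
     *   rsi = 9 (element size: the string length)
     *   rdx = 1 (element count)
     *   rcx = stderr
     */
    return fwrite("inside f\n", 9, 1, stderr);
}
```

Because the fwrite call is the last thing the function does, nothing in f() needs to run after it, which is what makes replacing the call with a jmp legal.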

The optimisation happens during instruction selection, meaning that the IR used to build traces always contains a call IR instruction regardless of whether this optimisation is applied.

The optimisation confuses the trace compiler: having seen the IR call to fwrite, it expects trace execution to pass through a block containing a ret IR instruction for fwrite, but of course one never comes.
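The mismatch can be illustrated with a toy model. This is only a sketch: the event kinds and the function name are hypothetical and do not reflect yk's actual data structures. It compares the number of calls the IR promises against the returns that actually show up in the executed trace.

```c
#include <stddef.h>

/* Hypothetical event kinds in a recorded control-flow trace. */
enum event { EV_CALL, EV_JMP, EV_RET };

/*
 * Returns how many IR-level calls never saw a matching return in the
 * executed trace. ir_calls is the number of call instructions in the IR;
 * executed is what the machine actually ran.
 */
int unmatched_ir_calls(int ir_calls, const enum event *executed, size_t len) {
    int rets = 0;
    for (size_t i = 0; i < len; i++) {
        if (executed[i] == EV_RET) {
            rets++;
        }
    }
    return ir_calls - rets;
}
```

For a caller invoking f(), the IR contains two calls (to f and to fwrite). Without the optimisation the machine executes call, call, ret, ret, and the function returns 0. With the tail call it executes call, jmp, ret, so one IR call is left waiting for a return that never arrives and the function returns 1.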

vext01 commented 2 years ago

https://github.com/ykjit/ykllvm/pull/24 works around this by disabling the optimisation, but in the long run we should find a better way.
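For experimenting outside ykllvm, Clang can suppress this optimisation either globally with the `-fno-optimize-sibling-calls` flag or per function with the `disable_tail_calls` attribute. The sketch below uses the attribute; it is an illustration only and not necessarily what the PR above does. `NO_TAIL_CALLS` is a made-up macro name, and f returns fputs's result here (unlike the original) only so that the result can be checked.

```c
#include <stdio.h>

/* disable_tail_calls is Clang-specific, so expand to nothing elsewhere. */
#if defined(__clang__)
#define NO_TAIL_CALLS __attribute__((disable_tail_calls))
#else
#define NO_TAIL_CALLS
#endif

/* With the attribute, Clang should not turn calls inside f() into jumps. */
NO_TAIL_CALLS __attribute__((noinline)) int f(void) {
    return fputs("inside f\n", stderr);
}
```

Disassembling f() at -O3 with and without the attribute should show whether the final instruction is a jmp or a call followed by ret.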

ltratt commented 1 year ago

This is related to https://github.com/ykjit/yk/issues/610, in the sense that both are "unusual control flow things we don't handle".

ykjit/yk issue #502