riscvarchive / riscv-gcc

GNU General Public License v2.0
360 stars 275 forks source link

long long int vararg on 32-Bit machine with stack pointer divisible by 4 #63

Closed s-macke closed 7 years ago

s-macke commented 7 years ago

During the boot process of 32-Bit Linux kernel several messages are shown in the console such as the found devices with their hardware addresses. However the debug output of 64-Bit (long long int) numbers showed sometimes random 64-Bit numbers. After a long quest through the Linux kernel and hours of disassembling and live debugging I figured out that the variadic arguments used by the printk command are responsible for the problem. The variadic argument implementation can't manage stack pointers which are divisible by 4.

The code to reproduce the problem is

#include<stdio.h>
#include<stdarg.h>
#include<string.h>

void PrintLongLong (int n, ...)
{
  int i;
  unsigned long long val;
  va_list vl;
  va_start(vl, n);
  for (i=0;i<n;i++)
  {
    val = va_arg(vl, long long);
    printf ("0x%llx\n", val);
  }
  va_end(vl);
}

unsigned long long x = 0x0123456789abcdef;
unsigned long long y = 0xfedcba9876543210;

void Trampoline()
{
   // Move stack pointer to a offset only divisible by 4.
   // A move of 8 bytes works again.
    asm("addi sp, sp, -4");
    PrintLongLong(2, y, x);
    asm("addi sp, sp, 4");
}

int main()
{
        Trampoline();
        return 0;
}

Run with

riscv32-unknown-linux-gnu-gcc -O2 --static var_arg_problem.c
spike --isa=rv32 pk ./a.out

I am unsure whether the problem is in the kernel or in the compiler. So either the Linux kernel sets the stack pointer to a wrongly aligned address or the variadic argument implementation of gcc is wrong.

palmer-dabbelt commented 7 years ago

Thanks for finding this bug. Can you post a preprocessed source file? That way we'll be looking at exactly the same thing.

I might not have time to look at this for a bit.

s-macke commented 7 years ago

The preprocessor just includes the builtin vararg implementation of gcc here.

# 5 "var_arg_problem.c"
void PrintLongLong (int n, ...)
{
  int i;
  unsigned long long val;
  va_list vl;

# 10 "var_arg_problem.c" 3 4
 __builtin_va_start(
# 10 "var_arg_problem.c"
 vl
# 10 "var_arg_problem.c" 3 4
 ,
# 10 "var_arg_problem.c"
 n
# 10 "var_arg_problem.c" 3 4
 )
# 10 "var_arg_problem.c"
                ;
  for (i=0;i<n;i++)
  {
    val =
# 13 "var_arg_problem.c" 3 4
         __builtin_va_arg(
# 13 "var_arg_problem.c"
         vl
# 13 "var_arg_problem.c" 3 4
         ,
# 13 "var_arg_problem.c"
         long long
# 13 "var_arg_problem.c" 3 4
         )
# 13 "var_arg_problem.c"
                              ;
    printf ("0x%llx\n", val);
  }
# 16 "var_arg_problem.c" 3 4
 __builtin_va_end(
# 16 "var_arg_problem.c"
 vl
# 16 "var_arg_problem.c" 3 4
 )
# 16 "var_arg_problem.c"
           ;
}

unsigned long long x = 0x0123456789abcdef;
unsigned long long y = 0xfedcba9876543210;

void Trampoline()
{
    asm("addi sp, sp, -4");
    PrintLongLong(2, y, x);
    asm("addi sp, sp, 4");
}

int main()
{
        Trampoline();
 return 0;
}
aswaterman commented 7 years ago

The stack pointer should always be 16-byte aligned. Any idea how it is losing 16-byte alignment?

s-macke commented 7 years ago

My conditional breakpoints tell me, that it first happens in bbl at https://github.com/riscv/riscv-pk/blob/priv-1.10/machine/mentry.S#L254 in la sp, stacks + RISCV_PGSIZE - MENTRY_FRAME_SIZE which corresponds to these two lines:

auipc   sp,0xe
addi    sp,sp,-820 # 8000df40 <stacks+0xf40>

In the kernel it happens the first time at

c01edd30 <__switch_to>:
c01edd30:       34152c23                sw      ra,856(a0)
c01edd34:       34252e23                sw      sp,860(a0)
c01edd38:       36852023                sw      s0,864(a0)
c01edd3c:       36952223                sw      s1,868(a0)
c01edd40:       37252423                sw      s2,872(a0)
c01edd44:       37352623                sw      s3,876(a0)
c01edd48:       37452823                sw      s4,880(a0)
c01edd4c:       37552a23                sw      s5,884(a0)
c01edd50:       37652c23                sw      s6,888(a0)
c01edd54:       37752e23                sw      s7,892(a0)
c01edd58:       39852023                sw      s8,896(a0)
c01edd5c:       39952223                sw      s9,900(a0)
c01edd60:       39a52423                sw      s10,904(a0)
c01edd64:       39b52623                sw      s11,908(a0)
c01edd68:       3585a083                lw      ra,856(a1)
c01edd6c:       35c5a103                lw      sp,860(a1)  <- here
c01edd70:       3605a403                lw      s0,864(a1)
c01edd74:       3645a483                lw      s1,868(a1)
c01edd78:       3685a903                lw      s2,872(a1)
c01edd7c:       36c5a983                lw      s3,876(a1)
c01edd80:       3705aa03                lw      s4,880(a1)
c01edd84:       3745aa83                lw      s5,884(a1)
c01edd88:       3785ab03                lw      s6,888(a1)
c01edd8c:       37c5ab83                lw      s7,892(a1)
c01edd90:       3805ac03                lw      s8,896(a1)
c01edd94:       3845ac83                lw      s9,900(a1)
c01edd98:       3885ad03                lw      s10,904(a1)
c01edd9c:       38c5ad83                lw      s11,908(a1)
c01edda0:       0045a203                lw      tp,4(a1)
c01edda4:       00008067                ret

Of course I can look further to find the reason for the alignment glitch. It seems, that people working with the 32-Bit toolchain are very rare.

aswaterman commented 7 years ago

On Mon, May 1, 2017 at 12:09 PM Sebastian Macke notifications@github.com wrote:

My conditional breakpoints tell me, that it first happens in bbl at https://github.com/riscv/riscv-pk/blob/priv-1.10/machine/mentry.S#L254 in la sp, stacks + RISCV_PGSIZE - MENTRY_FRAME_SIZE which corresponds to these two lines:

auipc sp,0xe addi sp,sp,-820 # 8000df40 <stacks+0xf40>

That looks correctly aligned.

In the kernel it happens the first time at

c01edd30 <__switch_to>: c01edd30: 34152c23 sw ra,856(a0) c01edd34: 34252e23 sw sp,860(a0) c01edd38: 36852023 sw s0,864(a0) c01edd3c: 36952223 sw s1,868(a0) c01edd40: 37252423 sw s2,872(a0) c01edd44: 37352623 sw s3,876(a0) c01edd48: 37452823 sw s4,880(a0) c01edd4c: 37552a23 sw s5,884(a0) c01edd50: 37652c23 sw s6,888(a0) c01edd54: 37752e23 sw s7,892(a0) c01edd58: 39852023 sw s8,896(a0) c01edd5c: 39952223 sw s9,900(a0) c01edd60: 39a52423 sw s10,904(a0) c01edd64: 39b52623 sw s11,908(a0) c01edd68: 3585a083 lw ra,856(a1) c01edd6c: 35c5a103 lw sp,860(a1) <- here

Looks like the kernel is creating a misaligned stack frame. If you can trace that back to its source, we would be grateful.

c01edd70: 3605a403 lw s0,864(a1) c01edd74: 3645a483 lw s1,868(a1) c01edd78: 3685a903 lw s2,872(a1) c01edd7c: 36c5a983 lw s3,876(a1) c01edd80: 3705aa03 lw s4,880(a1) c01edd84: 3745aa83 lw s5,884(a1) c01edd88: 3785ab03 lw s6,888(a1) c01edd8c: 37c5ab83 lw s7,892(a1) c01edd90: 3805ac03 lw s8,896(a1) c01edd94: 3845ac83 lw s9,900(a1) c01edd98: 3885ad03 lw s10,904(a1) c01edd9c: 38c5ad83 lw s11,908(a1) c01edda0: 0045a203 lw tp,4(a1) c01edda4: 00008067 ret

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/riscv/riscv-gcc/issues/63#issuecomment-298405451, or mute the thread https://github.com/notifications/unsubscribe-auth/AA-7wvI7wDP047HjoJNLzghnJTsbqesPks5r1i33gaJpZM4NNKOs .

s-macke commented 7 years ago

My best guess is, that the stack pointer is set in copy_thread: https://github.com/riscv/riscv-linux/blob/priv-1.10/arch/riscv/kernel/process.c#L104

The pointer is calculated via https://github.com/riscv/riscv-linux/blob/priv-1.10/arch/riscv/include/asm/processor.h#L48

The size of the struct pt_regs is 140 bytes. And the "-1" subtracts this value from the new stack pointer. But 140 is not divisible by 16.

Again, this is just my current guess. I have to dig in deeper. But this is a good place to start.

aswaterman commented 7 years ago

I believe you're right. I made this patch to the riscv-next branch of riscv-linux, but I'm not sure if there are other things broken with RV32 on that branch, so you might need to cherry-pick it. https://github.com/riscv/riscv-linux/commit/52aa3ecc9b3e5788e1b862f5aa89cbd1784ef53b

s-macke commented 7 years ago

Thanks, that solved the problem. I compiled the riscv-next branch and it runs out of the box without further patching in my emulator. Also my virtio/9p filesystem device with PLIC interrupt controller runs finally. Here is my dts file just for reference: https://gist.github.com/s-macke/d96a147ed33f451b1f3d9a00bdab7ce0

I think that 32-Bit Linux will also run in spike with a few patches when you allow to use the uart instead of htif for the console.

My conditional breakpoint still finds stack pointer register content which are not aligned (one thousand cycles of about a hundred million). But it looks like this happens only in the assembler routines for a few cycles.

When I find another problem with the stack pointer in Linux I open another issue.

aswaterman commented 7 years ago

Ah, that's good news!

sorear commented 7 years ago

@s-macke this for jor1k ?

s-macke commented 7 years ago

Yeah, this is the plan. I have already a RISC-V emulator from the 2015 GSoC project but with an ancient Privilege Spec: http://s-macke.github.io/jor1k/demos/riscv.html I promised two years ago to continue this emulator until mmio devices are really supported. Two month ago I spoke with Krste Asanovic on a meeting in Munich and mentioned that this main feature is still missing. Note that the 64-Bit emulator by Fabrice Bellard had to patch the kernel massively to make it workable.

Well, a few weeks after I spoke with Krste the Linux kernel finally got the PLIC, which was the final missing piece. Now I am catching up with the privilege spec which actually means to rewrite 30-40% of the CPU emulator. See: https://github.com/s-macke/jor1k/tree/riscv-new-priv

I still don't know about my plans with this emulator. It is just fun and there are so many ideas. I want to finally run Java for example and a computing cluster. RISC-V is an exciting architecture and it seems to get good support by the community right now.

sorear commented 7 years ago

@s-macke jor1k is one of the first things I looked at when I was getting into risc-v 2016Q1. Having risc-v jor1k anywhere near feature parity w/ openrisc would be amazing and very helpful for outreach, especially if we can get it in as a replacement for https://riscv.org/software-tools/riscv-angel/ .

If I can arrange for either the Debian or Fedora bootstraps to do rv64 and rv32 simultaneously, would there be a possibility of using that?

s-macke commented 7 years ago

Now that we have MMIO support the RISC-V implementation will have feature parity with OpenRISC from the hardware emulation side - That means 32-Bit RISC-V but with more devices based on virtio. I don't plan to support 64-Bit at the moment as it requires a lot of changes in the code and additional Webassembly code for speed.

The distribution is up for discussion. Like for OpenRISC I plan to compile my own light weight distribution of around of 200MB with the typical packages which also fit in a Github repository. For Fedora and Debian, that space on Github probably won't be enough and I would probably stretch my free account to the allowed limits :) So what I need is just a compressed .tar file with the 32-Bit compiled root filesystem of Debian or Fedora. And to publish these distributions I probably need some server space.