llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[Arm Windows][JIT] Diverged behavior of Jitted code on Arm Windows (Surface Laptop 7) #96435

Open Jason821-prog opened 3 weeks ago

Jason821-prog commented 3 weeks ago

Hello

I'm running into an issue that could indicate a potential hidden bug in LLVM JIT on Arm Windows.

Some background on how I found this issue: a while ago I worked on a programming language that is supported on x64 Windows, x64 Linux, and Mac (both x64 and Arm64). With the release of the new Surface Laptop, I attempted to port it to Arm Windows. It mostly worked, except for one failing unit test, which boils down to the following problem. (To clarify, the same unit test with exactly the same source code passes on every other supported platform.) I tried several versions of LLVM, including 10.0.0, 12.0.0 and 17.0.1; unfortunately, none of them works as expected on Arm Windows.

Here is the IR produced by my own custom language.

; ModuleID = 'my cool jit'
source_filename = "my cool jit"

@a = internal constant [8 x float] [float 0.000000e+00, float 1.000000e+00, float 2.000000e+00, float 3.000000e+00, float 4.000000e+00, float 5.000000e+00, float 6.000000e+00, float 7.000000e+00]

declare float @access_global_array(ptr)

define float @my_proxy_function() {
entry:
  %0 = load float, ptr getelementptr (float, ptr @a, i32 5), align 4
  ret float %0
}

Simple as it is, this code does nothing fancy: it declares a global constant (an array of 8 floats) and loads one of its elements inside a function. However, the function returns an incorrect value: 1 rather than 5.
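For reference, the constant expression `getelementptr (float, ptr @a, i32 5)` addresses element 5 of the array, which sits 5 × 4 = 20 bytes past `@a` because a `float` is 4 bytes wide. The following Python sketch models that arithmetic on a byte image of `@a`. The helper names `gep_float_offset` and `load_float` are hypothetical and exist only for illustration, and little-endian layout is assumed.

```python
import struct

FLOAT_SIZE = 4  # size of an LLVM `float` in bytes

# Byte image of @a: eight little-endian 32-bit floats, 0.0 through 7.0.
a = struct.pack("<8f", *(float(i) for i in range(8)))

def gep_float_offset(index):
    """Byte offset selected by `getelementptr (float, ptr @a, i32 index)`."""
    return index * FLOAT_SIZE

def load_float(buf, byte_offset):
    """Read one 32-bit float from `buf` at `byte_offset`."""
    return struct.unpack_from("<f", buf, byte_offset)[0]

offset = gep_float_offset(5)      # 20 bytes past the start of @a
expected = load_float(a, offset)  # element 5 of the array
```

Here `expected` is 5.0, the value the function should return. The value 1 that is actually returned is what a load from byte offset 4 (element 1) would produce.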

So I spent some time experimenting with the IR and found that if the generated code calls an externally defined function (defined in C), it works as expected and returns 5. Below is the new IR with the call added. FYI, the external function is an empty function that does nothing but return a value, and the generated code does not even use that value.

; ModuleID = 'my cool jit'
source_filename = "my cool jit"

@a = internal constant [8 x float] [float 0.000000e+00, float 1.000000e+00, float 2.000000e+00, float 3.000000e+00, float 4.000000e+00, float 5.000000e+00, float 6.000000e+00, float 7.000000e+00]

declare float @access_global_array(ptr)

define float @my_proxy_function() {
entry:
  %calltmp = call float @access_global_array(ptr @a)
  %0 = load float, ptr getelementptr (float, ptr @a, i32 5), align 4
  ret float %0
}

Then I built a debug version of LLVM and found that the program hits a failed assert, which already indicates that something goes wrong partway through the program. Here is a screenshot of Visual Studio hitting that failed assert.

image

If someone could take a look, it would be greatly appreciated.

Thanks.

llvmbot commented 3 weeks ago

@llvm/issue-subscribers-orcjit

llvmbot commented 3 weeks ago

@llvm/issue-subscribers-backend-aarch64

efriedma-quic commented 2 weeks ago

This is asserting that the referenced address (i.e. the global variable) is appropriately aligned. What's the address of the target? Is it possible your memory allocator is producing misaligned memory?

Jason821-prog commented 2 weeks ago

This is a legitimate question. However, I did check this earlier in the debug build.

The screenshot in the first post shows the failed assert in a debug LLVM build, along with some basic debug information.

I ran it again just now to make sure the address is 4-byte aligned.

image

As the screenshot shows, the address of the memory is 0x000002a550350000. I intentionally initialized the array with unusual values to reduce the chance that some other memory happens to hold the same values. The address is 4-byte aligned, and the values in the array it points to are indeed multiples of 123.0f, which matches the source code.
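The alignment claim is easy to verify by hand. As a minimal sketch in Python, with `is_aligned` being a hypothetical helper written only for this check:

```python
def is_aligned(address, alignment):
    """Return True if `address` is a multiple of `alignment` (a power of two)."""
    return (address & (alignment - 1)) == 0

base = 0x000002A550350000  # address reported in the screenshot

four_byte_aligned = is_aligned(base, 4)     # requirement for a float load
page_aligned = is_aligned(base, 0x1000)     # low 12 bits are all zero
```

Both checks hold for this address: its low 16 bits are zero, so it is aligned far beyond the 4 bytes a `float` load needs.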

To answer your question, I can't guarantee from this simple experiment that misaligned memory never occurs. But what I see here is that the test fails the assert even with aligned memory, which also indicates that something is wrong in the system, very likely the same bug.

Jason821-prog commented 2 weeks ago

However, I'd like to point out that the value of Value on line 291 WILL be misaligned, because the value of RE.Addend is 5. Since the address itself is already 4-byte aligned, an internal data structure producing a misaligned address seems to indicate a potential bug in the system.
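The arithmetic behind this observation can be sketched in Python. The base address is the one from the screenshot and the addend 5 is the `RE.Addend` value mentioned above. The names `base`, `addend`, and `FLOAT_SIZE` are illustrative. That `Value` is computed as base plus addend, and that the 5 is an element index never scaled to bytes, are assumptions only; neither is confirmed in this thread or taken from the LLVM source.

```python
FLOAT_SIZE = 4              # bytes per float
base = 0x000002A550350000   # address reported in the screenshot
addend = 5                  # RE.Addend as observed in the debugger

# Assumption: the checked value is base + addend, with the addend in bytes.
unscaled = base + addend
# Byte address of element 5 if the index were scaled by the element size.
scaled = base + addend * FLOAT_SIZE

unscaled_misaligned = unscaled % FLOAT_SIZE != 0
scaled_aligned = scaled % FLOAT_SIZE == 0

# Rounding a 5-byte offset down to whole floats selects this element index.
truncated_index = addend // FLOAT_SIZE
```

Under those assumptions, `base + 5` is misaligned while `base + 20` is not, and truncating the 5-byte offset to whole floats selects element 1, which would be consistent with the returned value of 1 reported in the first post.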

If there is anything I can do in my program to prevent it, I would be happy to learn about it.