llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Assembly instructions with "OFFSET FLAT:" are handled incorrectly #38795

Open llvmbot opened 5 years ago

llvmbot commented 5 years ago
Bugzilla Link 39447
Version 7.0
OS Windows NT
Attachments Contains CM_Switch.cpp CM_Switch.cod CM_Switch.obj
Reporter LLVM Bugzilla Contributor
CC @AndyAyersMS,@compnerd,@rnk,@smeenai

Extended Description

Hello LLVM-Team,

I used the new LLVM 7 to write a small, simple JIT client that loads bitcode files, JIT-compiles them, and executes them. In this JIT process I also include some object files that were generated by Visual Studio 2017, but sadly the resulting code crashes. I did some research and will try to explain what I did and what my conclusions are.

1.) Generating VisualStudio object file

All I do is compile the file "CM_Switch.cpp", which is attached to this report. I use the following compile flags:

/nologo /FAcs /Zc:wchar_t- /GS- /MT /W3 /O2 /I "....\include" /I "....\external\include" /D "WIN32" /D "_CRT_NON_CONFORMING_SWPRINTFS" /D "_CRT_NONSTDC_NO_DEPRECATE" /D "_CRT_SECURE_NO_WARNINGS" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /Fp"$(OutDir)%(Filename).pch" /Fo"$(OutDir)%(Filename).obj" /c $(ProjectName).cpp

2.) JIT Client

The JIT client first parses a .bc file that does not contain any code: I compiled an empty .cpp file with clang and enabled bitcode output. So the .bc file is not empty, but it has no executable code. After this I locate the CM_Switch.obj file and add it via "addObjectFile":

    llvm::Expected<std::unique_ptr<llvm::object::ObjectFile>> preObj = llvm::object::ObjectFile::createObjectFile(ArBuf.get()->getMemBufferRef());
    refEngine->addObjectFile(llvm::object::OwningBinary<llvm::object::ObjectFile>(std::move(preObj.get()), std::move(ArBuf.get())));

When generating the executable code, the JIT client is asked to resolve some references and returns their addresses unchanged. But executing the "Initialize2" function crashes the application.

Investigations:

With the CM_Switch.cod file and a debugger I was able to locate the root of the problem! Assembly instructions like these:

    lea r8, OFFSET FLAT:__ImageBase

The problem comes from "OFFSET FLAT", which, as I understand it, determines the offset from the current instruction to that reference, in this case "__ImageBase". But this is not handled correctly! Whatever address I return for "__ImageBase", the application crashes at EXACTLY that address. If I return 0xFF as the address, it crashes at address 0xFF; if I return the address of __ImageBase, it crashes there. If I return the address of a function, that function is actually executed. It seems to me that this code gets replaced with a jump, which is totally wrong.
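For reference, here is a minimal sketch of how a REL32 fixup (the relocation type listed for __ImageBase in the relocation dump later in this thread) is normally computed under the PE/COFF rules: the stored value is the 32-bit signed distance from the byte following the 4-byte field to the target. The function name `resolveRel32` and its parameters are hypothetical and are not LLVM APIs; this only illustrates the arithmetic, not what the JIT actually does.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper, not part of LLVM.
// fixupAddress: runtime address of the 4-byte displacement field.
// target:       runtime address of the referenced symbol.
// Returns the signed 32-bit displacement, or std::nullopt when the target
// is too far away to be encoded in 32 bits.
std::optional<std::int32_t> resolveRel32(std::uint64_t fixupAddress,
                                         std::uint64_t target) {
    // REL32 is measured from the byte after the 4-byte field.
    const std::uint64_t nextByte = fixupAddress + 4;
    // Unsigned subtraction wraps; reinterpreting as signed gives the delta.
    const std::int64_t delta = static_cast<std::int64_t>(target - nextByte);
    if (delta < std::numeric_limits<std::int32_t>::min() ||
        delta > std::numeric_limits<std::int32_t>::max()) {
        return std::nullopt;
    }
    return static_cast<std::int32_t>(delta);
}
```

With a direct fixup like this, `lea` would load the address of the symbol itself, so a symbol placed more than 2 GiB away from the code cannot be reached this way at all.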

That is all I can say.

llvmbot commented 5 years ago

I think some of the steps Andy is describing need to be implemented in LLVM's JIT code. I don't think you can take them all as a user. I don't think anyone is actively working on making the JIT handle COFF right now, unfortunately.

So... all I can do is wait? :0 I'm not experienced enough to make the changes myself. As mentioned, I'm just a user of LLVM. Currently we work around this issue by using the clang compiler, but this will not help with object or library files that do not come from us.

rnk commented 5 years ago

Is there a way to manage this with LLVM as a 'user', or do I have to change the source code of LLVM?

I think some of the steps Andy is describing need to be implemented in LLVM's JIT code. I don't think you can take them all as a user. I don't think anyone is actively working on making the JIT handle COFF right now, unfortunately.

AndyAyersMS commented 5 years ago

Unfortunately it's been quite a while since I tried this, so I don't remember the details anymore. I recall making some changes in Windows relocation processing, so if you look for those you may get some clues.

llvmbot commented 5 years ago

Is there a way to manage this with LLVM as a 'user', or do I have to change the source code of LLVM?

AndyAyersMS commented 5 years ago

__ImageBase is set to the base address of the executable, so that (among other things) jump table offsets can be encoded compactly using ADDR32NB relocations.

E.g., if you dump the relocations in your text section you see:

RELOCATIONS #6
                                     Symbol  Symbol
 Offset    Type        Applied To    Index   Name
 --------  ----------  ------------  ------  ------
 0000000C  REL32       00000000      8       ?myInt@@3HA (int myInt)
 00000013  REL32       00000000      59      __ImageBase
 00000026  REL32       00000000      55      ??_C@_0L@KCGKBKCO@?$CFllu?4?$CJ?5?$CFi?6?$AA@ (`string')
 00000031  REL32       00000000      1C      printf
 00000038  REL32       00000000      8       ?myInt@@3HA (int myInt)
 00000045  ADDR32NB    00000000      2C      $LN22
 0000004C  ADDR32NB    00000000      2D      $LN23
 0000008A  REL32       00000000      8       ?myInt@@3HA (int myInt)
 00000090  REL32_1     00000000      B       ?Initialized@@3_NA (bool Initialized)
 000000A8  ADDR32NB    00000000      2E      $LN9
 000000AC  ADDR32NB    00000000      2F      $LN10
 000000B0  ADDR32NB    00000000      30      $LN11
 000000B4  ADDR32NB    00000000      31      $LN12
 000000B8  ADDR32NB    00000000      32      $LN13
 000000BC  ADDR32NB    00000000      33      $LN14
 000000C0  ADDR32NB    00000000      34      $LN15
 000000C4  ADDR32NB    00000000      35      $LN16

Here the __ImageBase and ADDR32NB relocations must be resolved in a consistent fashion, so that a sequence like the following works:

    lea   rdi, OFFSET FLAT:__ImageBase
    movzx ecx, BYTE PTR $LN22@Initialize[rdi+rax]     ; ADDR32NB
    mov   edx, DWORD PTR $LN23@Initialize[rdi+rcx*4]  ; ADDR32NB
    add   rdx, rdi
    jmp   rdx
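To make the dependency concrete, here is a rough C++ model of what that sequence computes. It is only a sketch: the function name `jumpTarget` and the two table parameters are hypothetical stand-ins for $LN22 and $LN23, and the tables are passed in directly rather than being located via their own ADDR32NB displacements off `rdi`, as the real code does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the switch dispatch, not code from LLVM or MSVC.
// imageBase:  the value loaded by "lea rdi, OFFSET FLAT:__ImageBase".
// caseToSlot: byte table ($LN22 in the listing), indexed by the case value.
// slotToRva:  dword table ($LN23 in the listing) holding image-relative
//             offsets of the case labels.
std::uint64_t jumpTarget(std::uint64_t imageBase,
                         const std::vector<std::uint8_t>& caseToSlot,
                         const std::vector<std::uint32_t>& slotToRva,
                         std::size_t caseIndex) {
    assert(caseIndex < caseToSlot.size());
    const std::uint8_t slot = caseToSlot[caseIndex];  // movzx ecx, BYTE PTR ...
    assert(slot < slotToRva.size());
    const std::uint32_t rva = slotToRva[slot];        // mov edx, DWORD PTR ...
    return imageBase + rva;                           // add rdx, rdi ; jmp rdx
}
```

The final address is the sum of the __ImageBase value and an image-relative offset, so if the two are not resolved against the same base, the `jmp` lands at an arbitrary address.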

Since you're not building an executable you need to emulate handling these sorts of executable-related relocations.

For example: ensure that all the constituent loadable parts of the object are placed within a 4 GB range (as they would be if they were part of an executable). Then resolve __ImageBase to the address of the lowest loaded part, and resolve each ADDR32NB as the delta between the relocation target and __ImageBase.
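The steps above can be sketched roughly as follows. This is a simplified model under stated assumptions, not LLVM's relocation code: `LoadedSection`, `pickImageBase`, and `resolveAddr32NB` are hypothetical names, and the sketch assumes the load address and size of every loaded part are already known.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical description of one loaded part of the object.
struct LoadedSection {
    std::uint64_t loadAddress;  // where the loader placed it
    std::uint64_t size;         // size in bytes
};

// Pick an emulated __ImageBase: the lowest load address, provided that all
// sections fit within a 4 GiB window starting there. Returns std::nullopt
// when there are no sections or the layout is too spread out.
std::optional<std::uint64_t> pickImageBase(
        const std::vector<LoadedSection>& sections) {
    if (sections.empty()) return std::nullopt;
    std::uint64_t lowest = sections.front().loadAddress;
    std::uint64_t highestEnd = lowest + sections.front().size;
    for (const LoadedSection& s : sections) {
        if (s.loadAddress < lowest) lowest = s.loadAddress;
        if (s.loadAddress + s.size > highestEnd) highestEnd = s.loadAddress + s.size;
    }
    constexpr std::uint64_t kMaxSpan = 1ull << 32;  // 4 GiB
    if (highestEnd - lowest > kMaxSpan) return std::nullopt;
    return lowest;
}

// Value to store for an ADDR32NB fixup: the target's offset from the base.
std::uint32_t resolveAddr32NB(std::uint64_t imageBase, std::uint64_t target) {
    assert(target >= imageBase && target - imageBase <= 0xFFFFFFFFull);
    return static_cast<std::uint32_t>(target - imageBase);
}
```

A caller would run `pickImageBase` once after all sections are placed, return that value when asked for __ImageBase, and apply `resolveAddr32NB` to every ADDR32NB fixup. If `pickImageBase` fails, the sections would have to be reallocated closer together before any relocation is applied.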

llvmbot commented 5 years ago

I know there are people who work on Windows/COFF, and I know there are people who work on MCJIT, but I don't know if there's anyone who works on both. Adding a couple of random people though just in case someone else does happen to know.