llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Clang as an x64 assembler - Masm x64 #99

Open UnlimitedChild opened 4 years ago

UnlimitedChild commented 4 years ago

Hello,

Why does Clang not support the basic "offset" directive? This topic has been discussed many times:
https://lists.llvm.org/pipermail/llvm-bugs/2017-April/054674.html
https://reviews.llvm.org/D37461
https://stackoverflow.com/questions/43223287/why-does-this-simple-assembly-program-work-in-att-syntax-but-not-intel-syntax
https://bugs.llvm.org/show_bug.cgi?id=22511

In general, Clang does not support many assembler features; for example, its ability to represent numbers is limited. Should we expect more complete support for assembler dialects in the future?

We can't live without the offset directive.

// 
// RUN: clang -mllvm --x86-asm-syntax=intel -c hello.asm
// RUN: clang hello.o -o hello.exe

//.intel_syntax
.intel_syntax noprefix

.data
#define Data_ADDR offset .ascii
Key:
    .ascii "Hello, world!\n"
    .set mylen, .-Key

.text
.global start
foo:
mov r15 , 555  
MOV r14 , 555
MOV r13 , 555  
MOV r12 , 555 
MOV r11 , 555 
MOV r10 , 555 
MOV r9 , 555 
MOV r8 , 555 
MOV rbp , 555  
MOV rax , 555 
MOV rdi , 555 
MOV rsi , 555  
MOV rdx , 555 
MOV rcx , 6555  
MOV rbx , 555  
MOV rax , 555  
nop
nop
nop
MOV rax , foo
MOV rax , [foo]
lea rax , foo
lea rax , [foo]
nop
MOV rax , .ascii
MOV rax , [.ascii]
lea rax , .ascii
lea rax , [.ascii]
nop
MOV rax , offset Key
mov rax , Data_ADDR

// MOV rax , offset foo
// MOV rax , offset [foo]
// lea rax , offset foo
// lea rax , offset [foo]
// MOV rax , addr foo
// MOV rax , addr [foo]
// lea rax , addr foo
// lea rax , addr [foo]

Thanks for the work!

JonChesterfield commented 4 years ago

Last time I looked there was a different feature set working for at&t, so there's an outside chance that changing dialect would be a workaround.

ericastor commented 4 years ago

I've recently improved handling of "offset" at HEAD in Intel-syntax assembly. I believe the changes involved may have made it into clang-10, but am not 100% certain. See https://reviews.llvm.org/D71436.

ericastor commented 4 years ago

As for other assembly dialects - if you can file specific issues here, or as bugs at http://bugs.llvm.org, people may get to them eventually. The more specific you can be - reduced test cases that express the precise issue, explaining the difference between current results and expected results, etc. - the easier it is for someone to pick it up and fix it.

UnlimitedChild commented 4 years ago

@ericastor Thank you very much for the detailed information!

Some things that worked before do not work today:

    C:\msys64\mingw64\bin>echo "mov eax, eax" | llvm-mc -x86-asm-syntax=intel
            .text
    <stdin>:1:1: error: invalid instruction mnemonic 'mov eax, eax'
    "mov eax, eax"
    ^~~~~~~~~~~~

    C:\msys64\mingw64\bin>

ericastor commented 4 years ago

@UnlimitedChild I can't replicate your "mov eax, eax" problem locally on my Linux machine, at least. Can you confirm what version of llvm-mc you're using? Running llvm-mc --version should give all the information we need.
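
One possible explanation for the Windows/Linux difference, not confirmed in this thread: `cmd.exe`'s `echo` passes the double quotes through unchanged, so the assembler receives `"mov eax, eax"` with the quotes included (the echoed source line in the error output shows them), while a POSIX shell strips the quotes before `echo` runs. The sketch below is a toy tokenizer, not LLVM's lexer; `firstToken` is a hypothetical name. It only illustrates why a leading quote can turn the whole line into a single "mnemonic".

```cpp
#include <string>

// Hypothetical helper: returns what a toy lexer would take as the first token.
// A token that starts with '"' runs to the closing quote (quotes dropped);
// any other token ends at the first space, tab, or comma.
std::string firstToken(const std::string &line) {
  const std::size_t start = line.find_first_not_of(" \t");
  if (start == std::string::npos)
    return "";  // blank line: no token at all

  if (line[start] == '"') {
    const std::size_t close = line.find('"', start + 1);
    if (close == std::string::npos)
      return line.substr(start + 1);  // unterminated string: take the rest
    return line.substr(start + 1, close - start - 1);
  }

  const std::size_t end = line.find_first_of(" \t,", start);
  if (end == std::string::npos)
    return line.substr(start);
  return line.substr(start, end - start);
}
```

With unquoted input the first token is `mov`; with the quotes kept, it is the entire string `mov eax, eax`, which matches the text in the reported error. If this is the cause, running `echo mov eax, eax | llvm-mc -x86-asm-syntax=intel` without quotes in `cmd.exe` would be worth trying.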

ericastor commented 4 years ago

Oh, and please note that clang does not currently compile MASM-syntax assembly directives, etc. - it mostly accepts Intel syntax (omitted size suffixes, dst, src operand ordering), but work is still being done as more bugs are found. As mentioned, AT&T syntax is better-supported at the moment.

I'm working on a MASM-compatible LLVM-based assembler (llvm-ml) in my spare time, but am not sure how long that will take. (llvm-dev posts so far: RFC, and a few other threads)

UnlimitedChild commented 4 years ago

> @UnlimitedChild I can't replicate your "mov eax, eax" problem locally on my Linux machine, at least. Can you confirm what version of llvm-mc you're using? Running llvm-mc --version should give all the information we need.

    C:\msys64\mingw64\bin>llvm-mc -version
    LLVM (http://llvm.org/):
      LLVM version 9.0.0
      Optimized build.
      Default target: x86_64-w64-windows-gnu

UnlimitedChild commented 4 years ago

> Oh, and please note that clang does not currently compile MASM-syntax assembly directives, etc. - it mostly accepts Intel syntax (omitted size suffixes, dst, src operand ordering), but work is still being done as more bugs are found. As mentioned, AT&T syntax is better-supported at the moment.

Thank you, everything is clear =)

> I'm working on a MASM-compatible LLVM-based assembler (llvm-ml) in my spare time, but am not sure how long that will take. (llvm-dev posts so far: RFC, and a few other threads)

Great news. I don't think testing will be a problem; plenty of people are interested in this. I already found a development branch for the macro assembler: https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-ml/llvm-ml.cpp

It's hard to say where the syntax integration should go. There is a very good project that implements a macro assembler (x32-x64): https://github.com/Terraspace/UASM. In that project you can see the entire implementation of a macro assembler and its internal logic. In addition, some documents will be useful and informative:

  1. Microsoft Corporation - Microsoft Macro Assembler 6.11 Reference Manual_Environment and Tools (1992, Microsoft Corporation).pdf
  2. Microsoft Corporation - Microsoft Macro Assembler 6.11 Reference Manual_Getting Started (1992, Microsoft Corporation).pdf
  3. Microsoft Corporation - Microsoft Macro Assembler 6.11 Reference Manual_Macro Assembler Reference (1992, Microsoft Corporation).pdf
  4. Microsoft Corporation - Microsoft Macro Assembler 6.11 Reference Manual_Programmers Guide (1992, Microsoft Corporation).pdf

This is the most comprehensive macro assembler documentation. The rest of the description can be found here - https://docs.microsoft.com/en-us/cpp/assembler/masm/microsoft-macro-assembler-reference?view=vs-2017

Is there some kind of UML diagram to describe the expected llvm-ml assembly process?

ericastor commented 4 years ago

@UnlimitedChild That's the documentation I'm referencing, and the "llvm-ml.cpp" you found is in fact the development placeholder I'm working on. I'd certainly like not to be the only one working on it! I'm currently working to land the first commit of substance, though that may take a bit due to it not being my 100% focus while at work.

Please note that UASM (and every other open-source MASM project I've found) is released under a license that is generally considered incompatible with LLVM's license, so we cannot reference their code while building this.

As for a UML diagram - I'm not sure what you mean. At early stages of this project, we will be testing MASM support by building it through llvm-ml. At later stages (once it's proven to mostly work), it should be possible for clang-cl to build MASM files directly (just as with .s files on other platforms), by defaulting to MASM support when compiling files with the extension ".asm".
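
The extension-based default described above could look roughly like the following. This is a hypothetical sketch, not clang-cl's driver code: `AsmDialect` and `dialectForFile` are names invented for illustration, and treating every extension other than `.asm` as GNU-style assembly is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical dialect tag; a real driver would carry much more state.
enum class AsmDialect { GnuAs, Masm };

// Picks a default dialect from the file extension (case-insensitive).
// Assumption: only ".asm" selects MASM; anything else stays GNU-style.
AsmDialect dialectForFile(const std::string &path) {
  const std::size_t dot = path.find_last_of('.');
  const std::size_t slash = path.find_last_of("/\\");

  // No extension, or the last dot belongs to a directory name.
  if (dot == std::string::npos ||
      (slash != std::string::npos && dot < slash))
    return AsmDialect::GnuAs;

  std::string ext = path.substr(dot);
  std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return ext == ".asm" ? AsmDialect::Masm : AsmDialect::GnuAs;
}
```

An explicit command-line option would presumably still override whatever this default picks; the sketch covers only the fallback.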

UnlimitedChild commented 4 years ago

A few questions:

  1. Is there anyone around who would be willing to answer questions regarding the intended architecture of llvm-mc and the AsmParser classes? I'd like to make sure my proposals fit well into the design... and I'm starting to have trouble finding where these extensions should go. (Also, I've had some trouble getting used to the recursive-descent parser conventions being used. For example, how should one handle "try parsing this identifier as a register, and if that fails, check if it's defined as a symbol" while not emitting Errors from the first attempt?)
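
One common way to approach the "try as a register, then fall back to a symbol" question is to have each speculative attempt return an optional value and stay silent, and to emit a diagnostic only once every alternative has failed. The sketch below shows that pattern in isolation. It is not how LLVM's AsmParser classes are structured; `ToyOperandParser`, `Operand`, and every method name are hypothetical.

```cpp
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical operand: either a register or a previously defined symbol.
struct Operand {
  enum Kind { Register, Symbol };
  Kind kind;
  std::string name;
};

class ToyOperandParser {
public:
  ToyOperandParser(std::set<std::string> regs, std::set<std::string> syms)
      : registers(std::move(regs)), symbols(std::move(syms)) {}

  // Speculative attempts: report failure through the return value only.
  std::optional<Operand> tryParseRegister(const std::string &id) const {
    if (registers.count(id))
      return Operand{Operand::Register, id};
    return std::nullopt;
  }

  std::optional<Operand> tryParseSymbol(const std::string &id) const {
    if (symbols.count(id))
      return Operand{Operand::Symbol, id};
    return std::nullopt;
  }

  // Committed parse: the only place a diagnostic is recorded.
  std::optional<Operand> parseOperand(const std::string &id) {
    if (auto reg = tryParseRegister(id))
      return reg;
    if (auto sym = tryParseSymbol(id))
      return sym;
    errors.push_back("unknown register or symbol '" + id + "'");
    return std::nullopt;
  }

  const std::vector<std::string> &diagnostics() const { return errors; }

private:
  std::set<std::string> registers;
  std::set<std::string> symbols;
  std::vector<std::string> errors;
};
```

The design choice here is that the `tryParse*` functions never touch the diagnostics list, so a failed register lookup leaves no trace when the symbol lookup succeeds. A real parser would also need to restore the token position after a failed attempt, which this sketch sidesteps by working on a single identifier.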

This is what I had in mind, which means it is still under discussion.

> Please note that UASM (and every other open-source MASM project I've found) is released under a license that is generally considered incompatible with LLVM's license, so we cannot reference their code while building this.

Indeed, I did not think about that.

UnlimitedChild commented 4 years ago

C++ is actually a macro language with inline functions, so a macro assembler could use similar processing logic, and the architectural implementation may be similar. Instead of C++ templates, macro operations would be processed.
