Closed statham-arm closed 3 days ago
@sneves any idea if these might be easy fixes upstream?
I guess so. Unless the defaults in some version of MASM are whack, removing the stuff after SEGMENT
should be fine. Removing the _function_name
procs should be fine, too. And relabelling the anonymous labels @@
into some name is OK, of course.
That being said, llvm-ml cannot currently be described as a MASM replacement, looking at the current state of https://github.com/llvm/llvm-project/blob/main/llvm/lib/MC/MCParser/COFFMasmParser.cpp
CC @ericastor
llvm-ml
is definitely not "100% complete" yet. For instance (as highlighted here):
@@
is not yet implementedSEGMENT
's additional parameters are not yet supported.As for nesting PROC/ENDP pairs... I'm surprised to learn ml64.exe
supports that, but of course we can add that as a feature.
As with all aspects of open-source development, we welcome additional contributions!
FWIW, other options for building LLVM entirely self-hosted without relying on ml64.exe
would be to disable the blake3 assembly (you should be able to set -DLLVM_DISABLE_ASSEMBLY_FILES=ON
when configuring the build), or coerce it to use the gnu style .S
assembly even in clang-cl mode, somehow (since clang can assemble that just fine).
Yes, I found LLVM_DISABLE_ASSEMBLY_FILES
, and that's how I worked around this. I did wonder about using the GNU-style files, but didn't fancy trying to figure out how to make the cmake system use them in what's otherwise a Windows-style compilation.
I hadn't quite realised that llvm-ml was in an unfinished state; if I had, I might not have bothered raising this bug. I assumed it was complete enough that not being able to handle LLVM's own sources would be a surprise. Sorry about that!
Please don't apologize for raising the issue... that's how we'll get these all identified & fixed! (I wasn't even aware of the blake3 assembly files somehow, and getting that working is a good next target.)
I hadn't quite realised that llvm-ml was in an unfinished state; if I had, I might not have bothered raising this bug. I assumed it was complete enough that not being able to handle LLVM's own sources would be a surprise. Sorry about that!
Well "LLVM's own sources" is also a rather relative concept, as the blake3 assembly files were imported from a third party pretty recently. Before that, llvm-ml did pass the bar of being able to build LLVM's MASM OpenMP assembly code though!
And indeed, when the tool is complete enough for some use cases, getting to know the other use cases that still don't work is valuable - that's more meaningful to work on, rather than sitting down with a plain list of unimplemented features, just for completeness.
Current statue of this issue is that in -m64 mode, vmovaps xmm2, xmmword ptr [BLAKE3_IV]
become vmovaps 0x0, %xmm2
, while ml64 emits vmovaps (%rip), %xmm2
.
It seems that default memory base register is not rip for avx2 instructions.
How to check if the code compiled is functional? Because llvm-ml does compile Blake in llvm, though I'm not sure how to check if it works. Edit: ran check-all on the build and got tons of failures before even getting 10%. Guess that shows it's not there yet.
@llvm/issue-subscribers-backend-x86
Author: Simon Tatham (statham-arm)
The LLVM project contains
llvm-ml
, a MASM-syntax assembler intended to behave likeml.exe
orml64.exe
. As of https://reviews.llvm.org/D121510, it also contains some MASM-syntax assembly source files inllvm/lib/Support/BLAKE3
. But the former cannot accept the latter as input:My own partial diagnosis of the problems, just from looking at the source file:
It looks as if declaring
@@
as a label multiple times is expected to be allowed, withjmp @b
orjmp @f
going to the previous or next instance of it respectively. Butllvm-ml
is expecting it to be like any other label, defined at most once.The 'endp outside of procedure block' errors appear to be arising from nesting of two
PROC
/ENDP
pairs around the same code in order to define two procedures with the same address and extent (foo PROC
,bar PROC
, code code code,bar ENDP
,foo ENDP
).llvm-ml
is complaining on the secondENDP
, suggesting that the realml64.exe
is able to track multiple openPROC
definitions andllvm-ml
isn't.As for the first error in the segment directive: it looks to me as if
llvm-ml
has terminated parsing after the wordSEGMENT
, and attempted to parse the rest of the line independently, as if there had been a newline in between. (For example, I can write_TEXT SEGMENT mov rax,1
without provoking an error.) So the suffixALIGN(16)
is being regarded as a standalone alignment directive, rather than an attribute on the segment directive, and that's why the error message about'CODE'
is reporting an unexpected token in the align directive and not in the segment directive.(This is all as of git commit fc8f465a0008826cb7431eb5684861477998662c, with llvm-ml built from the sources in that commit trying to assemble the BLAKE3 code in the same commit.)