Closed pupiles closed 2 years ago
@pupiles, I tried compiling the LLVM IR example you provided using Clang (13.0.0), but got the same error:
u@x1 /t/foo [1]> clang -o foo foo.ll
foo.ll:1437:287: error: expected '('
invoke void (%"class.std::__cxx11::basic_string"*, i32 (i8*, i64, i8*, %struct.__va_list_tag*)*, i64, i8*, ...) @_ZN9__gnu_cxx12__to_xstringINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEcEET_PFiPT0_mPKS8_P13__va_list_tagEmSB_z(%"class.std::__cxx11::basic_string"* nonnull sret align 8 %6, i32 (i8*, i64, i8*, %struct.__va_list_tag*)* nonnull @vsnprintf, i64 32, i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.39, i64 0, i64 0), i64 %137)
^
1 error generated.
So, it seems the official LLVM tools are not able to parse the gdpr_handler.cpp.o LLVM IR file. Try generating a new one using Clang, version 13.0.0.
Cheers, Robin
@pupiles btw, do you know what generates this file? It might be a new feature of llvm
@mewmew @dannypsnl, it is generated by Clang 11, and llc-11 handles it correctly, but I can't parse it with llir/llvm, using either v0.3.3 (LLVM 11) or v0.3.4 (LLVM 12).
@mewmew
package main

import (
	"github.com/llir/llvm/ir"
	"github.com/llir/llvm/ir/types"
	"github.com/llir/llvm/ir/value"
)

func main() {
	m := ir.NewModule()
	// Placeholder definition of the std::string struct type.
	basicString := m.NewTypeDef("class.std::__cxx11::basic_string", types.NewStruct(types.I8))
	// Variadic declaration of vsnprintf, later passed as a function pointer.
	vsnprintf := m.NewFunc("vsnprintf", types.I32,
		ir.NewParam("", types.NewPointer(types.I8)),
		ir.NewParam("", types.I64),
		ir.NewParam("", types.NewPointer(types.I8)),
	)
	vsnprintf.Sig.Variadic = true
	invokee := m.NewFunc("_ZN9__gnu_cxx12__to_xstringINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEcEET_PFiPT0_mPKS8_P13__va_list_tagEmSB_z",
		types.Void,
		ir.NewParam("", basicString),
		ir.NewParam("", vsnprintf.Typ),
		ir.NewParam("", types.I64),
		ir.NewParam("", types.NewPointer(types.I8)),
	)
	mainFunc := m.NewFunc("main", types.I32)
	entry := mainFunc.NewBlock("")
	// Globals standing in for the invoke arguments.
	f := m.NewGlobal("", basicString)
	i := m.NewGlobal("", types.I64)
	p := m.NewGlobal("", types.NewPointer(types.I8))
	// Draft: the normal and exception return targets are still nil.
	entry.NewInvoke(invokee, []value.Value{f, vsnprintf, i, p}, nil, nil)
	println(m.String())
}
A draft start
It seems there are two primary issues with the LLVM IR file that cause parsing to fail.
Firstly, sret parameter attributes without an explicit type are valid in LLVM 11.0, but not in LLVM 13.0 (see parseRequiredTypeAttr in the official LLVM source code). In LLVM 13.0, an explicit type is needed, e.g. sret(i8).
This was verified by trying to parse the original gdpr_handler.ll file with opt -S -o foo_13.ll gdpr_handler.ll, using opt from LLVM 13.0.
A work-around is simply to remove sret from the input LLVM IR file.
Secondly, there is a known issue with llir/llvm where it is unable to parse align attributes. This is due to an LR(1) shift/reduce ambiguity in the original LLVM IR grammar (as described in #40).
If we remove the align and sret attributes, then llir/llvm is able to parse the output produced by opt -S -o foo_13.ll gdpr_handler.ll using LLVM 13.0, when using the llvm13 branch of llir/llvm. Note, support for the DIFlagExportSymbols enum was added in 4653d58ae05b354c7a4743132cdbe96abbed965d.
Cheers, Robin
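The work-around described above (dropping untyped sret and align attributes before parsing) could be automated with a small text preprocessor. The following is only a minimal sketch using the Go standard library; the function name stripAttrs and both regular expressions are hypothetical and not part of llir/llvm. It assumes the attributes appear as plain words in the textual IR, and it leaves any align that follows a comma (as on load, store, alloca and global definitions) untouched. It would also drop an align on a function definition, which may or may not be acceptable.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// bareSret matches an untyped `sret` attribute. A typed `sret(T)` is left
	// alone, because the character after `sret` must be whitespace, ',' or ')'.
	bareSret = regexp.MustCompile(`\s+sret([\s,)])`)
	// attrAlign matches `align N` unless a comma precedes it, so that
	// instruction operands such as `, align 4` are preserved.
	attrAlign = regexp.MustCompile(`([^,\s])\s+align\s+\d+`)
)

// stripAttrs removes untyped sret attributes and attribute-style align N
// from textual LLVM IR (hypothetical helper, purely text based).
func stripAttrs(src string) string {
	src = bareSret.ReplaceAllString(src, "${1}")
	return attrAlign.ReplaceAllString(src, "${1}")
}

func main() {
	fmt.Println(stripAttrs(`call void @f(%T* nonnull sret align 8 %6, i64 32)`))
	fmt.Println(stripAttrs(`%v = load i32, i32* %p, align 4`))
}
```

The cleaned text can then be handed to the parser as usual. Because this is a textual rewrite rather than a real parse, it may also touch matching text inside string constants or comments.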
@mewmew
Thanks for your reply,
For the sret parameter attribute: it marks the parameter that holds the function's return value, so I think it is important for data-flow analysis, and removing it outright may not be a good decision. If I don't care about the explicit type, is there any other solution?
For the align \d+ attribute: it only specifies the alignment, so it can be removed.
Some users may not care about strictly valid LLVM IR. Would it be feasible to provide an option so that, when the parser hits the align \d+ ambiguity, it ignores the attribute instead of reporting an error?
Thanks for your reply,
You are most welcome :)
For the sret parameter attribute: it marks the parameter that holds the function's return value, so I think it is important for data-flow analysis, and removing it outright may not be a good decision. If I don't care about the explicit type, is there any other solution?
The grammar of LLVM 11.0 supported implicit sret, but for LLVM 13.0, an explicit type is required. This is also true for the official LLVM distribution.
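If the sret information should be kept rather than removed, one possible alternative is to rewrite the untyped attribute into the typed form by reusing the pointee type of the parameter. The sketch below is an assumption-laden illustration using only the Go standard library: the name addSretType and the regular expression are hypothetical, it assumes typed pointers (as in LLVM 13 textual IR), and it only handles parameters whose type is a named or integer type followed by a single * and simple one-word attributes before sret. Other shapes are left unchanged.

```go
package main

import (
	"fmt"
	"regexp"
)

// bareSretParam matches `T* [attrs] sret` where sret has no explicit type.
// ty is the pointee type, attrs are plain one-word attributes such as nonnull.
var bareSretParam = regexp.MustCompile(
	`(?P<ty>%"[^"]*"|%[-\w.$]+|i\d+)\*(?P<attrs>(?:\s+[a-z_]+)*?)\s+sret(?P<end>[\s,)])`)

// addSretType rewrites `T* sret` into `T* sret(T)` (hypothetical helper).
func addSretType(src string) string {
	return bareSretParam.ReplaceAllString(src, "${ty}*${attrs} sret(${ty})${end}")
}

func main() {
	in := `invoke void @f(%"class.std::__cxx11::basic_string"* nonnull sret align 8 %6, i64 32)`
	fmt.Println(addSretType(in))
}
```

Whether the result is accepted by a given LLVM version still needs to be checked with the official tools, for example by running opt on the rewritten file.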
Some users may not care about strictly valid LLVM IR. Would it be feasible to provide an option so that, when the parser hits the align \d+ ambiguity, it ignores the attribute instead of reporting an error?
That's a good idea. I'm not sure if it is possible, but definitely worth investigating.
Would you care to take a look @pupiles?
The generated lexer and parser are in llir/ll, and the grammar is at llir/grammar. The tool used to generate the lexer and parser is Textmapper. There is some documentation for Textmapper at https://textmapper.org/
Cheers, Robin
Maybe off-topic, but perhaps we could take the asm parser from the LLVM source code, compile it, and link it with our Go code? The problem I can see is
The benefit I can see is
@mewmew since LLVM 13 support was just added, does that solve this issue?
Given that the llvm13 branch has been merged into master, the work-around mentioned in https://github.com/llir/llvm/issues/212#issuecomment-999690598 should be enough to parse the LLVM IR example source.
The align ambiguity still remains, but this issue is already tracked by #40. So we can safely close this issue.
Cheers, Robin
P.S. feel free to re-open this issue or open a new one if there is a parse error related to LLVM 13.0 or LLVM 14.0.
Hi,
The code above fails with an AST parse error when I use asm.ParseFile, because "class.std::__cxx11::basic_string" does not seem to be supported. Could you give me some hints on that? I would really appreciate it. test.cpp.o.zip