Open guoyuqi020 opened 2 months ago
The x86.ll seems to be from brotli.1.0.9 and the riscv.ll is from brotli.1.1.0. The riscv.ll contains a function called BrotliFindAllStaticDictionaryMatchesFor
that does not exist in x86.ll. The %1133 variable you show from riscv.ll is in this function so I can't be sure I can compare the two .ll files.
Sorry, that's my mistake.
I have tried Brotli 1.0.9 on RISCV. The same thing happened.
The basic block on RISCV, brotli 1.0.9
if.else853: ; preds = %while.body
tail call void @llvm.dbg.declare(metadata ptr %is_all_caps, metadata !2182, metadata !DIExpression()), !dbg !2184
%transform854 = getelementptr inbounds %struct.DictWord, ptr %w, i32 0, i32 1, !dbg !2185
%1133 = load i8, ptr %transform854, align 1, !dbg !2185
%conv855 = zext i8 %1133 to i32, !dbg !2185
%cmp856 = icmp ne i32 %conv855, 10, !dbg !2185
%lnot858 = xor i1 %cmp856, true, !dbg !2185
%lnot860 = xor i1 %lnot858, true, !dbg !2185
%1134 = zext i1 %lnot860 to i64, !dbg !2185
%cond = select i1 %lnot860, i32 1, i32 0, !dbg !2185
store i32 %cond, ptr %is_all_caps, align 4, !dbg !2184
tail call void @llvm.dbg.declare(metadata ptr %s862, metadata !2186, metadata !DIExpression()), !dbg !2187
%1135 = load ptr, ptr %dictionary.addr, align 8, !dbg !2188
%words863 = getelementptr inbounds %struct.BrotliEncoderDictionary, ptr %1135, i32 0, i32 0, !dbg !2190
%1136 = load ptr, ptr %words863, align 8, !dbg !2190
%1137 = load ptr, ptr %data.addr, align 8, !dbg !2191
%1138 = load i64, ptr %max_length.addr, align 8, !dbg !2192
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %w.coerce, ptr align 2 %w, i64 4, i1 false), !dbg !2193
%1139 = load i64, ptr %w.coerce, align 8, !dbg !2193
store i64 %1139, ptr %tmp.coerce.i2880, align 8
call void @llvm.memcpy.p0.p0.i64(ptr align 2 %w.i2879, ptr align 8 %tmp.coerce.i2880, i64 4, i1 false)
store ptr %1136, ptr %dictionary.addr.i2881, align 8
tail call void @llvm.dbg.declare(metadata ptr %dictionary.addr.i2881, metadata !2194, metadata !DIExpression()), !dbg !2198
tail call void @llvm.dbg.declare(metadata ptr %w.i2879, metadata !2200, metadata !DIExpression()), !dbg !2201
store ptr %1137, ptr %data.addr.i2882, align 8
tail call void @llvm.dbg.declare(metadata ptr %data.addr.i2882, metadata !2202, metadata !DIExpression()), !dbg !2203
store i64 %1138, ptr %max_length.addr.i2883, align 8
tail call void @llvm.dbg.declare(metadata ptr %max_length.addr.i2883, metadata !2204, metadata !DIExpression()), !dbg !2205
%1140 = load i8, ptr %w.i2879, align 2, !dbg !2206
%conv.i2887 = zext i8 %1140 to i64, !dbg !2208
%1141 = load i64, ptr %max_length.addr.i2883, align 8, !dbg !2209
%cmp.i2888 = icmp ugt i64 %conv.i2887, %1141, !dbg !2210
br i1 %cmp.i2888, label %if.then.i2972, label %if.else.i2889, !dbg !2211
The new IR. I updated the riscv.ll. IR.zip
I think what is happening is that clang is coercing the DictWord struct into an XLen sized integer type in RISCVABIInfo::classifyArgumentType
. This is for the call to IsMatch
in the brotli source.
Specifically this code
// Aggregates which are <= 2*XLen will be passed in registers if possible,
// so coerce to integers.
if (Size <= 2 * XLen) {
unsigned Alignment = getContext().getTypeAlign(Ty);
// Use a single XLen int if possible, 2*XLen if 2*XLen alignment is
// required, and a 2-element XLen array if only XLen alignment is required.
if (Size <= XLen) {
return ABIArgInfo::getDirect(
llvm::IntegerType::get(getVMContext(), XLen));
} else if (Alignment == 2 * XLen) {
return ABIArgInfo::getDirect(
llvm::IntegerType::get(getVMContext(), 2 * XLen));
} else {
return ABIArgInfo::getDirect(llvm::ArrayType::get(
llvm::IntegerType::get(getVMContext(), XLen), 2));
}
}
This creates the memcpy. After that the IsMatch function is inlined, but no other optimizations are done to cleanup the memcpy since you compiled without optimizations.
X86 on the other hand passes the struct in an i32 value which didn't require the memcpy.
The handling of ABI is weirdly divided between clang and the backend. The ABI says a small struct is passed packed in a single integer register. Since only clang knows that it is a C struct, clang is responsible for coercing the struct to an integer type. I'm not sure we need to use XLen for this integer type. Using i32 would probably still work with the backend.
@asb @jrtc27 @kito-cheng
@llvm/issue-subscribers-backend-risc-v
Author: Yuqi Guo (guoyuqi020)
Does it really matter? This is the kind of thing that SROA can optimise.
I‘m not quite familiar with the LLVM's optimization, so I have several questions here.
clang -g -fno-discard-value-names -S -emit-llvm <source-code-file> -o output.ll
. How can I enable these SROA optimizations? memcpy
calls in my data, so I guess at the time of ThreadSanitizer's instrumentation, SROA had not been performed. Will it be possible to perform SROA (and any other necessary optimizations) before ThreadSanitizer's instrumentation?Any -On for n > 0 will enable those kinds of optimisations. You're just producing unoptimised IR, since the default is -O0, and unoptimised IR is deliberately very stupid.
I'm a researcher studying the difference between clang's IR generation on x86-64 and RISCV64. Recently, I've been trying the
brotli-1.0.9
package. I found that on RISCV64, clang can generate weird complex IR for simple load-and-store logic.For the same basic block, the IR generated on x86-64 is like this:
On RISCV64, it's like this:
Most of the code is the same. The difference exists in a small load-and-store logic. On x86-64, I have:
But on RISCV64, I have:
Note that
%w
points to a struct with 2-byte aligned:And
%struct.DictWord
is a 32-bit struct:These two slices of code do the same thing. They load an
i32
from%w
and then store it in another value. The problem is, since this is ani32
value, why RISCV64 bothers to use twoi64
variables and twomemcpy
calls to do such simple value transmission? Do you have any ideas?======================================================
I have uploaded the source code of the package
brotli-1.0.9
. brotli-1.0.9.zipThe instructions I used to generate IR:
I uploaded the IR generated on my machines. Hope they may help. IR.zip
I'm using commit
132bf4aedd678277b57d8e2bdabf9a1e9eb254c5
of LLVM.If you need any other information, please let me know.