lifting-bits / remill

Library for lifting machine code to LLVM bitcode
Apache License 2.0

Inconsistencies between lifting IRs and physical CPU #705

Open zyt755 opened 3 months ago

zyt755 commented 3 months ago

Hi all, I found several inconsistencies between the lifted IR and the behavior of a physical CPU while using Remill.

1. For the imul instruction, Remill clears the AF and ZF flags to zero and computes the PF and SF flags from the result. The physical CPU instead leaves these four flags as they were set by the preceding add %r11d, %ecx instruction.
2. For the sar, sal, shr, and shl instructions, Remill ignores the effect on the AF flag, whereas the physical CPU does change it.

The following is the assembly code.

0000000000400504 <Block_1>:
400504: 41 c1 fa 1f sar $0x1f,%r10d
400508: 44 01 d9 add %r11d,%ecx

000000000040050b <Block_2>:
40050b: 48 0f af d0 imul %rax,%rdx
40050f: 48 c1 ea 1f shr $0x1f,%rdx

The following is the IR for the instruction at 0x40050b, imul %rax, %rdx.

%80 = call %struct.Memory* @breakpoint_40050b(%struct.Memory* %79)
call void @__mcsema_pc_tracer(i64 4195595)
store i64 add (i64 ptrtoint (i32 (i32, i8**, i8**)* @main to i64), i64 27), i64* @RIP_2472_2ba84c8, align 8
%81 = load i64, i64* @RDX_2264_2ba84c8, align 8
%82 = load i64, i64* @RAX_2216_2ba84c8, align 8
%83 = ashr i64 %81, 63
%84 = ashr i64 %82, 63
%L.sroa.2.0.insert.ext.i.i49 = zext i64 %83 to i128
%L.sroa.2.0.insert.shift.i.i50 = shl nuw i128 %L.sroa.2.0.insert.ext.i.i49, 64
%L.sroa.0.0.insert.ext.i.i51 = zext i64 %81 to i128
%L.sroa.0.0.insert.insert.i.i52 = or i128 %L.sroa.2.0.insert.shift.i.i50, %L.sroa.0.0.insert.ext.i.i51
%R.sroa.2.0.insert.ext.i.i53 = zext i64 %84 to i128
%R.sroa.2.0.insert.shift.i.i54 = shl nuw i128 %R.sroa.2.0.insert.ext.i.i53, 64
%R.sroa.0.0.insert.ext.i.i55 = zext i64 %82 to i128
%R.sroa.0.0.insert.insert.i.i56 = or i128 %R.sroa.2.0.insert.shift.i.i54, %R.sroa.0.0.insert.ext.i.i55
%mul.i.i57 = mul nsw i128 %R.sroa.0.0.insert.insert.i.i56, %L.sroa.0.0.insert.insert.i.i52
%retval.sroa.0.0.extract.trunc.i.i58 = trunc i128 %mul.i.i57 to i64
store i64 %retval.sroa.0.0.extract.trunc.i.i58, i64* @RDX_2264_2ba84c8, align 8, !tbaa !1219
%conv4.i.i.i59 = sext i64 %retval.sroa.0.0.extract.trunc.i.i58 to i128
%cmp.i.i.i60 = icmp ne i128 %mul.i.i57, %conv4.i.i.i59
%frombool.i.i61 = zext i1 %cmp.i.i.i60 to i8
store i8 %frombool.i.i61, i8* @CF_2065_2ba8480, align 1, !tbaa !1221
%x.sroa.0.0.insert.ext.i.i.i63 = trunc i128 %mul.i.i57 to i32
%conv.i.i.i.i64 = and i32 %x.sroa.0.0.insert.ext.i.i.i63, 255
%85 = call i32 @llvm.ctpop.i32(i32 %conv.i.i.i.i64) #16, !range !1235
%86 = trunc i32 %85 to i8
%87 = and i8 %86, 1
%88 = xor i8 %87, 1
store i8 %88, i8* @PF_2067_2ba8480, align 1, !tbaa !1236
store i8 0, i8* @AF_2069_2ba8480, align 1, !tbaa !1237
store i8 0, i8* @ZF_2071_2ba8480, align 1, !tbaa !1238
%res_trunc.lobit.i.i69 = lshr i64 %retval.sroa.0.0.extract.trunc.i.i58, 63
%89 = trunc i64 %res_trunc.lobit.i.i69 to i8
store i8 %89, i8* @SF_2073_2ba8480, align 1, !tbaa !1239
store i8 %frombool.i.i61, i8* @OF_2077_2ba8480, align 1, !tbaa !1240
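
Read as plain arithmetic, the IR above sign-extends both operands, multiplies them at 128 bits, truncates to 64 bits, and then derives the flags. The following is a minimal Python sketch of that reading, not Remill's code: the function and helper names are hypothetical, and the flag values mirror only what the stores in the listing show (AF and ZF forced to 0, CF and OF sharing the overflow bit).

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret the low 64 bits of value as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def imul64_flags_as_lifted(rdx, rax):
    """Model the lifted IR for `imul %rax, %rdx` (hypothetical helper)."""
    # Python integers are unbounded, so this matches the i128 multiply exactly.
    full = to_signed64(rdx) * to_signed64(rax)
    result = full & MASK64
    # CF and OF: set when the truncated result does not sign-extend back
    # to the full product.
    overflow = int(to_signed64(result) != full)
    # PF: 1 when the low byte of the result has an even number of set bits.
    parity = (bin(result & 0xFF).count("1") & 1) ^ 1
    flags = {
        "CF": overflow,
        "PF": parity,
        "AF": 0,  # stored as constant 0 in the IR
        "ZF": 0,  # stored as constant 0 in the IR, even for a zero result
        "SF": result >> 63,
        "OF": overflow,
    }
    return result, flags
```

For example, `imul64_flags_as_lifted(0, 5)` yields a zero result with ZF still 0, which is the kind of difference from hardware reported here.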
pgoodman commented 3 months ago

This is probably the cause: https://github.com/lifting-bits/remill/blob/269e61a601a399229d8d8deb8fc00cb4def69038/lib/Arch/X86/Semantics/BINARY.cpp#L246-L256

artemdinaburg commented 3 months ago

Adding some context to Peter's comments: According to the Intel Processor Manual (https://cdrdv2.intel.com/v1/dl/getContent/671110), for IMUL: The SF, ZF, AF, and PF flags are undefined. (Page 3-503)

Where this becomes confusing is that on real, physical CPUs the behavior of undefined flags can feel very much defined in practice. The problem is that, because these flags are officially documented as undefined, the observed behavior is inconsistent across CPU generations and across CPUs from different manufacturers (e.g., AMD).
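
One practical consequence for anyone comparing lifted code against hardware is to mask out the undefined flags before comparing. The following is a rough Python sketch of that idea, assuming both flag states are available as EFLAGS-style integers. The table and function names are hypothetical, and the only entry is the IMUL case quoted above.

```python
# Bit positions of the status flags in the x86 EFLAGS register.
CF, PF, AF, ZF, SF, OF = 1 << 0, 1 << 2, 1 << 4, 1 << 6, 1 << 7, 1 << 11
STATUS_FLAGS = CF | PF | AF | ZF | SF | OF

# Flags documented as undefined after each instruction (hypothetical table;
# only IMUL is filled in, using the manual text quoted above).
UNDEFINED_AFTER = {
    "imul": SF | ZF | AF | PF,
}


def defined_flag_mismatch(mnemonic, lifted_flags, native_flags):
    """Return the bits that differ among flags the manual actually defines."""
    compare_mask = STATUS_FLAGS & ~UNDEFINED_AFTER.get(mnemonic, 0)
    return (lifted_flags ^ native_flags) & compare_mask
```

With this mask, an IMUL whose lifted and native results differ only in SF, ZF, AF, or PF compares as equal, while a CF or OF difference is still reported. The shift case in the report would need the same check against the manual's entry for those instructions.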

pgoodman commented 3 months ago

We should really have a one-argument form of __remill_undefined_8 that takes in a concrete value.

pgoodman commented 3 months ago

It looks like with the P4 core, IMUL started preserving some of the flags: https://www.sandpile.org/x86/flags.htm
