junxzm1990 / x86-sok


Wrong instruction boundary labels in the dataset #28

Open 5c4lar opened 6 months ago

5c4lar commented 6 months ago

I hope this message finds you well. I am currently processing the dataset for a downstream task and have found what appear to be incorrect instruction boundary labels in several binaries.

Given that the correctness of these labels is one of the main contributions of your work, could you kindly allocate some time to double-check them? The binaries in which I found suspicious labels are:

../data/x86/x86_dataset/linux/servers/gcc_Os/nginx
../data/x86/x86_dataset/linux/servers/gcc_Of/nginx
../data/x86/x86_dataset/linux/libs/gcc_O3/libc-2.27.so
../data/x86/x86_dataset/linux/libs/clang_O3/libv8.so
../data/x86/x86_dataset/linux/libs/gcc_O2/libc-2.27.so
../data/x86/x86_dataset/linux/libs/clang_m32_Os/libxml2.so
../data/x86/x86_dataset/linux/libs/gcc_O1/libc-2.27.so
../data/x86/x86_dataset/linux/libs/clang_Of/libv8.so
../data/x86/x86_dataset/linux/libs/clang_m32_Of/libtiff.so.5
../data/x86/x86_dataset/linux/libs/gcc_Os/libc-2.27.so
../data/x86/x86_dataset/linux/clients/gcc_O2/openssl
../data/x86/x86_dataset/linux/clients/gcc_O0/openssl
../data/x86/x86_dataset/linux/clients/gcc_O1/openssl
../data/x86/x86_dataset/linux/clients/gcc_Os/openssl
../data/x86/x86_dataset/linux/clients/gcc_Of/openssl

5c4lar commented 6 months ago

Some of these cases seem to be the corner case mentioned in the paper (for example, those from openssl), but the labels point to the middle of an instruction instead of its start.

For example, in x86_dataset/linux/clients/gcc_Os/openssl, address 0x54521a is labeled as an instruction start, but it is not.
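Labels like this can be flagged mechanically by comparing them against an independent disassembly of the same bytes: any label that falls strictly inside a reference instruction is suspect. The following is a minimal Python sketch, assuming the reference disassembly is available as sorted (start_address, size) pairs; the function name `labels_inside_instructions` and that input format are hypothetical, not part of the dataset's tooling.

```python
from bisect import bisect_right
from typing import Iterable, List, Tuple


def labels_inside_instructions(
    labels: Iterable[int],
    reference: List[Tuple[int, int]],
) -> List[Tuple[int, int]]:
    """Return (label, containing_instruction_start) for every label that
    lands strictly inside an instruction of the reference disassembly.

    `reference` is a hypothetical input: (start_address, size) pairs sorted
    by address, produced by some independent disassembler.
    """
    starts = [start for start, _ in reference]
    suspicious = []
    for label in sorted(labels):
        # Index of the last reference instruction starting at or before `label`.
        i = bisect_right(starts, label) - 1
        if i < 0:
            continue  # label precedes the disassembled range
        start, size = reference[i]
        # A correct label equals `start`; anything between start and end is mid-instruction.
        if start < label < start + size:
            suspicious.append((label, start))
    return suspicious
```

With made-up addresses, a label at 0x1002 inside a 5-byte instruction starting at 0x1001 is reported as `(0x1002, 0x1001)`, while labels that coincide with instruction starts are not.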


bin2415 commented 6 months ago

Our tool identifies basic block boundaries at the compiler level and uses Capstone to disassemble the instructions within each basic block. I reproduced the case and confirmed that our tool correctly identifies the boundaries of this basic block. Here is the log:

 BBL#60010 (256B) @0x00545200 - 0x00545300, BaseOff: 0x145200, SecOff:0x141200, Fixups: 0 , Type: BBL, Padding: 0x4, Fallthrough: N
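The approach described above (block boundaries taken from the compiler, instructions recovered by disassembling inside each block) amounts to a linear sweep over each block. Below is a minimal Python sketch of that idea, not the project's implementation: `sweep_block`, the pluggable `decode` callback, and the skip-one-byte recovery after a failed decode are assumptions for illustration. `capstone_decoder` shows how Capstone's Python binding (`Cs.disasm` with `count=1`) could serve as the decoder; Capstone is imported only when that helper is called.

```python
from typing import Callable, List, Optional, Tuple

# A decoder takes (code, offset, address) and returns the size of the
# instruction at `offset`, or None if it cannot decode those bytes.
Decoder = Callable[[bytes, int, int], Optional[int]]


def sweep_block(code: bytes, base: int, decode: Decoder) -> Tuple[List[int], List[int]]:
    """Linearly disassemble one basic block whose bounds are already known.

    Returns (instruction_starts, failed_addresses). On a decode failure this
    sketch records the address and advances one byte (an assumed policy),
    which illustrates how a single unsupported instruction can yield labels
    that point into the middle of real instructions.
    """
    starts: List[int] = []
    failures: List[int] = []
    offset = 0
    while offset < len(code):
        addr = base + offset
        size = decode(code, offset, addr)
        if size is None or size <= 0:
            failures.append(addr)
            offset += 1  # assumed recovery policy: skip one byte and retry
            continue
        starts.append(addr)
        offset += size
    return starts, failures


def capstone_decoder(mode_64: bool = True) -> Decoder:
    """Build a Decoder backed by Capstone (requires the `capstone` package)."""
    from capstone import Cs, CS_ARCH_X86, CS_MODE_32, CS_MODE_64

    md = Cs(CS_ARCH_X86, CS_MODE_64 if mode_64 else CS_MODE_32)

    def decode(code: bytes, offset: int, addr: int) -> Optional[int]:
        # Decode at most one instruction starting at `offset`.
        insn = next(md.disasm(code[offset:], addr, count=1), None)
        return insn.size if insn is not None else None

    return decode
```

Usage: `starts, failed = sweep_block(block_bytes, 0x545200, capstone_decoder())`, where `block_bytes` is a placeholder for the 256 bytes of the block. Keeping the decoder pluggable lets the same sweep be rerun with a different Capstone version or another disassembler.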

However, Capstone fails to correctly disassemble the instruction at address 0x545219, which causes the subsequent instructions in the block to be misinterpreted. Here is the error log:

ERROR:Instructions that capstone can't handled. 0x545219
ERROR:Instructions that capstone can't handled. 0x54521a
ERROR:Instructions that capstone can't handled. 0x545222
ERROR:Instructions that capstone can't handled. 0x545228
ERROR:Instructions that capstone can't handled. 0x54522d
ERROR:Instructions that capstone can't handled. 0x545247
ERROR:Instructions that capstone can't handled. 0x54524e
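To see which blocks' labels are affected by such failures, the two log formats quoted in this thread can be joined: each failing address is assigned to the BBL whose range contains it. Below is a minimal Python sketch; the regular expressions are written only against the sample lines shown here, and the block range is assumed to be half-open (256 bytes starting at 0x545200 end just before 0x545300).

```python
import re
from typing import Dict, List, Tuple

# Patterns matching the sample log lines in this thread.
BBL_RE = re.compile(r"BBL#(\d+) \(\d+B\) @(0x[0-9a-fA-F]+) - (0x[0-9a-fA-F]+)")
ERR_RE = re.compile(r"ERROR:Instructions that capstone can't handled\. (0x[0-9a-fA-F]+)")


def failures_by_block(log: str) -> Dict[int, List[int]]:
    """Map each BBL id to the failed decode addresses inside [start, end)."""
    blocks: List[Tuple[int, int, int]] = [
        (int(bid), int(start, 16), int(end, 16))
        for bid, start, end in BBL_RE.findall(log)
    ]
    result: Dict[int, List[int]] = {}
    for addr_text in ERR_RE.findall(log):
        addr = int(addr_text, 16)
        for bid, start, end in blocks:
            if start <= addr < end:
                result.setdefault(bid, []).append(addr)
                break  # blocks do not overlap, so stop at the first match
    return result
```

Failures that fall outside every logged block are simply dropped in this sketch.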

In summary, this is a bug in Capstone. We will check whether the latest version of Capstone correctly disassembles the vbroadcasti128 instruction.
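If the labels are regenerated with a newer Capstone, comparing the old and new instruction starts for the same binary would confirm the fix and list every address that changed. Below is a minimal sketch; `diff_boundaries` is a hypothetical helper and assumes each label set is a plain collection of integer addresses.

```python
from typing import Iterable, List, Optional, Tuple


def diff_boundaries(
    old: Iterable[int], new: Iterable[int]
) -> Tuple[List[int], List[int], Optional[int]]:
    """Compare two sets of instruction-start labels for the same bytes.

    Returns (only_in_old, only_in_new, first_divergence), where
    first_divergence is the lowest address present in exactly one set,
    or None if the sets agree.
    """
    old_set, new_set = set(old), set(new)
    only_old = sorted(old_set - new_set)
    only_new = sorted(new_set - old_set)
    # The smallest differing address is the first element of either sorted list.
    candidates = only_old[:1] + only_new[:1]
    first = min(candidates) if candidates else None
    return only_old, only_new, first
```

The first divergence is where the old sweep lost synchronization, so it should coincide with an address reported in the error log if the new version fixes that instruction.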