Closed mwkrentel closed 1 year ago
Hi, I can't reproduce the issues above using the xed-ex-ild.c example code (built with several compilers, including GCC 8.3.0). After adding a simple debug print I get the expected results:
$ obj/wkit/bin/xed-ex-ild
ILD decode input: 62f1fd48fb4c247a00000000000000
length = 8
$ obj/wkit/bin/xed-ex-ild
ILD decode input: 6261cd001604fb0000000000000000
length = 7
Several suggestions that might resolve your issue:
dstate
before mmode
init using: xed_state_zero(&dstate);
itext
to xed_uint8_t
@marjevan Are you trying your tests on Linux or Windows?
Both. In addition to the above, you can check if any error is returned.
A complete code suggestion:
xed_decoded_inst_t xedd;
xed_state_t dstate;
xed_uint_t i;
xed_error_enum_t xed_error;
xed_uint8_t itext[XED_MAX_INSTRUCTION_BYTES] = {
0x62, 0xf1, 0xfd, 0x48, 0xfb, 0x4c, 0x24, 0x7a, 0x00,
};
printf("ILD decode input: ");
for(i=0;i<XED_MAX_INSTRUCTION_BYTES;i++) {
printf("%02x", itext[i]);
}
printf("\n");
xed_tables_init(); // one time per process
xed_state_zero(&dstate);
dstate.mmode=XED_MACHINE_MODE_LONG_64;
xed_decoded_inst_zero_set_mode(&xedd, &dstate);
xed_error = xed_ild_decode(&xedd, itext, XED_MAX_INSTRUCTION_BYTES);
switch(xed_error) {
case XED_ERROR_NONE:
break;
case XED_ERROR_BUFFER_TOO_SHORT:
printf("Not enough bytes provided\n");
return 1;
case XED_ERROR_GENERAL_ERROR:
printf("Could not decode given input.\n");
return 1;
default:
printf("Unhandled error code %s\n",xed_error_enum_t2str(xed_error));
return 1;
}
printf("length = %u\n",xed_decoded_inst_get_length(&xedd));
Can you try this?
Alternatively, you can use XED decoder to check and collect more information (including a check for illegal instruction).
The APIs are the same (example in xed-ex-ild2.c
), Just replace the xed_ild_decode()
function with xed_decode()
(same arguments)
@marjevan I tried your full program (with a few trivial changes). ILD still reports length 2 (with no errors). In my experience, ILD always reports success, even if you give it jibberish. The answer may be wrong, but it will always report an answer.
Changing xed_ild_decode
to xed_decode
reports an error:
$ ./length
ILD decode input: 62 f1 fd 48 fb 4c 24 7a 00 00 00 00 00 00 00
xed_error = 2
Could not decode given input.
Are there any debug flags or environ variables that would get a more verbose reason why it hit an error?
So, we still don't have reproducible results. The only other thing I can think of is: exactly what version are you running?
I'm running tag v2022.10.11 which is the head of main on the publically available github repo (9fc12ab6c0ba7a9). Is it possible you're running something later? Maybe something internal to Intel?
I'm using the latest XED external release. What about the mbuild repository? Could you let me know if you are using the latest commit?
python3 ./mfile.py clean test --asserts
./obj/wkit/bin/xed -v 99 -64 -d 62 f1 fd 48 fb 4c 24 7a
Normally I build XED with spack. Rashawn can tell you more.
https://spack.readthedocs.io/en/latest/
I checked the recipe for intel-xed (which I wrote). It's a couple revs behind for mbuild, but I don't think they're significant.
Anyway, to make sure, I rebuilt outside of spack with the latest xed and latest mbuild.
Xed-ild is still getting the same wrong answer. Again from the program you sent me.
$ ./intel
ILD decode input: 62 f1 fd 48 fb 4c 24 7a 00 00 00 00 00 00 00
xed_error = 2
Could not decode given input.
But running xed
looks like it worked.
$ ./xed -v 99 -64 -d 62 f1 fd 48 fb 4c 24 7a
Initializing XED tables...
Done initialing XED tables.
#XED version: [v2022.10.11]
Converting 6 & 2
Converting f & 1
Converting f & d
Converting 4 & 8
Converting f & b
Converting 4 & c
Converting 2 & 4
Converting 7 & a
62F1FD48FB4C247A
Decode time = 245504
VPSUBQ VPSUBQ_ZMMu64_MASKmskw_ZMMu64_MEMu64_AVX512 DISP_WIDTH:8, EASZ:3, ELEMENT_SIZE:64, EOSZ:3, HAS_MODRM:1, HAS_SIB, LLRC:2, LZCNT, MAP:1, MAX_BYTES:8, MEM0:ptr [RSP+0x1e80], MOD:1, MODE:2, MODRM_BYTE:76, MUST_USE_EVEX, NEED_MEMDISP:8, NELEM:8, NOMINAL_OPCODE:251, OUTREG:ZMM0, P4, POS_DISP:7, POS_MODRM:5, POS_NOMINAL_OPCODE:4, POS_SIB:6, REG:1, REG0:ZMM1, REG1:K0, REG2:ZMM0, REXW, RM:4, SIBBASE:4, SIBINDEX:4, SMODE:2, TZCNT, UBIT, USING_DEFAULT_SEGMENT0, VEXDEST210:7, VEXDEST3, VEXVALID:2, VEX_PREFIX:1, VL:2
0 REG0/W/ZU64/EXPLICIT/NT_LOOKUP_FN/ZMM_R3
1 REG1/R/MSKW/EXPLICIT/NT_LOOKUP_FN/MASK1
2 REG2/R/ZU64/EXPLICIT/NT_LOOKUP_FN/ZMM_N3
3 MEM0/R/VV/EXPLICIT/IMM_CONST/1
YDIS: vpsubq zmm1, zmm0, zmmword ptr [rsp+0x1e80]
ICLASS: VPSUBQ
CATEGORY: AVX512
EXTENSION: AVX512EVEX
IFORM: VPSUBQ_ZMMu64_MASKmskw_ZMMu64_MEMu64_AVX512
ISA_SET: AVX512F_512
ATTRIBUTES: BROADCAST_ENABLED DISP8_FULL MASKOP_EVEX MEMORY_FAULT_SUPPRESSION
SHORT: vpsubq zmm1, zmm0, zmmword ptr [rsp+0x1e80]
It's looking to me like the full xed is Ok, but either there's a problem with xed-ild or else in the way the examples are calling it.
Is it possible that XED is looking at the machine it's running on and won't decode an instruction that the current machine doesn't support ??
No, XED doesn't use the host machine.
OK, as XED's built-in example works fine, we can assume that there are no issues with XED decoder library and the way you built it outside of spack. Now, let's check the XED ILD library, again - outside of spack:
python3 ./mfile.py clean test --asserts
obj/wkit/bin/xed-ex-ild
.
This is actually the code I sent previously that executes XED ILD with one of the reported encoding issues.
The expected result is length = 8
.obj/wkit/bin/xed-ex-ild2 -64 62 f1 fd 48 fb 4c 24 7a
(You can try different encodings for a wider check).In case the built-in example tools are passed and report the expected results, my advice is to check your tool's environment and the way it is linked to the XED library. Unfortunately, I can't help with spack.
Ok, I figured out what's going on, where the bug is and why we get different results. There's a bug somewhere in libxed-ild. Libxed works, but xed-ild has a bug somewhere.
Start with the new xed-ex-ild.c
from the branch and remove the
xed_error_enum_t2str()
function. (xed-ild doesn't support this.)
--- a/examples/xed-ex-ild.c
+++ b/examples/xed-ex-ild.c
@@ -60,7 +60,7 @@
printf("Could not decode given input.\n");
return 1;
default:
- printf("Unhandled error code %s\n",xed_error_enum_t2str(xed_error));
+ printf("Unhandled error code\n");
return 1;
}
I renamed the program intel.c
. Compile (-c
) this to a .o
file.
Link this with libxed.so
(I was using shared libs.)
This version works.
$ ldd intel
linux-vdso.so.1 (0x00007ffcb75eb000)
libxed.so => /home/krentel/xed/xed/obj/wkit/lib/libxed.so (0x00007ff34a600000)
libc.so.6 => /lib64/libc.so.6 (0x00007ff34a200000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff34aced000)
$ ./intel
ILD decode input: 62 f1 fd 48 fb 4c 24 7a 00 00 00 00 00 00 00
length = 8
But link the same .o
file with libxed-ild.so
and now it produces
the wrong answer.
$ ldd intel2
linux-vdso.so.1 (0x00007ffff2da4000)
libxed-ild.so => /home/krentel/xed/xed/obj/wkit/lib/libxed-ild.so (0x00007f3efb629000)
libc.so.6 => /lib64/libc.so.6 (0x00007f3efb400000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3efb640000)
$ ./intel2
ILD decode input: 62 f1 fd 48 fb 4c 24 7a 00 00 00 00 00 00 00
length = 2
So, clearly there's a bug somewhere in libxed-ild.so compared with libxed.
This also explains why we kept getting different results. I was
linking with libxed-ild, but the mfile.py machinery links with libxed.
I can be pretty sure of this because libxed-ild doesn't support
xed_error_enum_t2str()
.
Using libxed-ild.so
, I was able to reproduce the issues.
Thanks.
I found that the issues were caused by a missing chip-model-info table initialization.
This bug causes wrong ILD length detection for all AVX512/EVEX instructions when using the xed-ild
library.
Please note that the full libxed.so
library has no issues, and it works fine (as you mentioned above).
The fix will be available on the next external release.
Do you have any urgent dependencies specifically for the ILD library? Can you use the full libxed.so
library meantime?
Yes, I can use the full libxed for now. Do you have a patch?
Great.
Please let me know if this patch works for you. 0001-ILD-standalone-library-fix.patch
Yes, using the patch and libxed-ild
alone, I'm able to reproduce
the same results that objdump reports.
Great. As I mentioned, the fix will be officially available with the next external release.
Thanks for the report.
I'm seeing a problem with the xed-ild library for decoding an instruction's length. Specifically, for some instructions (vpsubq, vmovhpd, etc), xed-ild returns a length that is too short.
This is with xed at main as of Mar 19, 2023, same as 2022.10.11 release. Built on Linux RH 8.x, x86_64, with gcc 8.5.0 and python 3.10.8.
I tried the
xed-ex-ild.c
example where I changeditext
to avpsubq
instruction.Objdump reports this a 8-byte instruction.
But xed-ex-ild reports it as 2 bytes.
Similarly for
vmovhpd
, objdump reports a7-byte instruction.And xed-ex-ild reports it as 3 bytes.
Who is right, objdump or xed-ex-ild ?
Is the code in
xed-ex-ild.c
the right way to ask for the length of the next instruction in some buffer?