intelxed / xed

The X86 Encoder Decoder (XED), is a software library for encoding and decoding X86 (IA32 and Intel64) instructions
https://intelxed.github.io/
Apache License 2.0
1.38k stars 146 forks source link

xed-ild reports incorrect length for vpsubq, vmovhpd instructions #298

Closed mwkrentel closed 1 year ago

mwkrentel commented 1 year ago

I'm seeing a problem with the xed-ild library for decoding an instruction's length. Specifically, for some instructions (vpsubq, vmovhpd, etc), xed-ild returns a length that is too short.

This is with xed at main as of Mar 19, 2023, same as 2022.10.11 release. Built on Linux RH 8.x, x86_64, with gcc 8.5.0 and python 3.10.8.

commit 9fc12ab6c0ba7a9eaadb20135369b4b4107fa670
(HEAD -> main, tag: v2022.10.11, origin/main, origin/HEAD)
Author: sdeadmin <sde_admin@intel.com>
Date:   Wed Oct 12 13:36:47 2022 +0300

    External Release v2022.10.11

I tried the xed-ex-ild.c example where I changed itext to a vpsubq instruction.

  unsigned char itext[15] = { 
      0x62, 0xf1, 0xfd, 0x48, 0xfb, 0x4c, 0x24, 0x7a, 0x00,
  };

Objdump reports this a 8-byte instruction.

31fd0d4:       62 f1 fd 48 fb 4c 24    vpsubq 0x1e80(%rsp),%zmm0,%zmm1
31fd0db:       7a

But xed-ex-ild reports it as 2 bytes.

$ ./xed-ex-ild 
length = 2

Similarly for vmovhpd, objdump reports a7-byte instruction.

30c2f0d:       62 61 cd 00 16 04 fb    vmovhpd (%rbx,%rdi,8),%xmm22,%xmm24

And xed-ex-ild reports it as 3 bytes.

$ ./xed-ex-ild 
length = 3

Who is right, objdump or xed-ex-ild ?

Is the code in xed-ex-ild.c the right way to ask for the length of the next instruction in some buffer?

    xed_decoded_inst_t xedd;
    xed_state_t dstate;
    unsigned char itext[15] = { ... };

    xed_tables_init(); // one time per process

    dstate.mmode=XED_MACHINE_MODE_LONG_64;

    xed_decoded_inst_zero_set_mode(&xedd, &dstate);
    xed_ild_decode(&xedd, itext, XED_MAX_INSTRUCTION_BYTES);
    printf("length = %u\n",xed_decoded_inst_get_length(&xedd));
marjevan commented 1 year ago

Hi, I can't reproduce the issues above using the xed-ex-ild.c example code (built with several compilers, including GCC 8.3.0). After adding a simple debug print I get the expected results:

$ obj/wkit/bin/xed-ex-ild
ILD decode input: 62f1fd48fb4c247a00000000000000
length = 8
$ obj/wkit/bin/xed-ex-ild
ILD decode input: 6261cd001604fb0000000000000000
length = 7

Several suggestions that might resolve your issue:

mwkrentel commented 1 year ago

@marjevan Are you trying your tests on Linux or Windows?

marjevan commented 1 year ago

Both. In addition to the above, you can check if any error is returned.

A complete code suggestion:

    xed_decoded_inst_t xedd;
    xed_state_t dstate;
    xed_uint_t i;
    xed_error_enum_t xed_error;
    xed_uint8_t itext[XED_MAX_INSTRUCTION_BYTES] = { 
        0x62, 0xf1, 0xfd, 0x48, 0xfb, 0x4c, 0x24, 0x7a, 0x00,
    };

    printf("ILD decode input: ");
    for(i=0;i<XED_MAX_INSTRUCTION_BYTES;i++) {
        printf("%02x", itext[i]);
    }
    printf("\n");

    xed_tables_init(); // one time per process

    xed_state_zero(&dstate);

    dstate.mmode=XED_MACHINE_MODE_LONG_64;

    xed_decoded_inst_zero_set_mode(&xedd, &dstate);
    xed_error = xed_ild_decode(&xedd, itext, XED_MAX_INSTRUCTION_BYTES);

    switch(xed_error) {
        case XED_ERROR_NONE:
            break;
        case XED_ERROR_BUFFER_TOO_SHORT:
            printf("Not enough bytes provided\n");
            return 1;
        case XED_ERROR_GENERAL_ERROR:
            printf("Could not decode given input.\n");
            return 1;
        default:
            printf("Unhandled error code %s\n",xed_error_enum_t2str(xed_error));
            return 1;
    }

    printf("length = %u\n",xed_decoded_inst_get_length(&xedd));

Can you try this?


Alternatively, you can use XED decoder to check and collect more information (including a check for illegal instruction). The APIs are the same (example in xed-ex-ild2.c), Just replace the xed_ild_decode() function with xed_decode() (same arguments)

mwkrentel commented 1 year ago

@marjevan I tried your full program (with a few trivial changes). ILD still reports length 2 (with no errors). In my experience, ILD always reports success, even if you give it jibberish. The answer may be wrong, but it will always report an answer.

Changing xed_ild_decode to xed_decode reports an error:

$ ./length
ILD decode input:  62 f1 fd 48 fb 4c 24 7a 00 00 00 00 00 00 00
xed_error = 2
Could not decode given input.

Are there any debug flags or environ variables that would get a more verbose reason why it hit an error?

So, we still don't have reproducible results. The only other thing I can think of is: exactly what version are you running?

I'm running tag v2022.10.11 which is the head of main on the publically available github repo (9fc12ab6c0ba7a9). Is it possible you're running something later? Maybe something internal to Intel?

marjevan commented 1 year ago

I'm using the latest XED external release. What about the mbuild repository? Could you let me know if you are using the latest commit?

mwkrentel commented 1 year ago

Normally I build XED with spack. Rashawn can tell you more.

https://spack.readthedocs.io/en/latest/

I checked the recipe for intel-xed (which I wrote). It's a couple revs behind for mbuild, but I don't think they're significant.

Anyway, to make sure, I rebuilt outside of spack with the latest xed and latest mbuild.

Xed-ild is still getting the same wrong answer. Again from the program you sent me.

$ ./intel 
ILD decode input:  62 f1 fd 48 fb 4c 24 7a 00 00 00 00 00 00 00
xed_error = 2
Could not decode given input.

But running xed looks like it worked.

$ ./xed -v 99 -64 -d 62 f1 fd 48 fb 4c 24 7a
Initializing XED tables...
Done initialing XED tables.
#XED version: [v2022.10.11]
Converting 6 & 2
Converting f & 1
Converting f & d
Converting 4 & 8
Converting f & b
Converting 4 & c
Converting 2 & 4
Converting 7 & a
62F1FD48FB4C247A
Decode time = 245504
VPSUBQ VPSUBQ_ZMMu64_MASKmskw_ZMMu64_MEMu64_AVX512 DISP_WIDTH:8, EASZ:3, ELEMENT_SIZE:64, EOSZ:3, HAS_MODRM:1, HAS_SIB, LLRC:2, LZCNT, MAP:1, MAX_BYTES:8, MEM0:ptr [RSP+0x1e80], MOD:1, MODE:2, MODRM_BYTE:76, MUST_USE_EVEX, NEED_MEMDISP:8, NELEM:8, NOMINAL_OPCODE:251, OUTREG:ZMM0, P4, POS_DISP:7, POS_MODRM:5, POS_NOMINAL_OPCODE:4, POS_SIB:6, REG:1, REG0:ZMM1, REG1:K0, REG2:ZMM0, REXW, RM:4, SIBBASE:4, SIBINDEX:4, SMODE:2, TZCNT, UBIT, USING_DEFAULT_SEGMENT0, VEXDEST210:7, VEXDEST3, VEXVALID:2, VEX_PREFIX:1, VL:2
0       REG0/W/ZU64/EXPLICIT/NT_LOOKUP_FN/ZMM_R3
1       REG1/R/MSKW/EXPLICIT/NT_LOOKUP_FN/MASK1
2       REG2/R/ZU64/EXPLICIT/NT_LOOKUP_FN/ZMM_N3
3       MEM0/R/VV/EXPLICIT/IMM_CONST/1
YDIS: vpsubq zmm1, zmm0, zmmword ptr [rsp+0x1e80]
ICLASS:     VPSUBQ
CATEGORY:   AVX512
EXTENSION:  AVX512EVEX
IFORM:      VPSUBQ_ZMMu64_MASKmskw_ZMMu64_MEMu64_AVX512
ISA_SET:    AVX512F_512
ATTRIBUTES: BROADCAST_ENABLED DISP8_FULL MASKOP_EVEX MEMORY_FAULT_SUPPRESSION 
SHORT:      vpsubq zmm1, zmm0, zmmword ptr [rsp+0x1e80]

It's looking to me like the full xed is Ok, but either there's a problem with xed-ild or else in the way the examples are calling it.

mwkrentel commented 1 year ago

Is it possible that XED is looking at the machine it's running on and won't decode an instruction that the current machine doesn't support ??

marjevan commented 1 year ago

No, XED doesn't use the host machine.

OK, as XED's built-in example works fine, we can assume that there are no issues with XED decoder library and the way you built it outside of spack. Now, let's check the XED ILD library, again - outside of spack:

In case the built-in example tools are passed and report the expected results, my advice is to check your tool's environment and the way it is linked to the XED library. Unfortunately, I can't help with spack.

mwkrentel commented 1 year ago

Ok, I figured out what's going on, where the bug is and why we get different results. There's a bug somewhere in libxed-ild. Libxed works, but xed-ild has a bug somewhere.

Start with the new xed-ex-ild.c from the branch and remove the xed_error_enum_t2str() function. (xed-ild doesn't support this.)

--- a/examples/xed-ex-ild.c
+++ b/examples/xed-ex-ild.c
@@ -60,7 +60,7 @@
             printf("Could not decode given input.\n");
             return 1;
         default:
-            printf("Unhandled error code %s\n",xed_error_enum_t2str(xed_error));
+            printf("Unhandled error code\n");
             return 1;
     }

I renamed the program intel.c. Compile (-c) this to a .o file. Link this with libxed.so (I was using shared libs.) This version works.

$ ldd intel
    linux-vdso.so.1 (0x00007ffcb75eb000)
    libxed.so => /home/krentel/xed/xed/obj/wkit/lib/libxed.so (0x00007ff34a600000)
    libc.so.6 => /lib64/libc.so.6 (0x00007ff34a200000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff34aced000)

$ ./intel 
ILD decode input: 62 f1 fd 48 fb 4c 24 7a 00 00 00 00 00 00 00 
length = 8

But link the same .o file with libxed-ild.so and now it produces the wrong answer.

$ ldd intel2
    linux-vdso.so.1 (0x00007ffff2da4000)
    libxed-ild.so => /home/krentel/xed/xed/obj/wkit/lib/libxed-ild.so (0x00007f3efb629000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f3efb400000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f3efb640000)

$ ./intel2
ILD decode input: 62 f1 fd 48 fb 4c 24 7a 00 00 00 00 00 00 00 
length = 2

So, clearly there's a bug somewhere in libxed-ild.so compared with libxed.

This also explains why we kept getting different results. I was linking with libxed-ild, but the mfile.py machinery links with libxed. I can be pretty sure of this because libxed-ild doesn't support xed_error_enum_t2str().

marjevan commented 1 year ago

Using libxed-ild.so, I was able to reproduce the issues. Thanks.

I found that the issues were caused by a missing chip-model-info table initialization. This bug causes wrong ILD length detection for all AVX512/EVEX instructions when using the xed-ild library. Please note that the full libxed.so library has no issues, and it works fine (as you mentioned above).

The fix will be available on the next external release. Do you have any urgent dependencies specifically for the ILD library? Can you use the full libxed.so library meantime?

mwkrentel commented 1 year ago

Yes, I can use the full libxed for now. Do you have a patch?

marjevan commented 1 year ago

Great.

Please let me know if this patch works for you. 0001-ILD-standalone-library-fix.patch

mwkrentel commented 1 year ago

Yes, using the patch and libxed-ild alone, I'm able to reproduce the same results that objdump reports.

marjevan commented 1 year ago

Great. As I mentioned, the fix will be officially available with the next external release.

Thanks for the report.