intelxed / xed

The X86 Encoder Decoder (XED) is a software library for encoding and decoding X86 (IA32 and Intel64) instructions
https://intelxed.github.io/
Apache License 2.0

XED to assist JIT analysis tools #68

Open lukego opened 7 years ago

lukego commented 7 years ago

Howdy!

I am looking for the best way to hook XED into the new Studio IDE for analyzing machine code generated by RaptorJIT. I am writing to ask for help finding the right initial approach. I hope you will indulge a little thinking out loud...

The problem I want to solve is to convert a binary blob of machine code into a higher-level representation. I will use the high-level representation both for presenting to the user, either as textual disassembly or visual dependency graphs (etc), and also for automatic cross-referencing, e.g. with JIT IR code and PEBS data and so on.

I would like to do this machine-code-to-abstract-structure conversion using a command-line tool similar to xed. This would take binary object code as input and generate all the information that I need in an easy-to-parse file format (e.g. XML, JSON, CSV, or msgpack). I will be operating on small amounts of code at any one time: about one thousand instructions or fewer.

The information that I would like to get for each instruction is:

From poking around, it seems like I may be able to get all of this from the xed command, using three invocations: xed -i x to get a one-liner disassembly of each instruction; xed -xml -i x to get a more structured view of the same thing; and xed -dot -i x to get the dependency information (if that dependency info is complete enough?)

I would probably want to post-process this to put all of the information in one place, e.g. have one XML representation that also includes the textual disassembly and dependency information. This could get messy (e.g. parsing dot files) and so it could make more sense to modify xed to emit the format that I want, or write a new decoder in C, or write a new decoder in some higher-level language like Python that had a xed binding.
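The post-processing idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the dependency graph is assumed to arrive as plain Graphviz DOT edge lines, the node names (`i0`, `i1`, ...) and the record fields (`index`, `text`, `depends_on`) are hypothetical, and none of this reflects the actual layout of xed's output.

```python
import json
import re

# Matches simple DOT edges such as:  i0 -> i2;   or   "i0" -> "i2" [label="rax"];
EDGE_RE = re.compile(r'"?(\w+)"?\s*->\s*"?(\w+)"?')


def parse_dot_edges(dot_text):
    """Return (source, target) node-name pairs for every edge in dot_text."""
    return EDGE_RE.findall(dot_text)


def combine(disassembly, dot_text):
    """Merge one-line disassembly with dependency edges into one record list.

    Assumption (hypothetical naming): node "iN" refers to the N-th instruction.
    """
    records = [
        {"index": i, "text": text, "depends_on": []}
        for i, text in enumerate(disassembly)
    ]
    for src, dst in parse_dot_edges(dot_text):
        producer, consumer = int(src[1:]), int(dst[1:])
        records[consumer]["depends_on"].append(producer)
    return records


if __name__ == "__main__":
    disassembly = ["mov rax, rbx", "add rcx, 1", "add rax, rcx"]
    dot_text = "digraph deps {\n  i0 -> i2;\n  i1 -> i2;\n}"
    # Emit a single JSON document holding text and dependencies together.
    print(json.dumps(combine(disassembly, dot_text), indent=2))
```

A regex is enough only for very simple edge lines; a real DOT file with subgraphs, ports, or multi-line statements would need a proper parser, which is part of why emitting the desired format directly may be less messy.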

So! I'd really appreciate some tips. Are there any off-the-shelf programs that can give me what I want already? Or are there suitable xed bindings to high level languages that could be recommended to write this quickly? Or does it make sense to parse and combine the xed output? Or should I extend xed or write a new decoder in C?

(Thanks for reading!)

lukego commented 7 years ago

... I would also like to have a latency estimate for each instruction. I suppose the way to do that will be for XED to decode the instruction to identify its operand types and then to look this up in Agner's instruction tables.
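As a rough sketch of that lookup step, assuming the decoder has already reduced each instruction to a mnemonic plus a list of operand kinds: the table layout, the `estimate_latency` helper, and the operand-kind labels are all hypothetical, and the two entries are illustrative placeholders rather than values extracted from any particular table. Real entries would have to be loaded per microarchitecture.

```python
# Hypothetical latency table keyed by (mnemonic, operand kinds).
# The entries are illustrative placeholders; real values depend on the
# target microarchitecture and should come from a published table.
LATENCY_CYCLES = {
    ("ADD", ("reg", "reg")): 1,
    ("IMUL", ("reg", "reg")): 3,
}


def estimate_latency(mnemonic, operand_kinds):
    """Return the latency in cycles, or None when the table has no entry."""
    key = (mnemonic.upper(), tuple(operand_kinds))
    return LATENCY_CYCLES.get(key)


if __name__ == "__main__":
    print(estimate_latency("add", ["reg", "reg"]))
    print(estimate_latency("div", ["reg"]))  # not in the table
```

Returning `None` for unknown instructions keeps a missing table entry visible instead of silently reporting a guessed number.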

markcharney commented 7 years ago

well, intel posts a doc (or at least i saw one posted) with latencies for each xed iform. i can see if i can find a link next week. will try to respond to your larger question tomorrow

markcharney commented 7 years ago

Sounds to me like you need to do some programming. The code in the examples is there as examples. Most of the pieces you seem to want are there already, except the latencies. I would suggest you write your own libxed-based tool that emits the information you require.

The xed-iform-based latency/throughput stuff is in the following doc. I don't want to integrate this information into XED as it changes for every design. (Obviously, it would be nice if the author also emitted a few arrays people could import! I might poke around & see if I can figure out who made the doc).

https://software.intel.com/sites/default/files/managed/ad/dc/Intel-Xeon-Scalable-Processor-throughput-latency.pdf

hlide commented 7 years ago

You can also consider this page, especially the 4th section, "Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs". There is a .pdf file and a .ods file (OpenOffice spreadsheet), and they seem to be updated quite often. I wonder whether it would be possible to extract this information automatically from the .ods file.
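Automatic extraction looks feasible in principle, because an .ods file is a ZIP archive whose sheet data lives in `content.xml`. The sketch below reads raw cell text using only the Python standard library. The `read_ods_rows` helper is hypothetical, it ignores repeated rows and merged cells, and it makes no assumption about how the instruction tables themselves are laid out, so mapping the rows to instructions would still be manual work.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument XML namespaces used in content.xml.
NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
TABLE = "{%s}" % NS["table"]


def read_ods_rows(path_or_file):
    """Yield (sheet_name, [cell_text, ...]) for every row in an .ods file."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    for sheet in root.iter(TABLE + "table"):
        for row in sheet.iter(TABLE + "table-row"):
            cells = []
            for cell in row.findall("table:table-cell", NS):
                # A cell may hold several paragraphs; join them with newlines.
                text = "\n".join(
                    "".join(p.itertext()) for p in cell.findall("text:p", NS)
                )
                # Identical adjacent cells are stored once with a repeat count.
                repeat = int(cell.get(TABLE + "number-columns-repeated", "1"))
                cells.extend([text] * repeat)
            # Drop the run of empty padding cells at the end of the row.
            while cells and cells[-1] == "":
                cells.pop()
            yield sheet.get(TABLE + "name"), cells


if __name__ == "__main__":
    import sys

    for sheet_name, cells in read_ods_rows(sys.argv[1]):
        if cells:
            print(sheet_name, cells)
```

Run it with the path to a downloaded .ods file as the only argument; each non-empty row is printed with the name of the sheet it came from.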