noseglasses / elf_diff

A tool to compare ELF binaries
GNU General Public License v3.0
171 stars 21 forks source link

Run c++filt on the ASM output. #110

Open mithro opened 5 months ago

mithro commented 5 months ago

Is your feature request related to a problem? Please describe.

Currently when looking at the assembly output I get the following;

14  mov    %rax,(%rsp)
12  mov    $0xffffffff,%edx
13  xor    %r8d,%r8d
14  call   34 <_ZN3gpl3FFT5doFFTEv+0x34>   30: R_X86_64_PLT32   _ZN3gpl6ddct2dEiiiPPfS0_PiS0_-0x4
15  mov    0xb0(%rbx),%edi
16  test   %edi,%edi
17  jle    72 <_ZN3gpl3FFT5doFFTEv+0x72>

Describe the solution you'd like

If you run c++filt on the above you'll get the following;

14      mov    %rax,(%rsp)
12      mov    bashxffffffff,%edx
13      xor    %r8d,%r8d
14      call   34 <gpl::FFT::doFFT()+0x34>   30: R_X86_64_PLT32   gpl::ddct2d(int, int, int, float**, float*, int*, float*)-0x4
15      mov    0xb0(%rbx),%edi
16      test   %edi,%edi
17      jle    72 <gpl::FFT::doFFT()+0x72>

Which is much nicer to read.

Describe alternatives you've considered

I would have sent a pull request myself but I wasn't able to figure out where to run the c++filt. As a hack I tried adding it to the objdump command in instruction_collector.py:class InstructionCollector(object):gatherSymbolInstructions but it didn't seem to have any effect.

You mention c++filt in the README, so I was surprised that it wasn't already demangling the strings in the assembly output.

mithro commented 5 months ago

This is a hacky patch that I came up with;

diff --git a/src/elf_diff/instruction_collector.py b/src/elf_diff/instruction_collector.py
index cdeee4d..0e86125 100644
--- a/src/elf_diff/instruction_collector.py
+++ b/src/elf_diff/instruction_collector.py
@@ -27,6 +27,7 @@ from typing import Optional, Dict, List
 import re
 import progressbar  # type: ignore # Make mypy ignore this module
 import sys
+import subprocess

 SOURCE_CODE_START_TAG = "...ED_SOURCE_START..."
 SOURCE_CODE_END_TAG = "...ED_SOURCE_END..."
@@ -35,8 +36,7 @@ SOURCE_CODE_END_TAG = "...ED_SOURCE_END..."
 class InstructionCollector(object):
     def __init__(self, symbols):
         # type: (Dict[str, Symbol]) -> None
-
-        self.symbols: Dict[str, Symbol] = symbols
+        self.symbols = {s.name: s for s in symbols.values()}

         self.header_line_re = re.compile("^(0x)?[0-9A-Fa-f]+ <(.+)>:")
         self.instruction_line_re = re.compile(
@@ -141,6 +141,15 @@ class InstructionCollector(object):
                 filename,
             ]
         )
+        print("c++filt")
+        proc = subprocess.Popen(
+            ['c++filt'],
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            stdin=subprocess.PIPE,
+            encoding='utf-8',
+        )
+        objdump_output, e = proc.communicate(objdump_output)

         in_instruction_lines: bool = False
         for line in progressbar.progressbar(objdump_output.splitlines()):
noseglasses commented 5 months ago

I remeber having considered doing what you are suggesting when initially developing the tool. However, at some point I decided against it as I wanted the number of external tools (binutils) to be as small as possible. Reason is I have seen incomplete binutils packages in the past.

However, it would definitely make sense to have elf_diff search for c++-filt an use it similar to what your patch does if found. To keep the tool backward compatible, there should also be a command line flag that explicitly enables c++-filt usage.

I would definitely be happy to review a PR.