mahaloz / decomp2dbg

A plugin to introduce interactive symbols into your debugger from your decompiler
BSD 2-Clause "Simplified" License

Implement structs API for ghidra #75

Closed: casept closed this 1 year ago

casept commented 1 year ago

Implement the structs() endpoint for Ghidra. It should behave the same as the IDA Pro implementation, but I can't verify that because I don't have access to IDA Pro.
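For readers who want to call the endpoint themselves, here is a minimal client sketch. The `p.d2d.structs()` call below suggests `p` is an `xmlrpc.client.ServerProxy` and that the method is exposed as `d2d.structs`; the server URL and port are not given in this thread, so they are assumptions. To keep the example runnable without a Ghidra session, it starts a local stand-in server (`serve_stand_in`, a hypothetical helper) that returns data in the same shape as the output shown here. It is not the decomp2dbg server.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Sample data in the same shape as the structs() output in this thread:
# struct name -> {"name": ..., "members": [{"size", "name", "type"}, ...]}
SAMPLE_STRUCTS = {
    "timespec": {
        "name": "timespec",
        "members": [
            {"size": 8, "name": "tv_sec", "type": "__time_t"},
            {"size": 8, "name": "tv_nsec", "type": "long"},
        ],
    },
}


def serve_stand_in(host="127.0.0.1", port=0):
    """Start a local stand-in server exposing a d2d.structs() method.

    This is NOT the decomp2dbg Ghidra server; it only mimics the method name
    and the return shape so the client code can be exercised offline.
    Port 0 asks the OS for any free port.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(lambda: SAMPLE_STRUCTS, "d2d.structs")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def fetch_structs(url):
    """Call d2d.structs() on an XML-RPC server and return the decoded dict."""
    with xmlrpc.client.ServerProxy(url) as proxy:
        return proxy.d2d.structs()


if __name__ == "__main__":
    srv = serve_stand_in()
    host, port = srv.server_address
    structs = fetch_structs(f"http://{host}:{port}")
    for name, info in structs.items():
        print(name, [m["name"] for m in info["members"]])
    srv.shutdown()
    srv.server_close()
```

Against a real decomp2dbg session you would skip the stand-in and pass the server's actual URL to `fetch_structs`.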

This is what the output looks like for /usr/bin/ls on a Fedora system using Python's xmlrpc library:

In [14]: p.d2d.structs()
Out[14]:
{'lconv': {'members': [{'size': 8, 'name': 'decimal_point', 'type': 'char *'},
   {'size': 8, 'name': 'thousands_sep', 'type': 'char *'},
   {'size': 8, 'name': 'grouping', 'type': 'char *'},
   {'size': 8, 'name': 'int_curr_symbol', 'type': 'char *'},
   {'size': 8, 'name': 'currency_symbol', 'type': 'char *'},
   {'size': 8, 'name': 'mon_decimal_point', 'type': 'char *'},
   {'size': 8, 'name': 'mon_thousands_sep', 'type': 'char *'},
   {'size': 8, 'name': 'mon_grouping', 'type': 'char *'},
   {'size': 8, 'name': 'positive_sign', 'type': 'char *'},
   {'size': 8, 'name': 'negative_sign', 'type': 'char *'},
   {'size': 1, 'name': 'int_frac_digits', 'type': 'char'},
   {'size': 1, 'name': 'frac_digits', 'type': 'char'},
   {'size': 1, 'name': 'p_cs_precedes', 'type': 'char'},
   {'size': 1, 'name': 'p_sep_by_space', 'type': 'char'},
   {'size': 1, 'name': 'n_cs_precedes', 'type': 'char'},
   {'size': 1, 'name': 'n_sep_by_space', 'type': 'char'},
   {'size': 1, 'name': 'p_sign_posn', 'type': 'char'},
   {'size': 1, 'name': 'n_sign_posn', 'type': 'char'},
   {'size': 1, 'name': 'int_p_cs_precedes', 'type': 'char'},
   {'size': 1, 'name': 'int_p_sep_by_space', 'type': 'char'},
   {'size': 1, 'name': 'int_n_cs_precedes', 'type': 'char'},
   {'size': 1, 'name': 'int_n_sep_by_space', 'type': 'char'},
   {'size': 1, 'name': 'int_p_sign_posn', 'type': 'char'},
   {'size': 1, 'name': 'int_n_sign_posn', 'type': 'char'}],
  'name': 'lconv'},
 '__sigset_t': {'members': [{'size': 128,
    'name': '__val',
    'type': 'ulong[16]'}],
  'name': '__sigset_t'},
 'siginfo': {'members': [{'size': 4, 'name': 'si_signo', 'type': 'int'},
   {'size': 4, 'name': 'si_errno', 'type': 'int'},
   {'size': 4, 'name': 'si_code', 'type': 'int'},
   {'size': 112, 'name': '_sifields', 'type': '_union_1441'}],
  'name': 'siginfo'},
 'eh_frame_hdr': {'members': [{'size': 1,
    'name': 'eh_frame_hdr_version',
    'type': 'byte'},
   {'size': 1, 'name': 'eh_frame_pointer_encoding', 'type': 'dwfenc'},
   {'size': 1, 'name': 'eh_frame_desc_entry_count_encoding', 'type': 'dwfenc'},
   {'size': 1, 'name': 'eh_frame_table_encoding', 'type': 'dwfenc'}],
  'name': 'eh_frame_hdr'},
 '_IO_marker': {'members': [{'size': 8,
    'name': '_next',
    'type': '_IO_marker *'},
   {'size': 8, 'name': '_sbuf', 'type': '_IO_FILE *'},
   {'size': 4, 'name': '_pos', 'type': 'int'}],
  'name': '_IO_marker'},
 'Elf64_Dyn': {'members': [{'size': 8,
    'name': 'd_tag',
    'type': 'Elf64_DynTag'},
   {'size': 8, 'name': 'd_val', 'type': 'qword'}],
  'name': 'Elf64_Dyn'},
 '_struct_1447': {'members': [{'size': 8, 'name': 'si_band', 'type': 'long'},
   {'size': 4, 'name': 'si_fd', 'type': 'int'}],
  'name': '_struct_1447'},
 '_struct_1446': {'members': [{'size': 8,
    'name': 'si_addr',
    'type': 'void *'}],
  'name': '_struct_1446'},
 '__mbstate_t': {'members': [{'size': 4, 'name': '__count', 'type': 'int'},
   {'size': 4, 'name': '__value', 'type': '_union_27'}],
  'name': '__mbstate_t'},
 '_struct_1445': {'members': [{'size': 4, 'name': 'si_pid', 'type': '__pid_t'},
   {'size': 4, 'name': 'si_uid', 'type': '__uid_t'},
   {'size': 4, 'name': 'si_status', 'type': 'int'},
   {'size': 8, 'name': 'si_utime', 'type': '__clock_t'},
   {'size': 8, 'name': 'si_stime', 'type': '__clock_t'}],
  'name': '_struct_1445'},
 '_struct_1444': {'members': [{'size': 4, 'name': 'si_pid', 'type': '__pid_t'},
   {'size': 4, 'name': 'si_uid', 'type': '__uid_t'},
   {'size': 8, 'name': 'si_sigval', 'type': 'sigval_t'}],
  'name': '_struct_1444'},
 '_struct_1443': {'members': [{'size': 4, 'name': 'si_tid', 'type': 'int'},
   {'size': 4, 'name': 'si_overrun', 'type': 'int'},
   {'size': 8, 'name': 'si_sigval', 'type': 'sigval_t'}],
  'name': '_struct_1443'},
 '_struct_1442': {'members': [{'size': 4, 'name': 'si_pid', 'type': '__pid_t'},
   {'size': 4, 'name': 'si_uid', 'type': '__uid_t'}],
  'name': '_struct_1442'},
 'NoteGnuPropertyElement_4': {'members': [{'size': 4,
    'name': 'prType',
    'type': 'dword'},
   {'size': 4, 'name': 'prDatasz', 'type': 'dword'},
   {'size': 4, 'name': 'data', 'type': 'byte[4]'}],
  'name': 'NoteGnuPropertyElement_4'},
 'Elf64_Shdr': {'members': [{'size': 4, 'name': 'sh_name', 'type': 'dword'},
   {'size': 4, 'name': 'sh_type', 'type': 'Elf_SectionHeaderType'},
   {'size': 8, 'name': 'sh_flags', 'type': 'qword'},
   {'size': 8, 'name': 'sh_addr', 'type': 'qword'},
   {'size': 8, 'name': 'sh_offset', 'type': 'qword'},
   {'size': 8, 'name': 'sh_size', 'type': 'qword'},
   {'size': 4, 'name': 'sh_link', 'type': 'dword'},
   {'size': 4, 'name': 'sh_info', 'type': 'dword'},
   {'size': 8, 'name': 'sh_addralign', 'type': 'qword'},
   {'size': 8, 'name': 'sh_entsize', 'type': 'qword'}],
  'name': 'Elf64_Shdr'},
 'GnuBuildId': {'members': [{'size': 4, 'name': 'namesz', 'type': 'dword'},
   {'size': 4, 'name': 'descsz', 'type': 'dword'},
   {'size': 4, 'name': 'type', 'type': 'dword'},
   {'size': 4, 'name': 'name', 'type': 'string'},
   {'size': 20, 'name': 'hash', 'type': 'byte[20]'}],
  'name': 'GnuBuildId'},
 '_IO_FILE': {'members': [{'size': 4, 'name': '_flags', 'type': 'int'},
   {'size': 8, 'name': '_IO_read_ptr', 'type': 'char *'},
   {'size': 8, 'name': '_IO_read_end', 'type': 'char *'},
   {'size': 8, 'name': '_IO_read_base', 'type': 'char *'},
   {'size': 8, 'name': '_IO_write_base', 'type': 'char *'},
   {'size': 8, 'name': '_IO_write_ptr', 'type': 'char *'},
   {'size': 8, 'name': '_IO_write_end', 'type': 'char *'},
   {'size': 8, 'name': '_IO_buf_base', 'type': 'char *'},
   {'size': 8, 'name': '_IO_buf_end', 'type': 'char *'},
   {'size': 8, 'name': '_IO_save_base', 'type': 'char *'},
   {'size': 8, 'name': '_IO_backup_base', 'type': 'char *'},
   {'size': 8, 'name': '_IO_save_end', 'type': 'char *'},
   {'size': 8, 'name': '_markers', 'type': '_IO_marker *'},
   {'size': 8, 'name': '_chain', 'type': '_IO_FILE *'},
   {'size': 4, 'name': '_fileno', 'type': 'int'},
   {'size': 4, 'name': '_flags2', 'type': 'int'},
   {'size': 8, 'name': '_old_offset', 'type': '__off_t'},
   {'size': 2, 'name': '_cur_column', 'type': 'ushort'},
   {'size': 1, 'name': '_vtable_offset', 'type': 'char'},
   {'size': 1, 'name': '_shortbuf', 'type': 'char[1]'},
   {'size': 8, 'name': '_lock', 'type': '_IO_lock_t *'},
   {'size': 8, 'name': '_offset', 'type': '__off64_t'},
   {'size': 8, 'name': '__pad1', 'type': 'void *'},
   {'size': 8, 'name': '__pad2', 'type': 'void *'},
   {'size': 8, 'name': '__pad3', 'type': 'void *'},
   {'size': 8, 'name': '__pad4', 'type': 'void *'},
   {'size': 8, 'name': '__pad5', 'type': 'size_t'},
   {'size': 4, 'name': '_mode', 'type': 'int'},
   {'size': 20, 'name': '_unused2', 'type': 'char[20]'}],
  'name': '_IO_FILE'},
 'NoteGnuProperty_4': {'members': [{'size': 4,
    'name': 'namesz',
    'type': 'dword'},
   {'size': 4, 'name': 'descsz', 'type': 'dword'},
   {'size': 4, 'name': 'type', 'type': 'dword'},
   {'size': 4, 'name': 'name', 'type': 'string'}],
  'name': 'NoteGnuProperty_4'},
 'Elf64_Ehdr': {'members': [{'size': 1,
    'name': 'e_ident_magic_num',
    'type': 'byte'},
   {'size': 3, 'name': 'e_ident_magic_str', 'type': 'string'},
   {'size': 1, 'name': 'e_ident_class', 'type': 'byte'},
   {'size': 1, 'name': 'e_ident_data', 'type': 'byte'},
   {'size': 1, 'name': 'e_ident_version', 'type': 'byte'},
   {'size': 1, 'name': 'e_ident_osabi', 'type': 'byte'},
   {'size': 1, 'name': 'e_ident_abiversion', 'type': 'byte'},
   {'size': 7, 'name': 'e_ident_pad', 'type': 'byte[7]'},
   {'size': 2, 'name': 'e_type', 'type': 'word'},
   {'size': 2, 'name': 'e_machine', 'type': 'word'},
   {'size': 4, 'name': 'e_version', 'type': 'dword'},
   {'size': 8, 'name': 'e_entry', 'type': 'qword'},
   {'size': 8, 'name': 'e_phoff', 'type': 'qword'},
   {'size': 8, 'name': 'e_shoff', 'type': 'qword'},
   {'size': 4, 'name': 'e_flags', 'type': 'dword'},
   {'size': 2, 'name': 'e_ehsize', 'type': 'word'},
   {'size': 2, 'name': 'e_phentsize', 'type': 'word'},
   {'size': 2, 'name': 'e_phnum', 'type': 'word'},
   {'size': 2, 'name': 'e_shentsize', 'type': 'word'},
   {'size': 2, 'name': 'e_shnum', 'type': 'word'},
   {'size': 2, 'name': 'e_shstrndx', 'type': 'word'}],
  'name': 'Elf64_Ehdr'},
 'Elf64_Rela': {'members': [{'size': 8, 'name': 'r_offset', 'type': 'qword'},
   {'size': 8, 'name': 'r_info', 'type': 'qword'},
   {'size': 8, 'name': 'r_addend', 'type': 'qword'}],
  'name': 'Elf64_Rela'},
 'evp_pkey_ctx_st': {'members': [], 'name': 'evp_pkey_ctx_st'},
 'group': {'members': [{'size': 8, 'name': 'gr_name', 'type': 'char *'},
   {'size': 8, 'name': 'gr_passwd', 'type': 'char *'},
   {'size': 4, 'name': 'gr_gid', 'type': '__gid_t'},
   {'size': 8, 'name': 'gr_mem', 'type': 'char * *'}],
  'name': 'group'},
 'dirent': {'members': [{'size': 8, 'name': 'd_ino', 'type': '__ino_t'},
   {'size': 8, 'name': 'd_off', 'type': '__off_t'},
   {'size': 2, 'name': 'd_reclen', 'type': 'ushort'},
   {'size': 1, 'name': 'd_type', 'type': 'uchar'},
   {'size': 256, 'name': 'd_name', 'type': 'char[256]'}],
  'name': 'dirent'},
 'fde_table_entry': {'members': [{'size': 4,
    'name': 'initial_loc',
    'type': 'dword'},
   {'size': 4, 'name': 'data_loc', 'type': 'dword'}],
  'name': 'fde_table_entry'},
 'stat': {'members': [{'size': 8, 'name': 'st_dev', 'type': '__dev_t'},
   {'size': 8, 'name': 'st_ino', 'type': '__ino_t'},
   {'size': 8, 'name': 'st_nlink', 'type': '__nlink_t'},
   {'size': 4, 'name': 'st_mode', 'type': '__mode_t'},
   {'size': 4, 'name': 'st_uid', 'type': '__uid_t'},
   {'size': 4, 'name': 'st_gid', 'type': '__gid_t'},
   {'size': 4, 'name': '__pad0', 'type': 'int'},
   {'size': 8, 'name': 'st_rdev', 'type': '__dev_t'},
   {'size': 8, 'name': 'st_size', 'type': '__off_t'},
   {'size': 8, 'name': 'st_blksize', 'type': '__blksize_t'},
   {'size': 8, 'name': 'st_blocks', 'type': '__blkcnt_t'},
   {'size': 16, 'name': 'st_atim', 'type': 'timespec'},
   {'size': 16, 'name': 'st_mtim', 'type': 'timespec'},
   {'size': 16, 'name': 'st_ctim', 'type': 'timespec'},
   {'size': 24, 'name': '__unused', 'type': 'long[3]'}],
  'name': 'stat'},
 'Elf64_Phdr': {'members': [{'size': 4,
    'name': 'p_type',
    'type': 'Elf_ProgramHeaderType'},
   {'size': 4, 'name': 'p_flags', 'type': 'dword'},
   {'size': 8, 'name': 'p_offset', 'type': 'qword'},
   {'size': 8, 'name': 'p_vaddr', 'type': 'qword'},
   {'size': 8, 'name': 'p_paddr', 'type': 'qword'},
   {'size': 8, 'name': 'p_filesz', 'type': 'qword'},
   {'size': 8, 'name': 'p_memsz', 'type': 'qword'},
   {'size': 8, 'name': 'p_align', 'type': 'qword'}],
  'name': 'Elf64_Phdr'},
 '__jmp_buf_tag': {'members': [{'size': 64,
    'name': '__jmpbuf',
    'type': '__jmp_buf'},
   {'size': 4, 'name': '__mask_was_saved', 'type': 'int'},
   {'size': 128, 'name': '__saved_mask', 'type': '__sigset_t'}],
  'name': '__jmp_buf_tag'},
 'sigaction': {'members': [{'size': 8,
    'name': '__sigaction_handler',
    'type': '_union_1457'},
   {'size': 128, 'name': 'sa_mask', 'type': '__sigset_t'},
   {'size': 4, 'name': 'sa_flags', 'type': 'int'},
   {'size': 8, 'name': 'sa_restorer', 'type': '_func_5327 *'}],
  'name': 'sigaction'},
 'NoteAbiTag': {'members': [{'size': 4, 'name': 'namesz', 'type': 'dword'},
   {'size': 4, 'name': 'descsz', 'type': 'dword'},
   {'size': 4, 'name': 'type', 'type': 'dword'},
   {'size': 4, 'name': 'name', 'type': 'string'},
   {'size': 4, 'name': 'abiType', 'type': 'dword'},
   {'size': 12, 'name': 'requiredKernelVersion', 'type': 'dword[3]'}],
  'name': 'NoteAbiTag'},
 'GnuDebugLink_28': {'members': [{'size': 28,
    'name': 'filename',
    'type': 'string'},
   {'size': 4, 'name': 'crc', 'type': 'dword'}],
  'name': 'GnuDebugLink_28'},
 'timespec': {'members': [{'size': 8, 'name': 'tv_sec', 'type': '__time_t'},
   {'size': 8, 'name': 'tv_nsec', 'type': 'long'}],
  'name': 'timespec'},
 'passwd': {'members': [{'size': 8, 'name': 'pw_name', 'type': 'char *'},
   {'size': 8, 'name': 'pw_passwd', 'type': 'char *'},
   {'size': 4, 'name': 'pw_uid', 'type': '__uid_t'},
   {'size': 4, 'name': 'pw_gid', 'type': '__gid_t'},
   {'size': 8, 'name': 'pw_gecos', 'type': 'char *'},
   {'size': 8, 'name': 'pw_dir', 'type': 'char *'},
   {'size': 8, 'name': 'pw_shell', 'type': 'char *'}],
  'name': 'passwd'},
 'tm': {'members': [{'size': 4, 'name': 'tm_sec', 'type': 'int'},
   {'size': 4, 'name': 'tm_min', 'type': 'int'},
   {'size': 4, 'name': 'tm_hour', 'type': 'int'},
   {'size': 4, 'name': 'tm_mday', 'type': 'int'},
   {'size': 4, 'name': 'tm_mon', 'type': 'int'},
   {'size': 4, 'name': 'tm_year', 'type': 'int'},
   {'size': 4, 'name': 'tm_wday', 'type': 'int'},
   {'size': 4, 'name': 'tm_yday', 'type': 'int'},
   {'size': 4, 'name': 'tm_isdst', 'type': 'int'},
   {'size': 8, 'name': 'tm_gmtoff', 'type': 'long'},
   {'size': 8, 'name': 'tm_zone', 'type': 'char *'}],
  'name': 'tm'},
 '__dirstream': {'members': [], 'name': '__dirstream'},
 'Elf64_Sym': {'members': [{'size': 4, 'name': 'st_name', 'type': 'dword'},
   {'size': 1, 'name': 'st_info', 'type': 'byte'},
   {'size': 1, 'name': 'st_other', 'type': 'byte'},
   {'size': 2, 'name': 'st_shndx', 'type': 'word'},
   {'size': 8, 'name': 'st_value', 'type': 'qword'},
   {'size': 8, 'name': 'st_size', 'type': 'qword'}],
  'name': 'Elf64_Sym'}}
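Each entry above lists members in declaration order with a size and a type string, but no offsets. As an illustration of how a consumer might use this shape, the sketch below renders one entry as a C-style declaration. `to_c_decl` is a hypothetical helper, not part of decomp2dbg. Because offsets are absent, padding is not reconstructed; sizes are kept as comments. Only single-dimension array types such as `char[256]` are rewritten into C array syntax.

```python
import re

# Matches single-dimension array type strings such as "char[256]" or "ulong[16]".
_ARRAY_RE = re.compile(r"^(?P<base>.+?)\[(?P<count>\d+)\]$")


def to_c_decl(struct):
    """Render one entry of the structs() result as a C-style declaration.

    Padding is not reconstructed because the output carries no offsets;
    each member's reported size is emitted as a trailing comment instead.
    """
    lines = [f"struct {struct['name']} {{"]
    for member in struct["members"]:
        mtype, mname = member["type"], member["name"]
        match = _ARRAY_RE.match(mtype)
        if match:
            # Move the array suffix onto the member name: "char d_name[256]".
            decl = f"{match['base']} {mname}[{match['count']}]"
        else:
            decl = f"{mtype} {mname}"
        lines.append(f"    {decl}; /* size {member['size']} */")
    lines.append("};")
    return "\n".join(lines)
```

For example, `print(to_c_decl(p.d2d.structs()["timespec"]))` would print a two-member `struct timespec`. Opaque types such as `evp_pkey_ctx_st`, which come back with an empty members list, render as empty bodies.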
casept commented 1 year ago

On further testing, it seems like for whatever reason the response is empty when analyzing Windows targets. Changing to draft until I figure it out.

mahaloz commented 1 year ago

@casept sounds good. If you're unable to figure it out, I'm still willing to merge it, since Linux support is top priority.

casept commented 1 year ago

@mahaloz I've done some more investigation. The failure seems to be unrelated to the platform; what matters is the number of structs defined in a program. If I impose a limit on how many structs are sent, it works. My guess is that Apache's XMLRPC library has some annoying size limitation hidden somewhere.

I can't find a setting to tweak this anywhere, so I think the only reasonable solution is to paginate the API. That'd need to be done for the IDA implementation as well though, to remain consistent.

casept commented 1 year ago

By the way, I'll also need access to unions and type aliases for my use case.

Would you be willing to accept another PR implementing them for Ghidra without also adding support in the other decompilers?

mahaloz commented 1 year ago

@casept yup that sounds good to me

mahaloz commented 1 year ago

Also, mark this PR as ready for review when you're done.

casept commented 1 year ago

Still waiting on your OK for the pagination workaround, then it should be ready.

mahaloz commented 1 year ago

Ah, I see. For the pagination workaround, it'd be nice if you knew some kind of upper bound on how much can fit in one "page". Just add it to the PR and I'll check it out.

casept commented 1 year ago

It's hard to know for sure, as I don't even know which part of the stack imposes the limit (it could be the HTTP server or the actual XMLRPC serializer). In my experiments with a fairly "average" program, the limit seems to be about 250 structs. I'd therefore propose a limit of 100 per page.

It's getting late here, so I'll polish and push the code tomorrow.
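The proposed pagination can be sketched as follows. The thread does not specify a paginated signature, so `structs_page` (a server-side helper) and `fetch_all` (a client loop) are hypothetical names, as is the `{"structs": ..., "has_more": ...}` response shape. `PAGE_SIZE = 100` is the figure proposed above, not a measured limit. Struct names are sorted so page boundaries stay stable between calls as long as the set of structs does not change.

```python
PAGE_SIZE = 100  # value proposed in this thread; not a measured limit


def structs_page(all_structs, page, page_size=PAGE_SIZE):
    """Return one page of a {name: struct} dict plus whether more pages follow.

    Hypothetical server-side helper. Names are sorted so that page boundaries
    are deterministic across calls.
    """
    names = sorted(all_structs)
    start = page * page_size
    chunk = {name: all_structs[name] for name in names[start:start + page_size]}
    return {"structs": chunk, "has_more": start + page_size < len(names)}


def fetch_all(get_page):
    """Client loop: request pages 0, 1, 2, ... until has_more is False.

    get_page is any callable taking a page number, e.g. a wrapper around
    an XML-RPC proxy method.
    """
    result, page = {}, 0
    while True:
        resp = get_page(page)
        result.update(resp["structs"])
        if not resp["has_more"]:
            return result
        page += 1
```

Returning a flag rather than a total count keeps the client loop simple; the trade-off is that the client cannot show progress up front.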

casept commented 1 year ago

Fixed.

It turned out that the cause was completely different from what I suspected: OOAnalyzer sometimes generates classes with fields that have null names, which XMLRPC doesn't support. I mistook this for a size limitation because these structs came quite late in the response, so capping the number of structs happened to exclude them. I worked around it by generating surrogate names for such fields.
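The null-name problem can be reproduced from Python, since standard XML-RPC has no null value: the standard-library marshaller refuses `None` unless `allow_none=True` is set, which emits a non-standard `<nil/>` extension. The sketch below shows the failure and one possible surrogate-name workaround. The fix in the PR lives in the Ghidra (Java) server and its naming scheme is not described here, so `with_surrogate_names` and the `field_<index>` names are hypothetical.

```python
import xmlrpc.client


def with_surrogate_names(members):
    """Return a copy of members in which null names get a placeholder.

    The "field_<index>" scheme is a hypothetical choice for this sketch.
    The input list and its dicts are not modified.
    """
    fixed = []
    for index, member in enumerate(members):
        member = dict(member)
        if member.get("name") is None:
            member["name"] = f"field_{index}"
        fixed.append(member)
    return fixed


members = [
    {"size": 8, "name": "vtable_ptr", "type": "void *"},
    {"size": 4, "name": None, "type": "int"},  # a nameless field
]

# Without allow_none, marshalling a None value raises TypeError.
try:
    xmlrpc.client.dumps(({"members": members},), methodresponse=True)
except TypeError as exc:
    print("marshalling failed:", exc)

# With surrogate names, the same response marshals cleanly.
payload = xmlrpc.client.dumps(
    ({"members": with_surrogate_names(members)},), methodresponse=True
)
```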

I also studied the IDA code more closely: it seems to put all structures into a struct_info subfield, and the Ghidra implementation now does the same.
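Because the earlier `/usr/bin/ls` output shows a flat `{name: struct}` mapping while the final version nests structures under `struct_info`, a client may want to accept both. The sketch assumes `struct_info` is a top-level key of the response; the exact nesting isn't shown in this thread. `unwrap_struct_info` is a hypothetical helper. One caveat: a flat response for a binary that happens to define a struct named `struct_info` would be misread.

```python
def unwrap_struct_info(response):
    """Return the {name: struct} mapping from a structs() response.

    Assumes the IDA-compatible shape, with structures under a top-level
    "struct_info" key, and falls back to treating the response itself as
    the mapping (the flat shape from the earlier output).
    """
    if isinstance(response, dict) and "struct_info" in response:
        return response["struct_info"]
    return response
```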

Should be ready for merge.

mahaloz commented 1 year ago

LGTM!