radareorg / radare2

UNIX-like reverse engineering framework and command-line toolset
https://www.radare.org/
GNU Lesser General Public License v3.0

JSON output is inconsistent between `p` subcommands #23012

Closed: ttufts closed this 1 month ago

ttufts commented 4 months ago

Environment

Thu Jun 6 14:02:07 EDT 2024 radare2 5.9.3 32198 @ darwin-arm-64 birth: git.5.9.2-57-g6ad44b9c74 2024-06-05__16:47:07 commit: 6ad44b9c74be729d084319696bd8aca403fb2104 options: gpl -O2 cs:5 cl:2 make Darwin arm64

Description

When parsing the JSON output from pdj and pduoj, the structure isn't the same. pduoj dumps the text of each opcode but doesn't break the instruction down into the structured fields that pdj provides.

Test

Load an ARM binary (32-bit in my case), then run pdj 10 and pduoj pop. The pdj output has an 'opcode' key for each instruction, whereas the pduoj output only has a 'text' key for each instruction.

trufae commented 4 months ago

Can you make a PR with the changes so that both commands return the same structure? I agree with the change, but it will be easier for you to do it because you know what's different. I'll review it, check whether the change affects any plugin or tool, and update the tests.

thanks!

ttufts commented 4 months ago

This issue is either going to be a hack or a bit of a rewrite. I worked on it a bit yesterday.

Currently, the 'pdj' command calls r_core_print_disasm_json, and the 'pduoj' command calls r_core_print_disasm. The two functions handle JSON very differently: r_core_print_disasm_json fills out the pj structure fully, with a key for each part of the assembly instruction, while r_core_print_disasm just grabs the console line and puts it in pj as a 'text' element.

I started changing disasm.c and cmd_print.inc.c so that all of the 'j' subcommands call r_core_print_disasm_json instead of r_core_print_disasm, and I removed the json argument from r_core_print_disasm.

This seemed to work until I found that r_core_print_disasm_json doesn't use the ds_ functionality: it calls the r_asm_op_ functions directly rather than the ds_ wrapper functions. ds_disassemble handles many edge cases in disassembled instructions that r_core_print_disasm_json doesn't, and it seems silly to duplicate all of that.

So I see 2 paths forward:

  1. Update r_core_print_disasm_json so that it uses the ds_ functions.
  2. Go back to passing the json flag into r_core_print_disasm and add the full pj_ functionality there, so that it builds the same JSON structure that r_core_print_disasm_json produces.

Make sense?

Do you have a preference for a path forward? IMHO option 2 is a little more elegant, especially with a separate function for adding a new instruction to pj to keep things cleaner.
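A minimal, self-contained sketch of the "separate function for adding a new instruction" idea from option 2. To compile on its own it uses a tiny hand-rolled string buffer instead of radare2's pj_ API, and the names InsnInfo, Sb, sb_puts, sb_json_str, insn_to_json and insns_to_json are all hypothetical. The four keys (offset, size, opcode, disasm) are only an illustrative subset of what pdj emits. The point is that if both the pdj path and the pduoj path call one helper per instruction, their JSON cannot drift apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-instruction record; in radare2 this data would come
 * from the ds_ disassembly state, not from a struct like this. */
typedef struct {
	uint64_t offset;    /* address of the instruction */
	int size;           /* length in bytes */
	const char *opcode; /* instruction text, e.g. "pop {r4, pc}" */
	const char *disasm; /* text as rendered on the console line */
} InsnInfo;

/* Hypothetical fixed-size output buffer (silently truncates when full). */
typedef struct {
	char buf[512];
	size_t len;
} Sb;

static void sb_puts(Sb *sb, const char *s) {
	while (*s && sb->len + 1 < sizeof sb->buf) {
		sb->buf[sb->len++] = *s++;
	}
	sb->buf[sb->len] = '\0';
}

/* Emit s as a JSON string literal, escaping quotes, backslashes and
 * control characters. */
static void sb_json_str(Sb *sb, const char *s) {
	sb_puts(sb, "\"");
	for (; *s; s++) {
		char tmp[8];
		unsigned char c = (unsigned char)*s;
		if (c == '"' || c == '\\') {
			tmp[0] = '\\'; tmp[1] = (char)c; tmp[2] = '\0';
		} else if (c < 0x20) {
			snprintf(tmp, sizeof tmp, "\\u%04x", c);
		} else {
			tmp[0] = (char)c; tmp[1] = '\0';
		}
		sb_puts(sb, tmp);
	}
	sb_puts(sb, "\"");
}

/* Shared helper: every code path that prints one instruction as JSON
 * calls this, so all of them produce the same keys. */
static void insn_to_json(Sb *sb, const InsnInfo *ins) {
	char num[32];
	snprintf(num, sizeof num, "%llu", (unsigned long long)ins->offset);
	sb_puts(sb, "{\"offset\":");
	sb_puts(sb, num);
	snprintf(num, sizeof num, "%d", ins->size);
	sb_puts(sb, ",\"size\":");
	sb_puts(sb, num);
	sb_puts(sb, ",\"opcode\":");
	sb_json_str(sb, ins->opcode);
	sb_puts(sb, ",\"disasm\":");
	sb_json_str(sb, ins->disasm);
	sb_puts(sb, "}");
}

/* Emit a JSON array of instructions using the shared helper. */
static void insns_to_json(Sb *sb, const InsnInfo *v, size_t n) {
	sb_puts(sb, "[");
	for (size_t i = 0; i < n; i++) {
		if (i) {
			sb_puts(sb, ",");
		}
		insn_to_json(sb, &v[i]);
	}
	sb_puts(sb, "]");
}
```

Inside radare2 the same helper would more naturally be written against the existing PJ builder (pj_o, pj_kn, pj_ks, pj_end) and called from whichever function ends up owning the JSON output, rather than using a hand-rolled buffer.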

Man, this project is huge. Kudos to you for keeping it up solo for so long. Could use some tech debt cleanup though.

trufae commented 4 months ago

One restriction is that we can't change public structures or function signatures. With that in mind, both approaches look good: one solves the problem more quickly, so it may work for you sooner, but a proper rewrite/refactoring/cleanup is always welcome. Everything in r2land is open to being changed for the better.

So my vote goes for the 2nd approach.

If you need to break APIs, you can use the R2_USE_NEW_ABI ifdef; that code will be swapped in before r2-6.0.0 is out (during the 5.9.9 development cycle).
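A sketch of how a signature change could be gated behind R2_USE_NEW_ABI so the current public API stays untouched by default. demo_print_disasm is a hypothetical stand-in, not radare2's real r_core_print_disasm signature; building without the macro keeps the old signature, and defining it switches to the new one with an explicit json parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#ifdef R2_USE_NEW_ABI
/* New ABI (hypothetical): JSON mode becomes an explicit parameter. */
int demo_print_disasm(int count, bool json) {
	printf("new ABI: count=%d json=%d\n", count, json);
	return count;
}
#else
/* Current ABI (hypothetical): signature left unchanged. */
int demo_print_disasm(int count) {
	printf("old ABI: count=%d\n", count);
	return count;
}
#endif
```

Callers that need the new behavior would guard their calls with the same ifdef until the swap happens.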

trufae commented 3 months ago

Ping

trufae commented 2 months ago

@ttufts ping again. Please go for the quickest path to solve the issue; otherwise I have the feeling this issue will be stuck forever. As long as we have tests, things can always be improved later. I just can't fix it because I lack your context and requirements.