rizinorg / rizin

UNIX-like reverse engineering framework and command-line toolset.
https://rizin.re
GNU Lesser General Public License v3.0

Autodetect `asm.cpu` whenever possible #3747

Open XVilka opened 1 year ago

XVilka commented 1 year ago

It is common to have an ELF for the ARM Cortex-M profile, but the profile is not shown in the ELF header:

iorw     false
block    0x100
type     EXEC (Executable file)
arch     arm
cpu      N/A
baddr    0x342a0000
binsz    0x00d01f0b
bintype  elf
bits     32
class    ELF32
compiler GCC: 
dbg_file N/A
endian   LE
hdr.csum N/A
guid     N/A
intrp    N/A
laddr    0x00000000
lang     c++
machine  ARM
maxopsz  4
minopsz  2
os       linux
cc       N/A
pcalign  2
rpath    NONE

But the CPU profile can drastically affect analysis. ARM Cortex-M, for example, adds instructions and uses Thumb, which changes how the instruction stream is disassembled.

We should figure out a way to detect Cortex-M ELFs whenever possible. Currently you have to specify it from the command line:

$ rizin -A -e asm.cpu=cortexm firmware.elf

It would be nice to autodetect the cortexm/cortexa profiles whenever possible.

Quite often compilers add a special section .ARM.attributes that has that information (note the Tag_CPU_arch_profile and Tag_CPU_arch attributes):

> readelf -A cortex-a8.out                                                       
Attribute Section: aeabi
File Attributes
  Tag_conformance: "2.10"
  Tag_CPU_arch: v7
  Tag_CPU_arch_profile: Application
  Tag_ARM_ISA_use: Yes
  Tag_THUMB_ISA_use: Thumb-2
  Tag_PCS_config: Bare platform
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_align_preserved: 8-byte, except leaf SP
  Tag_ABI_enum_size: small
  Tag_ABI_VFP_args: compatible
  Tag_CPU_unaligned_access: v6
  Tag_DIV_use: Not allowed

> readelf -A cortex-m33.out
Attribute Section: aeabi
File Attributes
  Tag_conformance: "2.10"
  Tag_CPU_arch: v8-M.mainline
  Tag_CPU_arch_profile: Microcontroller
  Tag_THUMB_ISA_use: Yes
  Tag_FP_arch: FPv5/FP-D16 for ARMv8
  Tag_PCS_config: Bare platform
  Tag_ABI_align_needed: 8-byte
  Tag_ABI_align_preserved: 8-byte, except leaf SP
  Tag_ABI_enum_size: forced to int
  Tag_ABI_HardFP_use: SP only
  Tag_ABI_VFP_args: compatible
  Tag_CPU_unaligned_access: v6
  Tag_DIV_use: Not allowed

See https://stackoverflow.com/questions/70071681/how-can-i-know-if-an-elf-file-is-for-cortex-a-or-cortex-m for more information

This should probably be changed somewhere in librz/bin/format/elf/.

See file librz/bin/format/elf/elf_info.c and get_cpu_mips() function as an example.
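To make the idea concrete, here is a minimal sketch of pulling Tag_CPU_arch and Tag_CPU_arch_profile out of the raw bytes of .ARM.attributes. It is not Rizin code: `ArmAttrs` and `arm_attrs_parse()` are hypothetical names. The layout (format byte 'A', vendor subsections, File/Section/Symbol sub-subsections, ULEB128 tags) follows my reading of the ARM ABI addenda, and a little-endian ELF is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tag numbers from the public "aeabi" attribute section. */
enum { TAG_FILE = 1, TAG_CPU_RAW_NAME = 4, TAG_CPU_NAME = 5,
       TAG_CPU_ARCH = 6, TAG_CPU_ARCH_PROFILE = 7, TAG_COMPATIBILITY = 32 };

/* Hypothetical result type, not part of Rizin. -1 means "tag absent". */
typedef struct {
	int cpu_arch;     /* Tag_CPU_arch, e.g. 10 = v7, 17 = v8-M.mainline */
	int arch_profile; /* Tag_CPU_arch_profile: 'A', 'R', 'M', 'S' or 0 */
} ArmAttrs;

static uint32_t read_le32(const uint8_t *p) {
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode one ULEB128 value; returns 0 on truncated input. */
static int read_uleb(const uint8_t *buf, size_t end, size_t *pos, uint64_t *out) {
	uint64_t v = 0;
	unsigned shift = 0;
	while (*pos < end) {
		uint8_t b = buf[(*pos)++];
		if (shift < 64) v |= (uint64_t)(b & 0x7f) << shift;
		shift += 7;
		if (!(b & 0x80)) { *out = v; return 1; }
	}
	return 0;
}

/* Skip a NUL-terminated string; returns 0 if no terminator before end. */
static int skip_ntbs(const uint8_t *buf, size_t end, size_t *pos) {
	const uint8_t *nul = memchr(buf + *pos, 0, end - *pos);
	if (!nul) return 0;
	*pos = (size_t)(nul - buf) + 1;
	return 1;
}

/* Walk the attributes of one "File" sub-subsection in [pos, end). */
static void parse_file_attrs(const uint8_t *buf, size_t pos, size_t end, ArmAttrs *out) {
	uint64_t tag, val;
	while (pos < end && read_uleb(buf, end, &pos, &tag)) {
		int is_str = tag == TAG_CPU_RAW_NAME || tag == TAG_CPU_NAME || (tag > 32 && (tag & 1));
		if (tag == TAG_COMPATIBILITY) { /* ULEB128 flag followed by a string */
			if (!read_uleb(buf, end, &pos, &val) || !skip_ntbs(buf, end, &pos)) return;
		} else if (is_str) {
			if (!skip_ntbs(buf, end, &pos)) return;
		} else {
			if (!read_uleb(buf, end, &pos, &val)) return;
			if (tag == TAG_CPU_ARCH) out->cpu_arch = (int)val;
			if (tag == TAG_CPU_ARCH_PROFILE) out->arch_profile = (int)val;
		}
	}
}

/* Parse raw .ARM.attributes bytes. Returns 1 if an "aeabi" subsection was found. */
int arm_attrs_parse(const uint8_t *buf, size_t len, ArmAttrs *out) {
	out->cpu_arch = out->arch_profile = -1;
	if (len < 1 || buf[0] != 'A') return 0; /* format version byte */
	size_t pos = 1;
	int found = 0;
	while (len - pos >= 4) {
		uint32_t sec_len = read_le32(buf + pos); /* covers the length field itself */
		if (sec_len < 5 || sec_len > len - pos) break;
		size_t sec_end = pos + sec_len, p = pos + 4;
		const uint8_t *vendor = buf + p;
		if (!skip_ntbs(buf, sec_end, &p)) break;
		if (strcmp((const char *)vendor, "aeabi") == 0) {
			found = 1;
			/* Each sub-subsection: kind byte, 32-bit size (covers kind and size), attributes. */
			while (sec_end - p >= 5) {
				uint32_t sub_len = read_le32(buf + p + 1);
				if (sub_len < 5 || sub_len > sec_end - p) break;
				if (buf[p] == TAG_FILE) parse_file_attrs(buf, p + 5, p + sub_len, out);
				p += sub_len;
			}
		}
		pos = sec_end;
	}
	return found;
}
```

For the cortex-m33.out dump above, a parser like this would be expected to report `arch_profile == 'M'`. Mapping that to an `asm.cpu` value is a separate step.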

valdaarhun commented 8 months ago

Hi. I would like to work on this issue. I think I have got an idea on how to resolve this.

valdaarhun commented 7 months ago

Quite often compilers add a special section .ARM.attributes that has that information (note the Tag_CPU_arch_profile and Tag_CPU_arch attributes)

Hi. Just to be clear, is our intention to simply recognize the CPU profile (e.g. A, M, R) or the specific processor family (e.g. Cortex, Neoverse) that the ELF is expected to run on?

Based on what I have understood after reading through ARM's addenda to their ABI and this Wikipedia page on the list of ARM processors, it's quite clear that the "M" profile implies the Cortex-M processor family or a similar family (like SecurCore) which shares the same features.

However, the "A" cpu profile could imply the cortex-a family or the neoverse family.

I noticed the following struct in librz/asm/p/asm_arm_cs.c:

RzAsmPlugin rz_asm_plugin_arm_cs = {
    .name = "arm",
    .desc = "Capstone ARM disassembler",
    .cpus = "v8,cortexm,arm1176,cortexA72,cortexA8",
    .platforms = "bcm2835,omap3430",
    .features = "v8",
    .license = "BSD",
    .arch = "arm",
    .bits = 16 | 32 | 64,
    .endian = RZ_SYS_ENDIAN_LITTLE | RZ_SYS_ENDIAN_BIG,
    .disassemble = &disassemble,
        ...
}

The cpus field is hard-coded to a specific processor (e.g. cortexA8) or a family (e.g. cortexm). How do I go about dealing with other families such as Neoverse?
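Since `.cpus` is a plain comma-separated string, one way to avoid setting a name the plugin does not know is to check a detected name against that list first. This is only a sketch: `cpu_list_contains()` is a hypothetical helper, not an existing Rizin function, and it does an exact, case-sensitive match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` is an exact entry of the
 * comma-separated list `cpus`, 0 otherwise. */
int cpu_list_contains(const char *cpus, const char *name) {
	size_t n = strlen(name);
	if (n == 0) {
		return 0;
	}
	for (const char *p = cpus; p && *p;) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);
		if (len == n && memcmp(p, name, n) == 0) {
			return 1;
		}
		p = comma ? comma + 1 : NULL; /* move to the next entry, if any */
	}
	return 0;
}
```

With the list from the struct above, "cortexm" matches, while a family such as "neoverse" does not and would need either a new entry or a fallback.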

XVilka commented 7 months ago

@valdaarhun for now, detecting the profile is enough, but since Rizin's ARM decoding is based on Capstone, only these features make sense for autodetection (https://github.com/capstone-engine/capstone/blob/next/include/capstone/arm.h#L1638):

// Architecture-specific groups
    // generated content <ARMGenCSFeatureEnum.inc> begin
    // clang-format off

    ARM_FEATURE_IsARM = 128,
    ARM_FEATURE_HasV5T,
    ARM_FEATURE_HasV4T,
    ARM_FEATURE_HasVFP2,
    ARM_FEATURE_HasV5TE,
    ARM_FEATURE_HasV6T2,
    ARM_FEATURE_HasMVEInt,
    ARM_FEATURE_HasNEON,
    ARM_FEATURE_HasFPRegs64,
    ARM_FEATURE_HasFPRegs,
    ARM_FEATURE_IsThumb2,
    ARM_FEATURE_HasV8_1MMainline,
    ARM_FEATURE_HasLOB,
    ARM_FEATURE_IsThumb,
    ARM_FEATURE_HasV8MBaseline,
    ARM_FEATURE_Has8MSecExt,
    ARM_FEATURE_HasV8,
    ARM_FEATURE_HasAES,
    ARM_FEATURE_HasBF16,
    ARM_FEATURE_HasCDE,
    ARM_FEATURE_PreV8,
    ARM_FEATURE_HasV6K,
    ARM_FEATURE_HasCRC,
    ARM_FEATURE_HasV7,
    ARM_FEATURE_HasDB,
    ARM_FEATURE_HasVirtualization,
    ARM_FEATURE_HasVFP3,
    ARM_FEATURE_HasDPVFP,
    ARM_FEATURE_HasFullFP16,
    ARM_FEATURE_HasV6,
    ARM_FEATURE_HasAcquireRelease,
    ARM_FEATURE_HasV7Clrex,
    ARM_FEATURE_HasMVEFloat,
    ARM_FEATURE_HasFPRegsV8_1M,
    ARM_FEATURE_HasMP,
    ARM_FEATURE_HasSB,
    ARM_FEATURE_HasDivideInARM,
    ARM_FEATURE_HasV8_1a,
    ARM_FEATURE_HasSHA2,
    ARM_FEATURE_HasTrustZone,
    ARM_FEATURE_UseNaClTrap,
    ARM_FEATURE_HasV8_4a,
    ARM_FEATURE_HasV8_3a,
    ARM_FEATURE_HasFPARMv8,
    ARM_FEATURE_HasFP16,
    ARM_FEATURE_HasVFP4,
    ARM_FEATURE_HasFP16FML,
    ARM_FEATURE_HasFPRegs16,
    ARM_FEATURE_HasV8MMainline,
    ARM_FEATURE_HasDotProd,
    ARM_FEATURE_HasMatMulInt8,
    ARM_FEATURE_IsMClass,
    ARM_FEATURE_HasPACBTI,
    ARM_FEATURE_IsNotMClass,
    ARM_FEATURE_HasDSP,
    ARM_FEATURE_HasDivideInThumb,
    ARM_FEATURE_HasV6M,

Since Rizin has no way to select individual features, only CPUs with fixed sets of features can be selected for now.

cc @Rot127

XVilka commented 7 months ago

@valdaarhun if you check the disassemble() function in librz/asm/p/asm_arm_cs. you will see that only CS_MODE_MCLASS and CS_MODE_V8 are used. Thus, it's fine to detect just those for now.

valdaarhun commented 7 months ago

I see. In that case, I'll just focus on these two classes.

valdaarhun commented 6 months ago

Hi. The functions get_cpu_mips or get_cpu_arm in librz/bin/format/elf/elf_info.c simply print the cpu name. How do I get rizin to actually make sense of it before disassembly?

In librz/arch/p/asm_arm_cs:disassemble(), it checks the value of a->cpu. I am guessing it needs to figure out a way to set a->cpu to "cortexm" or "v8". But where is this actually set?

When rizin is run with -e asm.cpu=cortexm, it calls rz_config_eval(). I think this sets the value in r->config. Should I use the same/similar approach in get_cpu_arm()?

XVilka commented 6 months ago

Hmm, I thought this value was used somewhere, my bad. OK, you need to pass it to the config somehow, yes. It should probably be done somewhere in librz/core/cbin.c

valdaarhun commented 6 months ago

Thank you for your response. I'll take a look at cbin.c.

Rot127 commented 6 months ago

@valdaarhun Sorry, I missed the mention above from @XVilka. It's fine if, for now, it only checks for armv8 or the M-profile. However, please ensure it is easily extensible, so that when we add toggles for all the other CPU features (e.g. see the list above), it takes only minimal effort. Ideally, implement your solution only for armv8 first and add the cortex-m toggle afterwards. That way you can check whether it is actually easy to add a feature.
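One way to keep this extensible, sketched under assumptions: a small rule table that maps the attribute values to an `asm.cpu` name, so a new toggle is one more row. `CpuRule` and `arm_cpu_from_attrs()` are hypothetical names, not Rizin API. The rows themselves are assumptions too: Tag_CPU_arch value 14 (v8-A in the ABI addenda) maps to "v8", and any 'M' profile maps to "cortexm".

```c
#include <stddef.h>

/* Hypothetical rule: pick `cpu` when the profile matches and
 * Tag_CPU_arch lies in [min_arch, max_arch]. */
typedef struct {
	int profile;     /* Tag_CPU_arch_profile value, e.g. 'A' or 'M' */
	int min_arch;    /* lowest matching Tag_CPU_arch value */
	int max_arch;    /* highest matching Tag_CPU_arch value */
	const char *cpu; /* value to use for asm.cpu */
} CpuRule;

/* Started with armv8 only; the cortexm row was added afterwards. */
static const CpuRule cpu_rules[] = {
	{ 'A', 14, 14, "v8" },      /* assumption: 14 = v8-A */
	{ 'M', 0, 255, "cortexm" }, /* assumption: any M-profile arch */
};

/* Returns the asm.cpu name for the given attributes, or NULL if no rule matches. */
const char *arm_cpu_from_attrs(int profile, int arch) {
	for (size_t i = 0; i < sizeof(cpu_rules) / sizeof(cpu_rules[0]); i++) {
		const CpuRule *r = &cpu_rules[i];
		if (r->profile == profile && arch >= r->min_arch && arch <= r->max_arch) {
			return r->cpu;
		}
	}
	return NULL; /* leave asm.cpu untouched */
}
```

Returning NULL for unknown combinations keeps the current behaviour for every ELF the table does not cover.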