rizinorg / rizin

UNIX-like reverse engineering framework and command-line toolset.
https://rizin.re
GNU Lesser General Public License v3.0
2.72k stars 363 forks source link

Add a function to provide the exact name of the relocation, show it in Rizin and Cutter outputs for relocations #4641

Open Roeegg2 opened 2 months ago

Roeegg2 commented 2 months ago

Is your feature request related to a problem? Please describe.

Couldn't find a way to change the relocation type naming to the names that readelf outputs (same names which are used in glibc, elf.h, etc) (eg. change ADD_64 relocation to appear as R_X86_64_64)

Describe the solution you'd like

An option on Preferences->Analysis (or whatever place seems fit for this option) to be able to toggle between the default and the classic relocation names. That way it would be easier to read the relocations since their names are aligned to the common naming convention, and the relocations would be more specific (from what I've seen so far, for example SET_64 gets assigned to both R_X86_64_TPOFF64 and R_X86_64_GLOB_DAT) Describe alternatives you've considered

Additional context If this sounds like a good idea I'll be more than happy to add it myself.

Here is an example of what the current naming looks like in Cutter: image And this is naming option I would like to add (not in order, just an example to what the naming looks like):

 R_X86_64_TPOFF64 
 R_X86_64_GLOB_DAT 
 R_X86_64_64
.
.
.
wargio commented 2 months ago

we could introduce e reloc.real or similar to have this sort of output.

Roeegg2 commented 2 months ago

we could introduce e reloc.real or similar to have this sort of output.

What do you mean by e reloc.real?

karliss commented 2 months ago

I don't think this is a simple matter of different naming style. To me it looks like the problem is that underlying rizin data structures do not properly distinguish between different relocation types.

The current code both in Cutter and rizin more or less does format(reloc->additive ? "ADD_%d" : "SET_%d", reloc->size).

But the real set of different relocation types supported by ELF format is much more complex.

For example here is the full table for x86_64 ones image

The set of different relocation types is specific to architecture, and also executable format (PE has different set of relocation types).

And if the name/type isn't even properly handled, I doubt that the actual relocation address calculations are done correctly (at least in all the cases). I am guessing that in the best case it currently handles only the most common relocation types.

wargio commented 2 months ago

what i meant is that we could just have an option to just print R_X86_64_TPOFF64, after all is just a switch

XVilka commented 2 months ago

In any case, we just need something like rz_bin_reloc_as_string() function, that we can use both in Cutter and Rust. And we can remove old string representation completely, I think.

Currently relocations are handled in:

karliss commented 2 months ago

Currently relocations are handled in:

There is also bin_pe.inc, mach0_relocs.c and possibly more for other executable formats.

Roeegg2 commented 2 months ago

I'll start working on it. And by the way since this change should go in rizin, should I open a new issue there?

XVilka commented 2 months ago

Lets just transfer

Roeegg2 commented 2 months ago

I came up with this solution:

break down the printing to 2 parts: the machine prefix, and the actual relocation type suffix (for example break R_X86_64_RELATIVE into R_X86_64 suffix and and RELATIVE) add a machine_str_index and reloc_str_index members to the RzBinReloc type. create 2 arrays of string literals - one for the suffixes and one for the prefixes. create 2 enums for prefix indices and suffix indices.

on reloc_convert() we could assign these 2 indices, and then when printing index into the suffix and prefix arrays to get the final correct string to print.

We could also alternatively use a hash map. I could implement both methods and check which one is faster

wargio commented 2 months ago

i think having the name as by documentation for each format is better, since a user can search its meaning.

karliss commented 2 months ago

No need to complicate things more than necessary with two indexes. The code needs to deal with more than just the ELF files, not every binary format will have the same naming scheme. Space savings from doing a split will be negligible, there will be few dozen relocation types per target platform at most. Even worse having two indexes means that you have replaced the fixed cost of few dozen relocation types with the runtime RAM usage for each relocation and there could be thousands of relocations per executable. Having a split strings also makes searching code more difficult.

And the last thing I am somewhat skeptical about str_index in general (with or without split). Having str_index means that there needs to be single central relocation name string table. But the set of relaction types are specific to architecture/ binary format combination. And rizin can have dynamic plugins for adding support of new binary formats, which is incompatible with the idea of single central string table.

You can probably just have a simple non owned const char* pointing to a string literal.

Roeegg2 commented 2 months ago

Yeah you're right.

Anyway I'll open a PR soon with the changes

Roeegg2 commented 2 months ago

I fixed the ELF relocations, seems to be running well.

I wanted to start working on the other formats, but apparently they aren't even checking for relocation types, just setting the additive field... Shouldn't be a problem to implement though.

Should I PR each format, or PR everything in one chunk?

wargio commented 2 months ago

you can do it in one pr and on commit for format