Feature request: Heap scanning with data structure detection

blechschmidt commented 8 years ago

As soon as memory scanning is implemented, an additional feature that detects simple data structures would be great.

For example, one could hook all malloc calls using the LD_PRELOAD environment variable in order to detect allocated units and graphically outline this in the memory viewer. Furthermore, if a byte sequence within a block of allocated memory represents a valid heap or stack address, this could be graphically highlighted as a possible pointer.

Thank you for the efforts which you put into this great project.

korcankaraokcu commented 8 years ago

Sure, why not, it looks very useful. But this might be implemented in the very last phases, because I'm planning to finish the debugger and code injection engine first in order to give the scanmem team more time to develop libscanmem. Also, there are missing features in libscanmem; I'll try to help develop it when this project reaches the code scanning phase. My current plan is as follows:

setup.py-->refactoring of libPINCE for OOP usage-->basic debugger-->breakpoints-->code injection(single line then code cave injection)-->signal bypassing&Anti anti-debugger tricks in general-->final GUI tweaks/refactoring-->memory scanning-->pointer scanning&this feature

> Thank you for the efforts which you put into this great project

Well, someone had to get the boulder rolling :smiley:

sriemer commented 8 years ago

Hooking all malloc() calls produces a huge amount of data, and backtracing takes quite some time. So if you want to do this, you should know what to look for and filter before backtracing. Otherwise, real-time libs like OpenGL notice a problem and exit the game. ugtrain already has dynamic memory discovery/hacking/adaptation based on malloc() hooking and LD_PRELOAD. It has awesome Chromium B.S.U., Cube 2: Sauerbraten and Warzone 2100 examples based on this.

korcankaraokcu commented 8 years ago

Thanks @sriemer, I'll keep that in mind. Also, I have a few concerns about the LD_PRELOAD trick. Firstly, you have to restart the game, which is a huge drawback in games that have different state-saving mechanisms (some games even disallow you from quitting, check the OneShot RPG for instance). We should find a runtime solution for that. Secondly, some games have protected binary loaders, and they might easily detect libraries loaded by LD_PRELOAD by checking /proc/$pid/maps for non-trusted paths.

kekeimiku commented 1 year ago

I made a pointer scanner. It doesn't rely on LD_PRELOAD, a debugger, or hooks, so it will not be detected by the game. It only needs a memory dump file, so the game does not even need to be running. Maybe it will help you: https://github.com/scanmem/scanmem/issues/431

korcankaraokcu commented 1 year ago

@kekeimiku That looks very cool! But integrating it into PINCE is a bit unlikely since it's a direct extension of the scanmem functionality and it feels like it should be integrated into scanmem instead

If you would like to integrate it as a 3rd party tool, maybe we could look into changing PointerSearcher-X output format to PINCE cheat table format so they would be compatible. If you are up for it, I can create a new issue with detailed info on the format for this kind of integration. It's up to you

kekeimiku commented 1 year ago

@korcankaraokcu

The PINCE cheat table doesn't seem to support resolving something like libhello+0x1234 as a base address?

korcankaraokcu commented 1 year ago

PINCE uses gdb in the background for symbol resolving, and gdb supports symbols such as function names. You also have to stop the process to use any gdb functionality. PINCE internally uses the gdb API function parse_and_eval to evaluate anything you give it, but apparently it doesn't support resolving shared libraries

More info on the symbols and gdb expressions: https://github.com/korcankaraokcu/PINCE/wiki/About-GDB-Expressions

Maybe the command info sharedlibrary could be used for this purpose. I'd either have to extend examine_expression functionality or create a new function specifically for this purpose. If you would like to implement this on your own without any debugger interference, you can also parse pmap output to find base addresses. Which method would you like to proceed with?

kekeimiku commented 1 year ago

I think parsing /proc/pid/maps is more efficient. We only need to find the first memory area named xxx with read permission and get its start address.
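The lookup described here can be sketched roughly as below. This is only an illustration, not PINCE's or PointerSearcher-X's actual code: the function names `find_base_address` and `read_maps` are hypothetical, and it parses the maps text directly instead of going through psutil.

```python
import os
from typing import Optional


def find_base_address(maps_text: str, name: str) -> Optional[int]:
    """Return the start address of the first readable region whose
    pathname ends in `name`, or None if there is no such region.

    `maps_text` is expected to be in the /proc/<pid>/maps layout:
    address-range perms offset dev inode pathname
    """
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous region, no pathname to match
        addr_range, perms, _offset, _dev, _inode, path = fields
        if "r" not in perms:
            continue  # skip regions without read permission
        if os.path.basename(path.strip()) != name:
            continue
        start, _end = addr_range.split("-")
        return int(start, 16)
    return None


def read_maps(pid: int) -> str:
    """Read the raw maps file of a running process (Linux only)."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return maps_file.read()


# Hand-written sample in the /proc/<pid>/maps layout, for illustration only.
sample_maps = (
    "00200000-00300000 r--p 00000000 08:01 11 /home/aabb/hihihi\n"
    "7f0000000000-7f0000001000 ---p 00000000 08:01 12 /home/aabb/hello.so\n"
    "7f0000001000-7f0000002000 r--p 00001000 08:01 12 /home/aabb/hello.so\n"
    "7f0000003000-7f0000004000 rw-p 00000000 00:00 0 \n"
)
hello_base = find_base_address(sample_maps, "hello.so")
```

For a live process, `find_base_address(read_maps(pid), "hello.so")` would do the same lookup against the real maps file.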

korcankaraokcu commented 1 year ago

The question was more about which project should implement symbol resolving for shared libraries. But on second thought, it makes sense for PINCE to have this functionality because otherwise you'd have to launch PointerSearcher every time to create a new cheat table

> I think parsing /proc/pid/maps is more efficient. We only need to find the first memory area named xxx with read permission and get its start address

Yeah I agree, PINCE already uses a package called psutil for parsing this kind of information. It could be done via that. I'll be looking into this soon. Meanwhile, you can work on converting pointer search results into cheat tables. Here's a detailed explanation of the cheat table format:

PINCE stores cheat tables with the pct extension. The Save button click is handled by pushButton_Save_clicked in PINCE.py. It calls read_address_table_recursively, which reads the entire table. The function responsible for item conversion is read_address_table_entries. This function serializes items and makes them ready for copying or turning them into a cheat table. It basically returns a list of description, address_expr, value_type. I'll explain further with an example. Below is a cheat table that contains two pointers:

[["No Description", ["0x561be37b2529", [12]], [2, 10, true, 0], []], ["No Description", ["0x561be37cb604", [4, 32]], [2, 10, true, 0], []]]

Save this as a pct file and load it in PINCE. You can also view it in here for clarity

Both entries have "No Description" as their description. First entry has the base address of "0x561be37b2529" and only one offset, which is 12 (0xC). Second entry has the base address of "0x561be37cb604" and it has two offsets, 4 and 32 in that order. Both entries have the Int32 type which is indicated by [2, 10, true, 0]. You can copy paste this for now, I can also explain it further if you wish. Any questions?
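As a rough sketch of the conversion being asked for, the snippet below builds the same two-pointer table from base addresses and offsets and serializes it as JSON. The helper name `make_pointer_entry` is hypothetical and not part of PINCE; the value_type default is simply copied from the example above.

```python
import json


def make_pointer_entry(base, offsets, description="No Description",
                       value_type=(2, 10, True, 0)):
    """Build one cheat table entry in the layout shown above:
    [description, [base_address, offsets], value_type, children].
    The default value_type is the Int32 one from the example.
    """
    return [description, [hex(base), list(offsets)], list(value_type), []]


table = [
    make_pointer_entry(0x561BE37B2529, [12]),
    make_pointer_entry(0x561BE37CB604, [4, 32]),
]

# json.dumps writes Python's True as JSON true, matching the example.
pct_text = json.dumps(table)
```

Writing `pct_text` to a file with the pct extension should then give a table equivalent to the hand-written one.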

kekeimiku commented 1 year ago

Why is Int32 indicated by [2, 10, true, 0]? How are other types indicated? And in [2, 10, true, 0], []]], what is the last empty array?

brkzlr commented 1 year ago

Because that first array is the value_type representation in the json format.

The first value in the array is the VALUE_INDEX which you can find in libpince/type_defs.py at line 157.

kekeimiku commented 1 year ago

Thx

korcankaraokcu commented 1 year ago

@brkzlr Thanks for the explanation. I'll add a little more information on this

value_index: Type of the value
length: Length of the entry, only used if the entry has length, defaults to 10
zero_terminate: Determines if the string is zero terminated, only used for strings
value_repr: Representation of the value, can be found in type_defs.py. Determines if the value is being shown as unsigned, signed or hexadecimal
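To make the four positions easier to keep apart, here is a small sketch that gives them names. The `ValueType` class is hypothetical and is not the definition in type_defs.py; only the field names and their order come from the description above.

```python
from typing import NamedTuple


class ValueType(NamedTuple):
    """Named view of the 4-element value_type array in a cheat table."""
    value_index: int      # type of the value (2 is Int32 in the example)
    length: int           # only used if the entry has a length, default 10
    zero_terminate: bool  # only used for strings
    value_repr: int       # unsigned, signed or hexadecimal display


# The Int32 value_type from the example table, unpacked into named fields.
int32_type = ValueType(*[2, 10, True, 0])
```

`list(int32_type)` gives back the plain array, so the named form can be converted for serialization without extra work.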

> what is the last empty array?

It's the children of the entry. The table has the structure of a tree. The one I sent you is basically a list, so it has no child entries. The table below has an entry that has exactly one child. Load it in PINCE and observe for yourself:

[["No Description", ["0x561be37b2529", [12]], [2, 10, true, 0], []], ["No Description", ["0x561be37cb604", [4, 32]], [2, 10, true, 0], [["No Description", "printf", [2, 10, true, 1], []]]]]
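For anyone reading the table outside of PINCE, the tree can be walked with a short recursive generator. This is only a sketch of the structure described above and the `walk` helper is hypothetical. Note that in this example the child's address expression is a plain string ("printf") rather than a [base, offsets] pair.

```python
import json

TABLE_TEXT = (
    '[["No Description", ["0x561be37b2529", [12]], [2, 10, true, 0], []], '
    '["No Description", ["0x561be37cb604", [4, 32]], [2, 10, true, 0], '
    '[["No Description", "printf", [2, 10, true, 1], []]]]]'
)


def walk(entries, depth=0):
    """Yield (depth, description, address_expr, value_type) for every
    entry in the tree, parents before their children."""
    for description, address_expr, value_type, children in entries:
        yield depth, description, address_expr, value_type
        yield from walk(children, depth + 1)


rows = list(walk(json.loads(TABLE_TEXT)))
for depth, description, address_expr, _value_type in rows:
    print("  " * depth + f"{description}: {address_expr}")
```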

korcankaraokcu commented 1 year ago

@kekeimiku I've realized something about memory pages while working on your request. Not everything is a .so file; there are multiple pages with different file extensions. For instance, kwidgetsaddons5_qt.qm. Do you want me to include everything or just .so files? Which pages exactly do you search when searching for pointers?

kekeimiku commented 1 year ago

Currently the pointer search only cares about regions that have read permission, whose path does not contain /usr or /dev, and that match one of the following: [stack], [heap], path is a binary, or path is empty.

For PINCE, you only need to find the first ELF file with the specified name in /proc/pid/maps, based on the input, and then get its starting address.

Example: maps

0x200001-0x3000008 r-- /home/aabb/hihihi
...
0x300001-0x4000008 r-- /home/aabb/hello.so
0x4000008-0x3000008 rw- /home/aabb/hello.so

Output of pointersearch hello.so+0x1

It should be parsed as 0x300002. That is 0x300001+0x1

Output of pointersearch hihihi+0x1

It should be parsed as 0x200002. That is 0x200001+0x1
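The arithmetic in this example can be sketched as follows. The `resolve_expression` helper and the `bases` dictionary are hypothetical; the base addresses are taken from the example maps above, and raising an error for an unknown name is an assumption about how a missing module should be handled.

```python
def resolve_expression(expression, bases):
    """Resolve 'name+0xOFFSET' (or just 'name') to an absolute address.

    `bases` maps a module name to the start address of its first
    readable region.
    """
    name, plus, offset_text = expression.rpartition("+")
    if not plus:
        # No '+' in the expression: treat it as a bare module name.
        name, offset_text = expression, "0"
    if name not in bases:
        raise KeyError(f"no mapped file named {name!r}")
    return bases[name] + int(offset_text, 16)


# Start addresses from the example maps above.
bases = {"hihihi": 0x200001, "hello.so": 0x300001}

hello_address = resolve_expression("hello.so+0x1", bases)
hihihi_address = resolve_expression("hihihi+0x1", bases)
```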

My English is terrible/bad. please feel free to contact me if anything is unclear.

korcankaraokcu commented 1 year ago

> My English is terrible/bad. please feel free to contact me if anything is unclear

Your English is very clear, don't worry

> path is empty

But how are we going to reference such a region? As I understand it, we are going to parse the path and get the library name. If there's no path, how are we supposed to reference it? Did I miss something? Or did you mean to exclude those?

kekeimiku commented 1 year ago

If there is no path, we can ignore it. We can return an error if an ELF named xxx cannot be found.

korcankaraokcu commented 1 year ago

So, do we exclude those rules then? I mean, ignore the region if it is [stack] or [heap], or if its path is a binary or is empty

kekeimiku commented 1 year ago

> So, do we exclude those rules then? I mean, ignore if [stack] [heap] path is binary path is empty

We only need regions whose pathname is a binary file. Others can be ignored.

korcankaraokcu commented 1 year ago

Aight, thanks for clearing it up

kekeimiku commented 1 year ago

How do you feel about doing this in pointersearch and then just calling scanmem/pointersearch?

I mean resolve the address of the pointer chain.

kekeimiku commented 1 year ago

Maybe we can move all pointer search related functions to scanmem, so that PINCE only needs to focus on scanmem.

korcankaraokcu commented 1 year ago

> How do you feel about doing this in pointersearch, then just call scanmem/pointersearch

Users will eventually want to use .so symbols in their scripts, it makes sense for libpince to have this kind of symbol recognition. Don't worry, I'll most likely finish this by tomorrow. I was focused on some visual bugs that I noted in the past but I'm done with them now

korcankaraokcu commented 1 year ago

I've finished it but need to optimize it a bit before releasing, sorry for the delay

korcankaraokcu commented 1 year ago

Aight, I've finished it. Enjoy using this new feature. psutil was a bit slower than I expected: 30ms on the first call, which is a bit slow for what it is. I can also do the parsing myself if this becomes a problem in the future or if we don't use the extras of psutil

There's one caveat about this feature. examine_expression handles all of the symbol recognition, and this new feature was implemented inside of it because it makes sense design-wise. However, examine_expression uses gdb to resolve symbols, so you'll have to stop the process in order to use this feature. I'll try to change the behavior of PINCE in the near future to make it usable even when the process isn't stopped

kekeimiku commented 1 year ago

How does PINCE resolve pointer chains? It seems to be different from what I expected.

For example [["No Description", ["0x7f08fa222050", [0, 24, 16]], [2, 10, true, 0], []]] It is expected that it should read a "ptr1" from "0x7f08fa222050+0", then "ptr2" from "ptr1+24", and finally "ptr2+16" to the target. For example:

proc = OpenProcess(pid)
base_address = 0x7f08fa222050
buf = [0; 8] // an 8-byte, pointer-sized buffer
proc.read(buf, base_address + 0) // read 8 bytes from `base_address + 0`
ptr1 = uint64(buf) // convert 8 bytes of buf to uint64
proc.read(buf, ptr1 + 24) // read 8 bytes from `ptr1 + 24`
ptr2 = uint64(buf) // convert 8 bytes of buf to uint64
target = ptr2 + 16
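The same expectation, written as a small runnable sketch. It only illustrates the semantics described in this comment (dereference at every offset except the last, which is simply added) and is not a statement of how PINCE actually resolves chains. The `read_bytes` callback and the fake memory contents are assumptions made for the example.

```python
import struct


def resolve_pointer_chain(read_bytes, base_address, offsets, pointer_size=8):
    """Follow a pointer chain and return the final target address.

    `read_bytes(address, size)` must return `size` bytes read from the
    target process. Pointers are assumed to be little-endian.
    """
    if not offsets:
        return base_address
    fmt = "<Q" if pointer_size == 8 else "<I"
    address = base_address
    for offset in offsets[:-1]:
        raw = read_bytes(address + offset, pointer_size)
        (address,) = struct.unpack(fmt, raw)  # bytes -> unsigned integer
    # The last offset is added without another dereference.
    return address + offsets[-1]


# Fake process memory for illustration: address -> pointer stored there.
fake_memory = {0x1000: 0x2000, 0x2018: 0x3000}


def fake_read(address, size):
    return fake_memory[address].to_bytes(size, "little")


target = resolve_pointer_chain(fake_read, 0x1000, [0, 24, 16])
```

With this fake memory, the chain reads 0x2000 from 0x1000, then 0x3000 from 0x2018, and ends at 0x3010.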

korcankaraokcu commented 1 year ago

This conversation has been moved to discord to not derail the original subject further

korcankaraokcu / PINCE

Feature request: Heap scanning with data structure detection #15