radareorg / radare2-rlang

Writing Radare2 plugins in various languages

limits of Python bindings to write plugins #4

Closed Wenzel closed 3 years ago

Wenzel commented 6 years ago

Hi !

I developed IO and debug plugins for radare2 in C, and I wanted to port them to Python.

As there was no support for registering a debug plugin in Python with r2lang, I forked radare2-bindings and started to implement the required functions and interface.

However, I realized that I would not be able to fully translate my C debug plugin to Python because of limitations of the bindings:

For example, my __attach implementation needs to access several nested fields of the RDebug * structure (dbg->iob.io->desc->data), and from this pointer I need to recover my RIOVmi *.

static int __attach(RDebug *dbg, int pid) {
    RIODesc *desc = NULL;
    RIOVmi *rio_vmi = NULL;
    status_t status = 0;

    printf("Attaching to pid %d...\n", pid);

    desc = dbg->iob.io->desc;
    rio_vmi = desc->data;

While implementing this in libr/lang/p/python/debug.c, I don't really know how to pass the content of the RDebug * structure:

static int py_debug_attach(RDebug *dbg, int pid) {
    printf("py %s\n", __func__);
    PyObject *arglist = Py_BuildValue (?????);
    PyObject *result = PyEval_CallObject (py_debug_attach_cb, arglist);
}

Solving this would allow me to implement my debugger in Python with the same arguments:

import r2lang
def pydebug(_):

    def attach(rdebug, pid):
        print("attaching !")

print("Registering Python debug plugin...")
print(r2lang.plugin("debug", pydebug))

So what is the status of the Python bindings ? What solution do you recommend ?

cc @XVilka

Thanks !

radare commented 6 years ago

Why do you want to reimplement something that works in an inferior language?


radare commented 6 years ago

The status of the bindings is that I've spent a lot of time writing and maintaining them because people were asking for them, but nobody moved a finger to help, so they are not maintained or tested anymore. There are 4 bindings in python. Choose whatever feels good for you, but I still don't see why the rewrite in Python.


Wenzel commented 6 years ago

Why do you want to reimplement something that works in an inferior language?

The reason i want to rewrite my plugin in Python is because i need to use Rekall APIs to have a semantic translation layer, and be able to interpret raw memory.

My plugin will not be very useful if i cannot detect and understand the memory layout, and find the functions where i need to place breakpoints on-the-fly.

Unfortunately, Rekall is only available in Python.

There are 4 bindings in python.

Can you give me some directions ? Are they all in radare2-bindings repo ?

Thanks.

Wenzel commented 6 years ago

Also, the reason i'm not wrapping this with a r2pipe script is because i need the semantic translation API at the time i'm attaching to a process.

static int __attach(RDebug *dbg, int pid) {
    RIODesc *desc = NULL;
    RIOVmi *rio_vmi = NULL;
    status_t status = 0;

    printf("Attaching to pid %d...\n", pid);

    desc = dbg->iob.io->desc;
    rio_vmi = desc->data;

    // semantic translation
    // rekall.getaddr("PsActiveProcessHead")

radare commented 6 years ago

why not just use the C io plugin you did and make a rekall plugin using r2pipe? if you need rekall to interact with the io plugin then implement commands to call io->cmd to provide the info you need.

Python is slow as hell and brings tons of problems with dynamic linking because people tend to have from 4 to 10 different versions of python installed on their systems. Also, the startup time gets really high when you have python io plugins that depend on other modules. For example, if i install the pimp or r2angr plugins, r2 takes like 5 seconds to start on my quadcore 2GHz machine. I think that this is totally incomprehensible in 2018.

So i’m not a big fan of doing things in python, i hate lag. And you can just use rekall with r2pipe. only “problem” is when you have to keep a state, but this can be solved with other methods
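
For readers following the r2pipe suggestion, here is a minimal sketch of how externally resolved symbols could be turned into r2 flags. The symbol names are the ones mentioned in this thread; the addresses, the `flag_commands` and `apply_flags` helpers, and the `vmi.` prefix are hypothetical and only there for illustration. In a real script the command strings would probably be sent through `r2pipe.open().cmd()`; here the sender is a plain callable so the snippet runs without r2 installed.

```python
def flag_commands(symbols, prefix="vmi."):
    """Build one r2 'f' (flag) command per symbol -> address pair."""
    cmds = []
    for name, addr in sorted(symbols.items()):
        # Keep flag names to letters, digits, dots and underscores.
        # Replacing '!' is an assumption made for this sketch.
        safe = name.replace("!", ".")
        # 'f <name> @ <addr>' sets a flag at a temporary seek address.
        cmds.append("f {}{} @ 0x{:x}".format(prefix, safe, addr))
    return cmds


def apply_flags(symbols, run_cmd):
    """Send every flag command through run_cmd (e.g. an r2pipe cmd method)."""
    for cmd in flag_commands(symbols):
        run_cmd(cmd)


if __name__ == "__main__":
    # Made-up addresses, for illustration only.
    example = {
        "PsActiveProcessHead": 0xFFFFF80002A3E0B0,
        "ntdll!LdrInitializeThunk": 0x77A0C340,
    }
    # With r2pipe this might be: apply_flags(example, r2pipe.open().cmd)
    apply_flags(example, print)
```

Once such flags exist, breakpoints can be set by name instead of by raw offset, which is the point made later in the thread.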


Wenzel commented 6 years ago

And you can just use rekall with r2pipe

I agree, and I went into that direction last month.

But I realized that I needed to interact with Rekall already in my attach implementation (see my answer above). That's why i started to look into having the plugin in Python.

Wenzel commented 6 years ago

Since Python support seems to be problematic, I will create a script that generates a JSON file with the symbols and addresses I need, and parse this file in my C plugin in the meantime.
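
A minimal sketch of what such an export script could look like, assuming the symbols have already been resolved into a plain dict. The helper names `export_symbols` and `load_symbols`, the file name, and the example values are hypothetical; the real script would fill the dict from Rekall.

```python
import json
import os
import tempfile


def export_symbols(symbols, path):
    """Write a symbol -> address mapping to a JSON file."""
    # Addresses are stored as hex strings: 64-bit kernel addresses do not
    # fit exactly in the doubles that many JSON parsers use for numbers.
    with open(path, "w") as fh:
        json.dump({name: hex(addr) for name, addr in symbols.items()},
                  fh, indent=2, sort_keys=True)


def load_symbols(path):
    """Read the file back, converting hex strings to integers."""
    with open(path) as fh:
        return {name: int(value, 16) for name, value in json.load(fh).items()}


if __name__ == "__main__":
    # Made-up values, for illustration only.
    symbols = {"PsActiveProcessHead": 0xFFFFF80002A3E0B0, "ImageFileName": 0x2E0}
    path = os.path.join(tempfile.mkdtemp(), "symbols.json")
    export_symbols(symbols, path)
    print(load_symbols(path) == symbols)
```

On the C side, the plugin would then only need a JSON parser and a string-to-integer conversion for each value.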

radare commented 6 years ago

so the “problem” is only with the debug plugin, not the io one. because you want to get the address of the process selected using rekall? how do you make rekall use the io plugin? i mean.. isnt it supposed to work only with memory dumps?


Wenzel commented 6 years ago

so the “problem” is only with the debug plugin, not the io one

that's correct.

because you want to get the address of the process selected using rekall?

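I would need a set of symbols:

  - PsActiveProcessHead kernel address
  - offset of ImageFileName in _EPROCESS struct
  - offset of Win32StartAddress in _ETHREAD struct
  - ntdll!LdrInitializeThunk userspace address
  - etc...

to place my breakpoints at the right location while the process is created and intercept the first thread.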

i mean.. isnt it supposed to work only with memory dumps?

That's the thing. I worked with Michael Cohen to add a new VMI Address Space in Rekall in order to work directly on top of the physical memory, thanks to Libvmi: https://github.com/google/rekall/blob/master/rekall-core/rekall/plugins/addrspaces/vmi.py#L15

This allows me to do memory forensics without having to generate a RAM dump.

And using Rekall API in python, i can extract some symbols on the fly:

    # Imports inferred from the names used below
    import json
    import logging
    from io import StringIO

    from rekall import session

    # Open a Rekall session directly on the live VM through the VMI address space
    s = session.Session(
            filename="vmi://xen/windows_7",
            autodetect=["rsds"],
            logger=logging.getLogger(),
            autodetect_build_local='none',
            format='data',
            profile_path=[
                "http://profiles.rekall-forensic.com"
            ])

    # Run a plugin and capture its JSON output
    strio = StringIO()
    s.RunPlugin("version_modules", output=strio)
    version_modules = json.loads(strio.getvalue())

    # Structure field offsets taken from the detected profile
    pdbase = s.profile.get_obj_offset('_KPROCESS', 'DirectoryTableBase')
    tasks = s.profile.get_obj_offset('_EPROCESS', 'ActiveProcessLinks')
    name = s.profile.get_obj_offset('_EPROCESS', 'ImageFileName')
    pid = s.profile.get_obj_offset('_EPROCESS', 'UniqueProcessId')

Wenzel commented 6 years ago

Oops, I clicked on close but didn't mean to ...

XVilka commented 6 years ago

I also think that implementing this in C is better. In my experience Python plugins are very slow too, and the only reasons to write them are if you need something done quickly or do not know C. But you are right, and this bug should be fixed. I will check it next week once I'm home and back to work.

radare commented 6 years ago

so...

On 20 Apr 2018, at 15:54, Mathieu Tarral notifications@github.com wrote:

I would need a set of symbols:

  - PsActiveProcessHead kernel address
  - offset of ImageFileName in _EPROCESS struct
  - offset of Win32StartAddress in _ETHREAD struct
  - ntdll!LdrInitializeThunk userspace address
  - etc…

this is a task for rbin, not rdebug or rio. so i would go for just doing a rekall plugin in python. to get all those flags into r2.

This allows me to do memory forensics without having to generate a RAM dump.

ah thats cool! that may save some GB of useless dumps in /tmp :P

And using Rekall API in python, i can extract some symbols on the fly:

what i see here is r2 and rekall doing separate connections to the xen machine with vmi. so its not reusing r_io

from an r2pipe script you can read the opened file uri and reuse-it from that script and then set the flags to make r2 autocomplete those addresses. you can set the breakpoints using names instead of offsets then.

another option would be to make this debug plugin instantiate a python vm and run rekall from inside, instead of making the whole plugin in python

Wenzel commented 6 years ago

this is a task for rbin, not rdebug or rio. so i would go for just doing a rekall plugin in python. to get all those flags into r2.

If rabin is the component responsible for mapping symbols to addresses, then this is what I'm looking for.

How i see the new behavior of this debugger:

  1. run r2 -I rabin_plugin.py -d vmi://windows_7:854
  2. (rabin_plugin.py) run rekall and associate a maximum of symbols -> virtual address
  3. (debug_vmi.c:attach()) intercept PID 854, get the address of a certain API and offsets from rabin.
  4. jump at first thread current rip.

from an r2pipe script you can read the opened file uri and reuse-it from that script and then set the flags to make r2 autocomplete those addresses. you can set the breakpoints using names instead of offsets then.

I'm not sure i understood very well what you meant :thinking:

another option would be to make this debug plugin instantiate a python vm and run rekall from inside, instead of making the whole plugin in python

This would imply writing a lot of C code with Python APIs to import rekall and call a function. I wouldn't go this way if there is a better alternative with rabin.

Thanks guys !

radare commented 6 years ago

Try the corebind cmd approach i suggested in telegram

On 23 Apr 2018, at 19:03, Mathieu Tarral notifications@github.com wrote:

Is it loaded before the debugger tries to attach to the process ? Can i call it inside the attach() function too ? Do you have an example of a Rabin python plugin ?

trufae commented 3 years ago

ping. it's been 2 years since this issue was opened. Is this issue still happening? can we close it?

thanks @Wenzel !

Wenzel commented 3 years ago

Hi @trufae ,

I will close this issue as I'm not working on r2vmi anymore.

Thanks !