ptitSeb / box64

Box64 - Linux Userspace x86_64 Emulator with a twist, targeted at ARM64 Linux devices
https://box86.org
MIT License
3.73k stars 267 forks

[REFACTOR] Refactor main.c #1362

Closed Howard0o0 closed 6 months ago

Howard0o0 commented 6 months ago

I am looking to run some Java or Python programs on the ARM64 platform, but they have dependencies on x86_64 dynamic libraries. Currently, box64 seems to only support x86_64 ELF binaries. However, I am hoping to leverage box64's code to enable the JVM or Python interpreter to load and execute these x86_64 dynamic libraries through box64 when calling their methods.

I have identified some possible approaches, but they require using box64 as a dynamic library instead of directly running box64. Before compiling box64 into a dynamic library libbox64.so, the contents of main.c need to be refactored and moved to core.c, allowing both libbox64.so and box64 to share the code from main.c.

The refactoring PR is very straightforward: please diff main.c (commit SHA cd8bda7) with core.c (commit SHA 4dabde2) to see the changes.

It's essentially moving the code from main.c to core.c without substantive code modifications.

Howard0o0 commented 6 months ago

I would appreciate some quick feedback on this PR since the main.c file undergoes frequent changes. :)

ptitSeb commented 6 months ago

Sorry, I'm not home for the week, so expect slow changes and responses in this period. I took a quick look; it seems a bit extreme but reasonable. I'll give it a better look tonight.

Howard0o0 commented 6 months ago

@ptitSeb Thanks so much for the initial feedback! :D I'll stay patient.

ptitSeb commented 6 months ago

Ok, I'll merge this one, but I find it strange that you need to move everything out of main.c like this. I suppose more refactoring of the code will come? Because otherwise, I'm not sure it's worth the change (a simple #define BULD_ASLIB and a few #ifdef would have been enough).

Howard0o0 commented 6 months ago

That's all the refactoring work so far, because what I need, which will come in the next PR, is an API like EXPORT uint64_t RunFuncInEmulator(const char* libname, const char* funcname, int nargs, ...). Here is the pseudocode:

// Call a function from an x86_64 library on ARM64 Linux.
uint64_t RunFuncInEmulator(const char* libname, const char* funcname, int nargs, ...)
{
    Initialize();

    // Find the symbol from the library name.

    // Call this symbol by RunFunctionWithEmu
}

The conflict is that we do all the initialization work in the main() function, and the API needs to perform the initializations before the actual work. So this refactoring PR's actual purpose is to extract the initialization code from the main() function.
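
The shape of such an API can be sketched in plain C. This is a minimal sketch, not box64 code: initialize_emulator(), find_symbol() and call_in_emulator() are hypothetical stand-ins, stubbed so the example compiles on its own, for the initialization extracted from main(), the symbol lookup, and the RunFunctionWithEmu-style call. It only illustrates the one-time initialization and the collection of the variadic arguments, under the assumption that every argument is passed as a uint64_t and that at most MAX_ARGS arguments are supported.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <string.h>

#define MAX_ARGS 6  /* assumption for this sketch: a small, fixed argument limit */

static int initialized = 0;
static int init_count = 0;  /* only here to show that initialization runs once */

/* Hypothetical stand-in for the initialization code extracted from main(). */
static void initialize_emulator(void) { init_count++; }

/* Hypothetical stand-in for the symbol lookup: knows a single symbol, "add". */
static uint64_t find_symbol(const char* libname, const char* funcname)
{
    (void)libname;
    return strcmp(funcname, "add") == 0 ? 0x1000 : 0;
}

/* Hypothetical stand-in for running the function in the emulator:
   it simply sums its arguments. */
static uint64_t call_in_emulator(uint64_t addr, int nargs, const uint64_t* args)
{
    (void)addr;
    uint64_t sum = 0;
    for (int i = 0; i < nargs; i++) sum += args[i];
    return sum;
}

uint64_t RunFuncInEmulator(const char* libname, const char* funcname, int nargs, ...)
{
    assert(nargs >= 0 && nargs <= MAX_ARGS);

    /* One-time initialization; not thread-safe in this sketch. */
    if (!initialized) {
        initialize_emulator();
        initialized = 1;
    }

    /* Find the symbol from the library name. */
    uint64_t addr = find_symbol(libname, funcname);
    if (!addr) return 0;  /* a real API would need a proper error channel */

    /* Collect the variadic arguments; callers must pass them as uint64_t. */
    uint64_t args[MAX_ARGS];
    va_list va;
    va_start(va, nargs);
    for (int i = 0; i < nargs; i++) args[i] = va_arg(va, uint64_t);
    va_end(va);

    return call_in_emulator(addr, nargs, args);
}
```

A caller would cast each argument explicitly, for example RunFuncInEmulator("libdemo.so", "add", 2, (uint64_t)40, (uint64_t)2), where "libdemo.so" is a made-up library name. The point of the sketch is that the initialization step has to be callable outside main(), which is what this refactoring enables.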

Howard0o0 commented 6 months ago

We could also use a macro BULD_ASLIB :

// main.c

#ifdef BULD_ASLIB
int initialize() {
#else
int main(int argc, const char **argv, char **env) {
#endif

    // initialization logic

#ifdef BULD_ASLIB
    // library build: the function ends after the initialization
    return 0;
}
#else
    // executable build: main() continues with the emulation

    printf_log(LOG_DEBUG, "Start x64emu on Main\n");
    // Stack is ready, with stacked: NULL env NULL argv argc
    SetRIP(emu, my_context->ep);
    ResetFlags(emu);
    PushExit(emu);  // push to pop it just after
    SetRDX(emu, Pop64(emu));    // RDX is exit function
    Run(emu, 0);
    // Get EAX
    int ret = GetEAX(emu);
    printf_log(LOG_DEBUG, "Emulation finished, EAX=%d\n", ret);

    // cleaning..

    return ret;
}

#endif

But I think it's not straightforward enough :D

ptitSeb commented 6 months ago

Yes, agreed, this is the way.

ptitSeb commented 6 months ago

Ok, but why also move the "main" content?

Howard0o0 commented 6 months ago

I'm not sure which part the "main" content above refers to. Does it mean the body of the function int emulate(x64emu_t* emu, elfheader_t* elf_header) in core.c?

Do you mean: why am I moving the "main" content into the function int emulate(x64emu_t* emu, elfheader_t* elf_header) in core.c, instead of leaving it in main.c like this:

// main.c

#include "core.h"

int main(int argc, const char **argv, char **env) {

    x64emu_t* emu = NULL;
    elfheader_t* elf_header = NULL;
    if (initialize(argc, argv, env, &emu, &elf_header, 1)) {
        return -1;
    }

    // get entrypoint
    my_context->ep = GetEntryPoint(my_context->maplib, elf_header);

    atexit(endBox64);
    loadProtectionFromMap();

    // emulate!
    printf_log(LOG_DEBUG, "Start x64emu on Main\n");
    // Stack is ready, with stacked: NULL env NULL argv argc
    SetRIP(emu, my_context->ep);
    ResetFlags(emu);
    Push64(emu, my_context->exit_bridge);  // push to pop it just after
    SetRDX(emu, Pop64(emu));    // RDX is exit function
    Run(emu, 0);
    // Get EAX
    int ret = GetEAX(emu);
    printf_log(LOG_DEBUG, "Emulation finished, EAX=%d\n", ret);
    endBox64();
#ifdef HAVE_TRACE
    if(trace_func)  {
        box_free(trace_func);
        trace_func = NULL;
    }
#endif

    return ret;
}

If you don't want the function int emulate(x64emu_t* emu, elfheader_t* elf_header), I will expand its body back into main.c and keep only the initialization code in core.c. Wrapping this part in a function int emulate(x64emu_t* emu, elfheader_t* elf_header) is just my personal habit, to make the code clearer :D

Howard0o0 commented 6 months ago

To make the changes of this PR more clear, we could diff old main.c and core.c:

[Image: diff of the old main.c against core.c]

It just splits the code of main.c into two parts, initialize() and emulate().
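
With that split, main.c can shrink to a thin wrapper. The following is a minimal sketch, assuming the initialize() and emulate() signatures used earlier in this thread. The struct bodies, the stub implementations, the parameter name exec, and the wrapper name box64_run() are hypothetical placeholders so that the example compiles on its own; they are not the contents of core.c.

```c
#include <stddef.h>

/* Placeholder types: the real x64emu_t and elfheader_t come from box64's headers. */
typedef struct x64emu_s { int eax; } x64emu_t;
typedef struct elfheader_s { int loaded; } elfheader_t;

static x64emu_t stub_emu;
static elfheader_t stub_elf;

/* Stub with the call shape used in this thread. The meaning of the last
   parameter is not spelled out there, so it is only passed through. */
int initialize(int argc, const char** argv, char** env,
               x64emu_t** emu, elfheader_t** elf_header, int exec)
{
    (void)argv; (void)env; (void)exec;
    if (argc < 1) return -1;   /* pretend: nothing to run */
    stub_emu.eax = 0;
    stub_elf.loaded = 1;
    *emu = &stub_emu;
    *elf_header = &stub_elf;
    return 0;
}

/* Stub for the second half: the real one would run the guest program. */
int emulate(x64emu_t* emu, elfheader_t* elf_header)
{
    if (!emu || !elf_header || !elf_header->loaded) return -1;
    emu->eax = 7;              /* pretend the guest program exited with 7 */
    return emu->eax;
}

/* What main() in main.c would reduce to; named box64_run() here so the
   sketch does not define a main() of its own. */
int box64_run(int argc, const char** argv, char** env)
{
    x64emu_t* emu = NULL;
    elfheader_t* elf_header = NULL;
    if (initialize(argc, argv, env, &emu, &elf_header, 1))
        return -1;
    return emulate(emu, elf_header);
}
```

In this arrangement a library entry point could call initialize() on its own and skip emulate(), while the box64 executable calls both in sequence.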