ChinYikMing commented 7 months ago

First trial of refactoring, the wasm branch's latest commit is the result.

Since state_t is user-provided data, all runtime-defined values (which often change) shall be stored there: for instance, the emulated target program's argc and argv, and the emulator's parameters. The following have been adjusted to reflect the changes:

- state_t *state_new(void) -----> state_t *state_new(uint32_t mem_size, int argc, char **argv, bool allow_misalign, bool quiet_output): mem_size is used for memory_new because different runtimes may have different memory requirements (for example, the page size in WebAssembly is 64KiB), so the default MEM_SIZE (2^32 - 1) is not appropriate for them. The rest of the parameters are runtime-defined values.
- riscv_t *rv_create(const riscv_io_t *io, riscv_user_t userdata, int argc, char **args, bool output_exit_code) -----> riscv_t *rv_create(riscv_user_t userdata): a much cleaner function signature.
- void rv_reset(riscv_t *rv, riscv_word_t pc, int argc, char **args) -----> void rv_reset(riscv_t *rv, riscv_word_t pc): we can use rv->userdata to get the required argc and argv.
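A minimal sketch of how the slimmer signatures could fit together. The `_sketch` types and functions are hypothetical stand-ins, not the actual rv32emu definitions; the point is only that once argc and argv live in the user data, rv_create and rv_reset need nothing but a handle to it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for state_t: holds every runtime-defined value. */
typedef struct {
    uint32_t mem_size;
    int argc;
    char **argv;
    bool allow_misalign;
    bool quiet_output;
} state_sketch_t;

/* Hypothetical stand-in for riscv_t. */
typedef struct {
    uint32_t pc;
    void *userdata;
} riscv_sketch_t;

state_sketch_t *state_new_sketch(uint32_t mem_size,
                                 int argc,
                                 char **argv,
                                 bool allow_misalign,
                                 bool quiet_output)
{
    state_sketch_t *s = calloc(1, sizeof(*s));
    if (!s)
        return NULL;
    s->mem_size = mem_size;
    s->argc = argc;
    s->argv = argv;
    s->allow_misalign = allow_misalign;
    s->quiet_output = quiet_output;
    return s;
}

/* Only the user data is passed in; everything else is reachable from it. */
riscv_sketch_t *rv_create_sketch(void *userdata)
{
    riscv_sketch_t *rv = calloc(1, sizeof(*rv));
    if (!rv)
        return NULL;
    rv->userdata = userdata;
    return rv;
}

/* argc/argv are no longer parameters: they can be read from rv->userdata. */
void rv_reset_sketch(riscv_sketch_t *rv, uint32_t pc)
{
    rv->pc = pc;
}
```

A caller would create the state first, hand it to rv_create_sketch, and later recover argc and argv by casting rv->userdata back to state_sketch_t.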
Since memory I/O handlers are rarely changed, it makes less sense to define them during runtime (makes porting difficult). Instead, I believe it is preferable to link them during build time. If one really wants to change the implementations, making a new C file and linking it during build time might be a better choice. To do this, some changes are made:
- define memory I/O handlers in rv_create and link during build time
- to make the memory write interfaces compatible with the function pointers, the MEM_WRITE_IMPL macro has to be changed:
- [ ] `src/io.c`
```c
// old
#define MEM_WRITE_IMPL(size, type)                                 \
    void memory_write_##size(uint32_t addr, const uint8_t *src)    \
    {                                                              \
        *(type *) (data_memory_base + addr) = *(const type *) src; \
    }

// new
#define MEM_WRITE_IMPL(size, type)                              \
    void memory_write_##size(uint32_t addr, const type src)     \
    {                                                           \
        *(type *) (data_memory_base + addr) = src;              \
    }
```

- the call to `memory_write_w` in "src/syscall.c" shall be changed accordingly:
- [ ] `src/syscall.c`
```c
// old
memory_write_w(tv + 0, (const uint8_t *) &tv_s.tv_sec);

// new
memory_write_w(tv + 0, *((const uint32_t *) &tv_s.tv_sec));
```
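To illustrate the by-value variant, here is a small self-contained sketch that expands the new-style macro for the three widths. The 64 KiB buffer and the read helper are assumptions of the sketch, and it uses memcpy rather than the pointer cast shown above so that unaligned addresses are also safe; that substitution is a choice made here, not part of the proposal.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 64 KiB backing store standing in for the emulator memory. */
static uint8_t data_memory_base[1 << 16];

/* New-style writer: the value is passed directly instead of by pointer.
 * memcpy replaces the pointer cast (an assumption of this sketch). */
#define MEM_WRITE_IMPL(size, type)                              \
    void memory_write_##size(uint32_t addr, const type src)     \
    {                                                           \
        memcpy(data_memory_base + addr, &src, sizeof(type));    \
    }

MEM_WRITE_IMPL(w, uint32_t)
MEM_WRITE_IMPL(s, uint16_t)
MEM_WRITE_IMPL(b, uint8_t)

/* Read helper used only to check results in this sketch. */
uint32_t memory_read_w_sketch(uint32_t addr)
{
    uint32_t v;
    memcpy(&v, data_memory_base + addr, sizeof(v));
    return v;
}
```

With this shape a caller passes the value itself, e.g. `memory_write_w(8, 0xdeadbeef)`, which is what lets the generated functions match function-pointer types that take the data by value.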

As a notable change, the "pre.js" of the wasm branch no longer defines I/O on its own, compared to the first attempt (more abstraction).

Change all uint32_t, uint16_t, and uint8_t in the function signatures of riscv.[ch] to riscv_word_t, riscv_half_t, and riscv_byte_t respectively, for consistency.
bool elf_load(elf_t *e, riscv_t *rv, memory_t *mem); -----> bool elf_load(elf_t *e, riscv_t *rv); The memory instance required by elf_load can be accessed via rv's userdata.

I am wondering whether we should abstract the FILE defined in state_new as a parameter of state_new. Without abstraction, the emulator always depends on standard I/O (e.g., stdin, stdout, stderr). What if the user wants to use a file instead of stdout?

Related to: #75

jserv commented 7 months ago

I would like to invite @RinHizakura, @qwe661234, and @visitorckw to join the discussion and contribute to the refinement of the API.

jserv commented 7 months ago

I am wondering whether we should abstract the FILE defined in state_new as a parameter of state_new. Without abstraction, the emulator always depends on standard I/O (e.g., stdin, stdout, stderr). What if the user wants to use a file instead of stdout?

In the initial stages of developing this emulator, I redirected I/O operations to files for comparison purposes. However, I now recognize that this approach to the function interface was not as flexible as I had initially thought.

ChinYikMing commented 7 months ago

I am wondering whether we should abstract the FILE defined in state_new as a parameter of state_new. Without abstraction, the emulator always depends on standard I/O (e.g., stdin, stdout, stderr). What if the user wants to use a file instead of stdout?

In the initial stages of developing this emulator, I redirected I/O operations to files for comparison purposes. However, I now recognize that this approach to the function interface was not as flexible as I had initially thought.

So, it could be useful to provide an abstract way of defining the desired stdin, stdout, and stderr, or more than just these three. stdin, stdout, and stderr can be used as defaults when no specification is given.

RinHizakura commented 7 months ago

Since memory I/O handlers are rarely changed, it makes less sense to define them during runtime (makes porting difficult). Instead, I believe it is preferable to link them during build time.

I think the distinction between modules is a little unclear in the current design of rv32emu. Suppose we regard riscv.c as part of the library and main.c as part of the application using the library. Although rv_create() seems to allow the application to customize memory operations through the io function pointers, operations on simulated memory must actually be bound to the instance created by state_new(), which belongs to the library side. This limits how io can be customized. For example, what if you want to use a backup file to simulate memory? This design seems to make performing memory operations through the io function pointers redundant.

I believe it is preferable to link them during build time.

So, if providing user-specific memory operations does not matter, supplying them at build time for the library would also be a great solution.

RinHizakura commented 7 months ago

mem_size is used for memory_new because different runtimes may have different memory requirements (for example, the page size in WebAssembly is 64KiB), so the default MEM_SIZE (2^32 - 1) is not appropriate for them.

Not quite sure whether changing the memory size directly is safe. As I remember, some implementations in rv32emu intentionally rely on the fact that the memory size is 2^32 - 1 for some tricks. Or maybe I am mixing it up with some other project. Looking for others to answer the question.

ChinYikMing commented 7 months ago

mem_size is used for memory_new because different runtimes may have different memory requirements (for example, the page size in WebAssembly is 64KiB), so the default MEM_SIZE (2^32 - 1) is not appropriate for them.

Not quite sure whether changing the memory size directly is safe. As I remember, some implementations in rv32emu intentionally rely on the fact that the memory size is 2^32 - 1 for some tricks. Or maybe I am mixing it up with some other project. Looking for others to answer the question.

The built-in ELF programs do not seem to need a lot of memory, so I think 2GB - 4GB is a safe region. Dynamically changing the memory size in different runtimes might be needed. For example, multiples of 64KiB should be used in WebAssembly. MEM_SIZE was originally set to 2^32 in #151 as the size for preallocating memory, to avoid extra checks when manipulating the memory region. Then, MEM_SIZE was set to 2^32 - 1 in #221 to be compatible with emcc, whose default build target is wasm32 (memory shall be < 4GB).

ChinYikMing commented 7 months ago

I would like to introduce cycles_per_step into state_t structure since it can be varied. For example, in web-based simulation, the user might want to increase the cycles_per_step to jump quicker to the desired part of execution to see the register file or memory bank status. For better abstraction, it could be possible to add a pair of getters and setters.

Then, rv_step signature can be refactored to have only one parameter: void rv_step(riscv_t *rv). The cycles_per_step can be retrieved via rv->userdata

ChinYikMing commented 7 months ago

rv_enables_to_output_exit_code could be renamed as something like rv_get_xxx. Same rules might be applied to other fields of state_t to improve consistency. rv_set_xxx can be the setter.

For example:

rv_get_userdata / rv_set_userdata
rv_get_pc / rv_set_pc
rv_get_reg / rv_set_reg
rv_get_halt_status / rv_set_halt_status
rv_get_cycle_per_step / rv_set_cycle_per_step
rv_get_output_exit_code_flag / rv_set_output_exit_code_flag
rv_get_allow_misalign_flag / rv_set_allow_misalign_flag
...
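A minimal sketch of what such a getter/setter family could look like, together with a one-parameter rv_step. The reduced riscv_sketch_t is hypothetical, and the cycle count is stored directly in the instance here for brevity, whereas the proposal above keeps it behind rv->userdata.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_RV_REGS 32

/* Hypothetical, reduced riscv_t; the real structure has many more fields. */
typedef struct {
    uint32_t X[N_RV_REGS];
    uint32_t pc;
    uint32_t cycle_per_step;
    uint64_t cycles_run;
    bool halt;
} riscv_sketch_t;

uint32_t rv_get_pc(const riscv_sketch_t *rv) { return rv->pc; }
void rv_set_pc(riscv_sketch_t *rv, uint32_t pc) { rv->pc = pc; }

uint32_t rv_get_reg(const riscv_sketch_t *rv, uint32_t reg)
{
    return reg < N_RV_REGS ? rv->X[reg] : 0;
}

void rv_set_reg(riscv_sketch_t *rv, uint32_t reg, uint32_t val)
{
    /* x0 is hard-wired to zero, so writes to it are dropped. */
    if (reg > 0 && reg < N_RV_REGS)
        rv->X[reg] = val;
}

uint32_t rv_get_cycle_per_step(const riscv_sketch_t *rv)
{
    return rv->cycle_per_step;
}

void rv_set_cycle_per_step(riscv_sketch_t *rv, uint32_t n)
{
    rv->cycle_per_step = n;
}

/* With the count stored in the instance, rv_step needs only one parameter. */
void rv_step(riscv_sketch_t *rv)
{
    for (uint32_t i = 0; i < rv->cycle_per_step && !rv->halt; i++) {
        rv->pc += 4; /* placeholder for decoding and executing one insn */
        rv->cycles_run++;
    }
}
```

A web front end could then call rv_set_cycle_per_step with a larger value to skip ahead before inspecting registers through rv_get_reg.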

jserv commented 6 months ago

I would like to introduce cycles_per_step into state_t structure since it can be varied. For example, in web-based simulation, the user might want to increase the cycles_per_step to jump quicker to the desired part of execution to see the register file or memory bank status. For better abstraction, it could be possible to add a pair of getters and setters.

Then, rv_step signature can be refactored to have only one parameter: void rv_step(riscv_t *rv). The cycles_per_step can be retrieved via rv->userdata

I agree. By the way, state_t might not be a very self-explanatory name. I am considering unifying it into a VM-specific structure.

jserv commented 6 months ago

The repository mnurzia/rv serves as an additional reference for API refinement. It features three main APIs:

- Memory Access Callback: This function processes data as input/output and returns RV_BAD in case of a fault. It's defined as: `typedef rv_res (*rv_bus_cb)(void *user, rv_u32 addr, rv_u8 *data, rv_u32 is_store, rv_u32 width);`
- CPU Initialization: This function initializes the CPU and can be called again on the cpu object to reset it. The function signature is: `void rv_init(rv *cpu, void *user, rv_bus_cb bus_cb);`
- CPU Single-Step: This function advances the CPU by one step and returns RV_E* in case of an exception. Its definition is: `rv_u32 rv_step(rv *cpu);`

These APIs collectively provide a structure for memory access, CPU initialization, and step-wise execution in the CPU simulation.

ChinYikMing commented 6 months ago

As previously suggested, the maximum memory (MEM_SIZE) of a virtual machine (VM) shall be determined by the application. If these modifications are made, the Makefile-defined default stack size shall also be adjusted.

Makefile:

# Set the default stack pointer
...
CFLAGS += -D DEFAULT_STACK_ADDR=0xFFFFE000
# Set the default args starting address
CFLAGS += -D DEFAULT_ARGS_ADDR=0xFFFFF000
...

Thus, adjusting the stack size should be part of the public API.
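One way the two addresses could follow the memory size instead of being fixed in the Makefile is sketched below. The layout rule (argument area in the last 4 KiB-aligned page, stack pointer one page below it) and all names are assumptions chosen so that MEM_SIZE = 2^32 - 1 reproduces the two constants above; it is not the project's actual scheme.

```c
#include <stdint.h>

#define VM_PAGE_SIZE 0x1000u /* assumed 4 KiB granule for both regions */

typedef struct {
    uint32_t args_addr;  /* counterpart of DEFAULT_ARGS_ADDR */
    uint32_t stack_addr; /* counterpart of DEFAULT_STACK_ADDR */
} vm_layout_sketch_t;

/* mem_size is expected to cover at least two pages. */
vm_layout_sketch_t vm_layout_from_mem_size(uint32_t mem_size)
{
    vm_layout_sketch_t l;
    /* Last page-aligned address inside [0, mem_size). */
    l.args_addr = (mem_size - 1) & ~(VM_PAGE_SIZE - 1);
    /* The stack pointer starts one page below the argument area. */
    l.stack_addr = l.args_addr - VM_PAGE_SIZE;
    return l;
}
```

For a 64 MiB WebAssembly-friendly memory this places the argument area at 0x03FFF000 and the stack pointer at 0x03FFE000, so both stay inside the allocated region.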

ChinYikMing commented 5 months ago

I would like to introduce cycles_per_step into state_t structure since it can be varied. For example, in web-based simulation, the user might want to increase the cycles_per_step to jump quicker to the desired part of execution to see the register file or memory bank status. For better abstraction, it could be possible to add a pair of getters and setters. Then, rv_step signature can be refactored to have only one parameter: void rv_step(riscv_t *rv). The cycles_per_step can be retrieved via rv->userdata

I agree. By the way, state_t might not be a very self-explanatory name. I am considering unifying it into a VM-specific structure.

I think vm_attr_t can be the candidate for the name ( inspired by pthread_attr_t ).

Currently, vm_attr_t should consist of the following:

1. vm RAM size (if previous concern is OK)
2. vm STACK size (if vm RAM size changes)
3. vm-specific argc, argv
4. error code (to represent the exit state of vm)
5. enable_outout_exit_code
6. logging level
7. union of target ELF program and target vm

       union {
           rv_struct_t rv_struct;
           vm_struct_t vm_struct;
       };

       typedef struct rv_struct { char *elf_program; } rv_struct_t;

       typedef struct vm_struct { kernel_img; dtb; rootfs_img; } vm_struct_t;

8. cycle_per_step
9. enable_misaligned

I would like to introduce the sixth attribute of `vm_attr_t`, which lets the user select how the vm should log, just like the `printk` log level of the Linux kernel. The logging level registers a corresponding handler during `rv_init` initialization. This feature gives the user more flexibility to observe the vm state or error reports. The seventh attribute of `vm_attr_t` differentiates RISC-V program emulation from RISC-V system emulation; `rv_create` then returns the corresponding internal structure (riscv_internal or vm_internal), both of which are, of course, forward-declared structures.
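Put together, the attribute list could translate into a structure along these lines. This is only a sketch: the log-level enum, the target tag, and the `char *` member types of vm_struct_t are assumptions added for illustration, and the exit-code output flag is left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical log levels, loosely modelled on printk severities. */
typedef enum {
    VM_LOG_ERROR,
    VM_LOG_WARN,
    VM_LOG_INFO,
    VM_LOG_DEBUG
} vm_log_level_t;

/* Hypothetical tag telling which union member is valid. */
typedef enum { VM_TARGET_ELF, VM_TARGET_SYSTEM } vm_target_kind_t;

typedef struct {
    char *elf_program;
} rv_struct_t;

/* Member types are assumed to be file paths. */
typedef struct {
    char *kernel_img;
    char *dtb;
    char *rootfs_img;
} vm_struct_t;

typedef struct {
    uint32_t mem_size;   /* vm RAM size */
    uint32_t stack_size; /* vm STACK size */
    int argc;            /* vm-specific argc, argv */
    char **argv;
    int error_code;      /* exit state of the vm */
    vm_log_level_t log_level;
    vm_target_kind_t kind;
    union {
        rv_struct_t rv_struct; /* valid when kind == VM_TARGET_ELF */
        vm_struct_t vm_struct; /* valid when kind == VM_TARGET_SYSTEM */
    } target;
    uint32_t cycle_per_step;
    bool enable_misaligned;
} vm_attr_t;

/* rv_create could branch on this to pick the internal structure. */
bool vm_attr_is_system(const vm_attr_t *attr)
{
    return attr->kind == VM_TARGET_SYSTEM;
}
```

An explicit tag next to the union is one common way to record which member is in use; the proposal itself does not say how the two cases would be told apart.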

Prefix of all vm-related functions should be consistent ( more discussion ).

jserv commented 5 months ago

state_t might not be a very self-explanatory name. I am considering unifying it into a VM-specific structure.

I think vm_attr_t can be the candidate for the name ( inspired by pthread_attr_t ).

It sounds promising. Please send pull request(s) to refine APIs.

Currently, vm_attr_t should consist of the following:

vm RAM size (if previous concern is OK)

vm STACK size (if vm RAM size changes)

vm-specific argc, argv

How about envp?

error code (to represent the exit state of vm)

enable_outout_exit_code

enable_outout_exit_code looks hacky. Can you show something detailed?

ChinYikMing commented 5 months ago

vm RAM size (if previous concern is OK)

vm STACK size (if vm RAM size changes)

vm-specific argc, argv

How about envp?

Since envp is not accessible for now, placing a TODO in vm_attr_t might be decent.

enable_outout_exit_code

enable_outout_exit_code looks hacky. Can you show something detailed?

It is related to syscall_exit, which uses it to determine whether to output the exit code. I think always outputting the exit code is not a bad thing, so maybe this is redundant. Or, it can be determined on top of the logging feature.

jserv commented 5 months ago

I think always outputting the exit code is not a bad thing, so maybe this is redundant. Or, it can be determined on top of the logging feature.

After streamlining the API, we can control the exit code by storing it in a specific structure, instead of displaying it directly in the console.

ChinYikMing commented 5 months ago

I am wondering whether we should abstract the FILE defined in state_new as a parameter of state_new. Without abstraction, the emulator always depends on standard I/O (e.g., stdin, stdout, stderr). What if the user wants to use a file instead of stdout?

In the initial stages of developing this emulator, I redirected I/O operations to files for comparison purposes. However, I now recognize that this approach to the function interface was not as flexible as I had initially thought.

So, it could be useful to provide an abstract way of defining the desired stdin, stdout, and stderr, or more than just these three. The standard stdin, stdout, and stderr can be used as defaults when no specification is given.

For abstracting the file or file descriptor, we could have an attribute called bool use_default_stdin_stdout_stderr in vm_attr_t, which selects the common stdin, stdout, and stderr. Alternatively, we could have a function called vm_register_stdxxx that lets the vm register a non-common fd (e.g., a regular file) before emulation.

RinHizakura commented 5 months ago

For abstracting the file or file descriptor, we could have an attribute called bool use_default_stdin_stdout_stderr in vm_attr_t, which selects the common stdin, stdout, and stderr. Alternatively, we could have a function called vm_register_stdxxx that lets the vm register a non-common fd (e.g., a regular file) before emulation.

Since we would also have to maintain the file descriptor when vm_register_stdxxx() is used for redirection, how about just adding three file descriptors to vm_attr_t:

typedef struct {
    ...
    int stdin;
    int stdout;
    int stderr;
} vm_attr_t;

Note: The variables' names should be refined. I just can't come up with suitable and concise names now.

These file descriptors are assigned 0, 1, and 2 by default, and are modified when vm_register_stdxxx() is called. So we don't need a redundant variable bool use_default_stdin_stdout_stderr for this feature.
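A small sketch of the default-plus-override idea. The field names are prefixed with `fd_` here because stdin, stdout, and stderr are macros in `<stdio.h>` and would clash as member names once that header is included; the prefix, the init helper, and the concrete vm_register_stdout name are all assumptions of the sketch.

```c
#include <stdbool.h>

/* Hypothetical slice of vm_attr_t holding only the three descriptors. */
typedef struct {
    int fd_stdin;
    int fd_stdout;
    int fd_stderr;
} vm_attr_sketch_t;

/* Defaults: the conventional descriptors 0, 1, and 2. */
void vm_attr_init_std(vm_attr_sketch_t *attr)
{
    attr->fd_stdin = 0;
    attr->fd_stdout = 1;
    attr->fd_stderr = 2;
}

/* One member of the vm_register_stdxxx family: overrides the default.
 * Returns false and leaves the attribute untouched for an invalid fd. */
bool vm_register_stdout(vm_attr_sketch_t *attr, int fd)
{
    if (fd < 0)
        return false;
    attr->fd_stdout = fd;
    return true;
}
```

Because the defaults are just initial values, no separate boolean is needed: a caller that never registers anything keeps 0, 1, and 2, and one that opens a regular file passes its descriptor to the register function before emulation starts.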

ChinYikMing commented 5 months ago

For abstracting the file or file descriptor, we could have an attribute called bool use_default_stdin_stdout_stderr in vm_attr_t, which selects the common stdin, stdout, and stderr. Alternatively, we could have a function called vm_register_stdxxx that lets the vm register a non-common fd (e.g., a regular file) before emulation.

Since we would also have to maintain the file descriptor when vm_register_stdxxx() is used for redirection, how about just adding three file descriptors to vm_attr_t:

typedef struct {
    ...
    int stdin;
    int stdout;
    int stderr;
} vm_attr_t;

Note: The variables' names should be refined. I just can't come up with suitable and concise names now.

These file descriptors are assigned 0, 1, and 2 by default, and are modified when vm_register_stdxxx() is called. So we don't need a redundant variable bool use_default_stdin_stdout_stderr for this feature.

Thanks for the tips! I think the boolean is really redundant, since vm_register_stdxxx() could overwrite them.

ChinYikMing commented 5 months ago

As mentioned earlier, memory I/O handlers are rarely changed, so it makes less sense to define them during runtime (see main.c). Instead, I believe it is preferable to link them during build time. If one really wants to change the implementations, creating a new implementation and linking it during build time might be a better choice. Also, following @RinHizakura's previous comment, providing I/O handlers could be redundant if few use cases want to simulate memory on their own. Thus, we might consider moving the I/O handlers to the library side for simplicity. Or we could still provide an I/O interface for registration but bind the default I/O handlers when none are specified on creating the RISC-V instance.

For further integration of semu, we might also need to abstract the common operations among MMU and no-MMU, e.g., load and store.

ChinYikMing commented 5 months ago

As mentioned earlier, memory I/O handlers are rarely changed, so it makes less sense to define them during runtime (see main.c). Instead, I believe it is preferable to link them during build time. If one really wants to change the implementations, creating a new implementation and linking it during build time might be a better choice. Also, following @RinHizakura's previous comment, providing I/O handlers could be redundant if few use cases want to simulate memory on their own. Thus, we might consider moving the I/O handlers to the library side for simplicity. Or we could still provide an I/O interface for registration but bind the default I/O handlers when none are specified on creating the RISC-V instance.

For further integration of semu, we might also need to abstract the common operations among MMU and no-MMU, e.g., load and store.

The I/O improvements proposal is as below:

1. riscv_io_t defined in "riscv.h" can be reused to adapt mmu_fetch, mmu_load, mmu_store in semu
   - mmu_fetch signature of semu is compatible with riscv_mem_ifetch by removing the vm and value parameter. The I/O interface is embedded inside riscv_t so vm parameter is no longer needed. The fetched value is returned
   - mmu_load signature of semu is compatible with riscv_mem_read_w, riscv_mem_read_s and riscv_mem_read_b by removing the vm, width, value and reserved parameter. The I/O interface is embedded inside riscv_t so vm param is no longer needed. The width parameter is not necessary since there are width related handlers(riscv_mem_read_w, riscv_mem_read_s and riscv_mem_read_b). The loaded value is returned. The registration of the 'reservation set' can be done in corresponding RVOP()(some fields might be added to riscv_t, e.g., reservation) so reserved parameter is no longer needed
   - mmu_store is similar to mmu_load
2. Disable registration of custom I/O handlers for now; the corresponding I/O handlers are set when calling rv_create, thus the rv_create signature becomes riscv_t *rv_create(riscv_user_t attr). We can determine which I/O handlers should be bound by checking attr->data
3. The peripheral I/O handlers (e.g., UART, PLIC) can be defined in riscv_io_t too; they are used by the RISC-V system emulator but ignored by the RISC-V instruction emulator

jserv commented 4 months ago

mmu_fetch signature of semu is compatible with riscv_mem_ifetch by removing the vm and value parameter. The I/O interface is embedded inside riscv_t so vm parameter is no longer needed. The fetched value is returned

mmu_load signature of semu is compatible with riscv_mem_read_w, riscv_mem_read_s and riscv_mem_read_b by removing the vm, width, value and reserved parameter. The I/O interface is embedded inside riscv_t so vm param is no longer needed. The width parameter is not necessary since there are width related handlers(riscv_mem_read_w, riscv_mem_read_s and riscv_mem_read_b). The loaded value is returned. The registration of the 'reservation set' can be done in corresponding RVOP()(some fields might be added to riscv_t, e.g., reservation) so reserved parameter is no longer needed

mmu_store is similar to mmu_load

The proposal sounds great. I wonder how mmu_{fetch,load,store} can be interoperated with existing structure. Can you show more about function prototypes?

ChinYikMing commented 2 months ago

I wonder how mmu_{fetch,load,store} can be interoperated with existing structure. Can you show more about function prototypes?

Sure.

We have to emulate the peripherals, like the MMU, UART, and PLIC, as the minimum requirements to boot Linux.

First of all, we shall support the MMU to enable more resource-management techniques in the kernel, for example memory sharing or copy-on-write (COW), so that user-space programs can call the fork system call. In order to support the MMU, we can reuse the riscv_io_t interface for I/O operations. The new function pointers for MMU_{fetch, load, store} might look like this:

typedef struct {
    /* memory read interface */
    riscv_mem_ifetch mem_ifetch;
    riscv_mem_read_w mem_read_w;
    riscv_mem_read_s mem_read_s;
    riscv_mem_read_b mem_read_b;

    /* memory write interface */
    riscv_mem_write_w mem_write_w;
    riscv_mem_write_s mem_write_s;
    riscv_mem_write_b mem_write_b;

    /* TODO: add peripheral I/O interfaces */

+   /* MMU memory helper interface */
+   riscv_mmu_mem_walk mmu_mem_walk;

+   /* MMU memory read interface */
+   riscv_mem_ifetch mmu_mem_ifetch;
+   riscv_mem_read_w mmu_mem_read_w;
+   riscv_mem_read_s mmu_mem_read_s;
+   riscv_mem_read_b mmu_mem_read_b;

+   /* MMU memory write interface */
+   riscv_mem_write_w mmu_mem_write_w;
+   riscv_mem_write_s mmu_mem_write_s;
+   riscv_mem_write_b mmu_mem_write_b;

    /* system */
    riscv_on_ecall on_ecall;
    riscv_on_ebreak on_ebreak;
    riscv_on_memset on_memset;
    riscv_on_memcpy on_memcpy;
} riscv_io_t;

We can decide which function pointer to call during instruction decoding stage since we will know the data width at that time.

mmu_mem_walk is the helper function that walks the two-level page table (Sv32) for a virtual address and returns the corresponding PTE. Its riscv_mmu_mem_walk interface might look like this:

typedef riscv_word_t *(*riscv_mmu_mem_walk)(riscv_word_t addr);
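For illustration, here is a compact sketch of an Sv32 walk that returns a pointer to the leaf PTE, in the spirit of the interface above. Sv32 splits a virtual address into two 10-bit VPN fields and a 12-bit offset, so the walk visits at most two table levels. The tiny `sketch_mem` array, the extra `satp` and `level` parameters, and the `_sketch` names are assumptions; permission checks per access type, the U bit, and A/D bit updates are deliberately omitted, and physical addresses are assumed to fit in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_V (1u << 0)
#define PTE_R (1u << 1)
#define PTE_W (1u << 2)
#define PTE_X (1u << 3)
#define SV32_PAGE_SHIFT 12
#define SV32_LEVELS 2

/* Hypothetical guest RAM: four 4 KiB pages, indexed by physical address. */
#define SKETCH_MEM_WORDS (4 * 1024)
static uint32_t sketch_mem[SKETCH_MEM_WORDS];

static uint32_t *phys_word(uint32_t paddr)
{
    if (paddr / 4 >= SKETCH_MEM_WORDS)
        return NULL;
    return &sketch_mem[paddr / 4];
}

/* Walk the table rooted at satp.PPN. Returns the leaf PTE, or NULL on a
 * page fault; *level is 1 for a 4 MiB megapage and 0 for a 4 KiB page. */
uint32_t *mmu_walk_sketch(uint32_t satp, uint32_t vaddr, int *level)
{
    uint32_t table = (satp & 0x3FFFFFu) << SV32_PAGE_SHIFT;
    for (int i = SV32_LEVELS - 1; i >= 0; i--) {
        uint32_t vpn = (vaddr >> (SV32_PAGE_SHIFT + 10 * i)) & 0x3FFu;
        uint32_t *pte = phys_word(table + vpn * 4);
        /* Invalid entry, or the reserved encoding W=1 with R=0. */
        if (!pte || !(*pte & PTE_V) || (!(*pte & PTE_R) && (*pte & PTE_W)))
            return NULL;
        if (*pte & (PTE_R | PTE_X)) { /* leaf */
            if (i == 1 && ((*pte >> 10) & 0x3FFu))
                return NULL; /* misaligned megapage */
            *level = i;
            return pte;
        }
        table = (*pte >> 10) << SV32_PAGE_SHIFT; /* next-level table */
    }
    return NULL; /* pointer entry at the last level */
}

/* Combine the leaf PPN with the page offset; *fault is set on failure. */
uint32_t mmu_translate_sketch(uint32_t satp, uint32_t vaddr, int *fault)
{
    int level = 0;
    uint32_t *pte = mmu_walk_sketch(satp, vaddr, &level);
    if (!pte) {
        *fault = 1;
        return 0;
    }
    *fault = 0;
    uint32_t mask = (1u << (SV32_PAGE_SHIFT + 10 * level)) - 1;
    return (((*pte >> 10) << SV32_PAGE_SHIFT) & ~mask) | (vaddr & mask);
}
```

Returning the PTE pointer rather than the translated address leaves room for the caller to inspect or update the entry, which is presumably why the proposed interface returns `riscv_word_t *`.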

You might notice that non-mmu {fetch, read, write} and mmu {fetch, read, write} are duplicated after these changes. To preserve the mnemonics of the function pointers, we might want to keep them separate, although we could use a union to wrap them up to save memory. But, for simplicity, I would rather not use a union at first.

jserv commented 2 months ago

You might notice that non-mmu {fetch, read, write} and mmu {fetch, read, write} are duplicated after these changes. To preserve the mnemonics of the function pointers, we might want to keep them separate, although we could use a union to wrap them up to save memory. But, for simplicity, I would rather not use a union at first.

The proposed mmu_mem_{read,write}_[wsb] are confusing since we already have the ones prefixed with `mem`. Can you avoid such inconsistency?

ChinYikMing commented 2 months ago

You might notice that non-mmu {fetch, read, write} and mmu {fetch, read, write} are duplicated after these changes. To preserve the mnemonics of the function pointers, we might want to keep them separate, although we could use a union to wrap them up to save memory. But, for simplicity, I would rather not use a union at first.

The proposed mmu_mem_{read,write}_[wsb] are confusing since we already have the ones prefixed with `mem`. Can you avoid such inconsistency?

What about removing mem_? If so, the proposal would become:

typedef struct {
    ...

    /* TODO: add peripheral I/O interfaces */

+   /* MMU memory helper interface */
+   riscv_mmu_mem_walk mmu_walk;

+   /* MMU memory read interface */
+   riscv_mem_ifetch mmu_ifetch;
+   riscv_mem_read_w mmu_read_w;
+   riscv_mem_read_s mmu_read_s;
+   riscv_mem_read_b mmu_read_b;

+   /* MMU memory write interface */
+   riscv_mem_write_w mmu_write_w;
+   riscv_mem_write_s mmu_write_s;
+   riscv_mem_write_b mmu_write_b;

    ...
} riscv_io_t;

jserv commented 2 months ago

What about removing mem_?

Yes, I anticipate removing the legacy memory callback functions prefixed with mem_ in favor of the newly-added MMU counterparts. Additionally, I am considering the possibility of eliminating the mmu_{read,write} callback functions within the definition and registration of riscv_io_t. Can you refine it accordingly?

ChinYikMing commented 2 months ago

What about removing mem_?

Yes, I anticipate removing the legacy memory callback functions prefixed with mem_ in favor of the newly-added MMU counterparts. Additionally, I am considering the possibility of eliminating the mmu_{read,write} callback functions within the definition and registration of riscv_io_t. Can you refine it accordingly?

Originally, I did not intend to make mmu_{load,store} registrable, but only to reduce the number of parameters passed to an mmu_{load,store} function. However, if we want to eliminate registration of mmu_{load,store} from riscv_io_t, we can declare and define them as static functions with file scope inside "emulate.c", since all instruction implementations will be expanded by the RVOP macro. In this way, the function prototypes for mmu_{load,store} and the helper function might look like this:

load:

static riscv_word_t mmu_ifetch(riscv_t *rv, riscv_word_t addr);
static riscv_word_t mmu_read_w(riscv_t *rv, riscv_word_t addr);
static riscv_half_t mmu_read_s(riscv_t *rv, riscv_word_t addr);
static riscv_byte_t mmu_read_b(riscv_t *rv, riscv_word_t addr);

store:

static void mmu_write_w(riscv_t *rv, riscv_word_t addr, riscv_word_t data);
static void mmu_write_s(riscv_t *rv, riscv_word_t addr, riscv_half_t data);
static void mmu_write_b(riscv_t *rv, riscv_word_t addr, riscv_byte_t data);

MMU helper function:

static riscv_word_t *mmu_walk(riscv_t *rv, riscv_word_t addr);

Obviously, we could pass a variable to indicate the width of the data and reduce the number of MMU-related functions, but I believe that having each function do one thing well is the better approach. Also, they would be more consistent with the existing riscv_io_t callback functions.

ChinYikMing commented 2 months ago

Additionally, I am considering the possibility of eliminating the mmu_{read,write} callback functions within the definition and registration of riscv_io_t. Can you refine it accordingly?

One thing to notice: after commit 8355777, the I/O interfaces are bound during initialization, so no opportunity is given for user registration. The situation is similar for the mmu_{load, store} callback functions.

jserv commented 2 months ago

Obviously, we could pass a variable to indicate the width of the data and reduce the number of MMU-related functions, but I believe that having each function do one thing well is the better approach. Also, they would be more consistent with the existing riscv_io_t callback functions.

Agree. Prior to the refinement of memory operations, I was thinking of Duff's device to unify these functions with various widths. However, we can benefit from compiler optimizations by using specialized functions which are not exposed and would be only hooks during initialization.

ChinYikMing commented 2 months ago

Obviously, we could pass a variable to indicate the width of the data and reduce the number of MMU-related functions, but I believe that having each function do one thing well is the better approach. Also, they would be more consistent with the existing riscv_io_t callback functions.

However, we can benefit from compiler optimizations by using specialized functions which are not exposed and would be only hooks during initialization.

Yes, declaring the MMU-related functions with the static storage-class specifier and the inline function specifier gives the compiler the opportunity to inline them, and keeps them unexposed. Is hooking them at initialization still necessary in this way?

jserv commented 2 months ago

Is hooking them at initialization still necessary in this way?

Not necessary. Let's proceed.

ChinYikMing commented 2 months ago

Since we have both an ISA emulator and a system emulator, there should be a way to turn MMU support on or off. There are two ways to do this:

1. For every memory access, check whether a variable rv->mmu_on is set. If yes, treat the address as a virtual address.
2. Pre-select the I/O handlers/callbacks and bind them to the riscv_io_t interface during the initialization of the RISC-V instance.

Obviously, option 2 has lower overhead than option 1. If option 2 is used, the existing riscv_io_t interface remains unchanged; only the handlers/callbacks differ.
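Option 2 can be sketched as a one-time binding of function pointers. Everything below is a hypothetical reduction: a single word-read handler, a 16-word RAM, and a placeholder "MMU" handler that merely adds a fixed offset instead of performing a real translation.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t (*mem_read_w_fn)(uint32_t addr);

/* Hypothetical one-entry slice of riscv_io_t. */
typedef struct {
    mem_read_w_fn mem_read_w;
} riscv_io_sketch_t;

typedef struct {
    riscv_io_sketch_t io;
} riscv_sketch_t;

static uint32_t sketch_ram[16];

/* Handler for the ISA emulator: the address is used as-is. */
static uint32_t bare_read_w_sketch(uint32_t addr)
{
    return sketch_ram[(addr / 4) % 16];
}

/* Handler for the system emulator. The fixed 16-byte offset is only a
 * placeholder standing in for a real address translation. */
static uint32_t mmu_read_w_sketch(uint32_t addr)
{
    return bare_read_w_sketch(addr + 16);
}

/* The choice is made once, at instance creation, not on every access. */
void rv_bind_io_sketch(riscv_sketch_t *rv, bool system_emulation)
{
    rv->io.mem_read_w =
        system_emulation ? mmu_read_w_sketch : bare_read_w_sketch;
}
```

Callers always go through `rv->io.mem_read_w`, so the per-access branch on a flag such as rv->mmu_on disappears; the cost is an indirect call, which is the same cost the existing riscv_io_t interface already has.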

jserv commented 2 months ago

Pre-select the I/O handlers/callbacks and bind them to the riscv_io_t interface during the initialization of the RISC-V instance. Obviously, option 2 has lower overhead than option 1. If option 2 is used, the existing riscv_io_t interface remains unchanged; only the handlers/callbacks differ.

I prefer option 2. When the MMU is not set by Linux during early boot, which set of memory handlers would be used?

ChinYikMing commented 2 months ago

Pre-select the I/O handlers/callbacks and bind them to the riscv_io_t interface during the initialization of the RISC-V instance. Obviously, option 2 has lower overhead than option 1. If option 2 is used, the existing riscv_io_t interface remains unchanged; only the handlers/callbacks differ.

I prefer option 2. When the MMU is not set by Linux during early boot, which set of memory handlers would be used?

According to the Sv32 description in section 4.3 of the RISC-V privileged specification (version 20211203), the MODE field of the satp CSR determines whether the MMU is on or off. During early boot, the temporary kernel mappings are set up by the kernel function setup_vm while MODE is off (i.e., Bare mode). For further detail, refer to the source comment of setup_vm, which states that setup_vm is called in MMU-off mode.

In summary, rv32emu can check MODE and decide whether to translate the address or not. In particular, when translation is disabled, we can simply read and write data directly at the given address using the basic I/O functions defined in io.[ch].
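The MODE check itself is small. On RV32, satp holds MODE in bit 31, ASID in bits 30:22, and the root table PPN in bits 21:0. The sketch below decodes those fields and falls back to the untranslated address in Bare mode; the function names, the callback type, and the placeholder translator are assumptions, and the privilege-mode condition (translation applies to S- and U-mode accesses) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* satp layout on RV32: MODE[31] | ASID[30:22] | PPN[21:0] */
#define SATP_MODE_BIT (1u << 31)
#define SATP_PPN_MASK 0x3FFFFFu

static inline bool satp_translation_on(uint32_t satp)
{
    return (satp & SATP_MODE_BIT) != 0;
}

/* Physical address of the root page table (truncated to 32 bits here). */
static inline uint32_t satp_root_table(uint32_t satp)
{
    return (satp & SATP_PPN_MASK) << 12;
}

/* Hypothetical hook for whatever performs the actual page-table walk. */
typedef uint32_t (*translate_fn)(uint32_t satp, uint32_t vaddr);

/* Placeholder translator, NOT a real walk: it maps every page onto the
 * root table page so that the two paths are distinguishable. */
uint32_t placeholder_translate(uint32_t satp, uint32_t vaddr)
{
    return satp_root_table(satp) | (vaddr & 0xFFFu);
}

uint32_t effective_addr_sketch(uint32_t satp,
                               uint32_t addr,
                               translate_fn translate)
{
    if (!satp_translation_on(satp))
        return addr; /* Bare mode: use the address directly */
    return translate(satp, addr);
}
```

During early boot satp.MODE is 0, so every access takes the first branch and reaches the basic I/O functions with the address unchanged.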

ChinYikMing commented 1 month ago

Exception handling logic is not part of the sequential instruction stream or of the block currently being emulated, so we have to refine the exception handler to support escaping from the currently emulated block. To keep it simple, we could leverage setjmp/longjmp to jump, by PC, to the desired exception-handling instruction, then jump back to the jump buffer point after exception handling. In this manner, the exception handling logic is not translated and chained.
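The escape mechanism can be sketched with the standard setjmp/longjmp pair. The vm_sketch_t structure, the hard-coded two-instruction "block", and the cause value are all hypothetical; the sketch only shows how a fault raised deep inside block emulation can unwind to a dispatcher that then redirects the PC to the handler.

```c
#include <setjmp.h>
#include <stdint.h>

/* Hypothetical emulator state carrying the jump buffer. */
typedef struct {
    uint32_t pc;
    uint32_t trap_cause;
    uint32_t trap_count;
    jmp_buf trap_env;
} vm_sketch_t;

/* Called from inside an instruction handler; does not return. */
static void raise_trap(vm_sketch_t *vm, uint32_t cause)
{
    vm->trap_cause = cause;
    longjmp(vm->trap_env, 1);
}

/* Stand-in for one translated block: the second instruction faults. */
static void emulate_block(vm_sketch_t *vm)
{
    vm->pc += 4;        /* first instruction retires normally */
    raise_trap(vm, 13); /* assumed cause: load page fault */
    vm->pc += 4;        /* never reached */
}

/* Dispatcher: runs a block and, if it escapes through longjmp, points
 * the PC at the handler. Returns the trap cause, or 0 if none occurred. */
uint32_t run_with_trap_escape(vm_sketch_t *vm, uint32_t handler_pc)
{
    if (setjmp(vm->trap_env) == 0) {
        emulate_block(vm);
        return 0;
    }
    /* Reached only via longjmp from raise_trap. */
    vm->trap_count++;
    vm->pc = handler_pc;
    return vm->trap_cause;
}
```

All state that changes between setjmp and longjmp lives behind the `vm` pointer rather than in local variables of the dispatcher, which sidesteps the rule that modified non-volatile locals have indeterminate values after longjmp.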

This suggestion is to address the problem faced in #438.

sysprog21 / rv32emu

Refactoring RISC-V emulation APIs for easier adoption and porting #310
