Open ChinYikMing opened 1 month ago
Can you exploit the prebuilt image files used by semu?
Yes, intended. Ultimately, the Image in current build directory will be removed.
Change the description of this pull request, adding some preliminary information so that others can build the system emulator and launch the Linux kernel.
Consider using a recent clang for static analysis in the CI pipeline (maybe in another pull request):
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -139,16 +139,17 @@ jobs:
- name: set up scan-build
run: |
sudo apt-get update -q -y
- sudo apt-get install -q -y clang clang-tools libsdl2-dev libsdl2-mixer-dev
+ sudo apt-get install -q -y libsdl2-dev libsdl2-mixer-dev
wget https://apt.llvm.org/llvm.sh
chmod +x ./llvm.sh
sudo ./llvm.sh 18
+ sudo apt-get install -q -y clang-18 clang-tools-18
shell: bash
- name: run scan-build without JIT
- run: make distclean && scan-build -v -o ~/scan-build --status-bugs --use-cc=clang --force-analyze-debug-code --show-description -analyzer-config stable-report-filename=true -enable-checker valist,nullability make ENABLE_EXT_F=0 ENABLE_SDL=0 ENABLE_JIT=0
+ run: make distclean && scan-build-18 -v -o ~/scan-build --status-bugs --use-cc=clang-18 --force-analyze-debug-code --show-description -analyzer-config stable-report-filename=true -enable-checker valist,nullability make ENABLE_EXT_F=0 ENABLE_SDL=0 ENABLE_JIT=0
- name: run scan-build with JIT
run: |
- make ENABLE_JIT=1 distclean && scan-build -v -o ~/scan-build --status-bugs --use-cc=clang --force-analyze-debug-code --show-description -analyzer-config stable-report-filename=true -enable-checker valist,nullability make ENABLE_EXT_F=0 ENABLE_SDL=0 ENABLE_JIT=1
+ make ENABLE_JIT=1 distclean && scan-build-18 -v -o ~/scan-build --status-bugs --use-cc=clang-18 --force-analyze-debug-code --show-description -analyzer-config stable-report-filename=true -enable-checker valist,nullability make ENABLE_EXT_F=0 ENABLE_SDL=0 ENABLE_JIT=1
compliance-test:
needs: [detect-code-related-file-changes]
I have successfully booted the Linux kernel v6.6.59 LTS in a certain branch, although some changes need to be made (e.g., additional SBI extension implementation). I'm considering whether we should maintain a separate blob object, such as the kernel Image, specifically for rv32emu, distinct from the one in semu.
You can contribute build scripts for both the Linux kernel image and rootfs, initiate the builds, and store the resulting binary blobs in rv32emu-prebuilt.
Build breakage after running make ENABLE_SYSTEM=1:
make: *** No rule to make target `src/devices/minimal.dts', needed by `build/minimal.dtb'. Stop.
I forgot to push src/devices/minimal.dts. Fixed it.
The build with ENABLE_SYSTEM has been tested on both GNU/Linux and macOS.
However, when I rebuilt with ENABLE_SYSTEM=1 and ENABLE_JIT=1, a segmentation fault occurred.
@vacantron, can you check this?
Action items:
- Rename scripts/build-img.sh to tools/build-linux-image.sh for consistency. Provide documentation as well.
This problem is related to #511. Or we can simply commit src/rv32_jit.c as a temporary workaround (but not in this PR?).
Given the time required to refine the JIT compilation from the current template-based code generator, it would be more practical to focus on modifying the existing T1C first. These changes will help identify potential issues related to system emulation, such as memory allocation errors or other faults.
@jserv How about adding two files in the build or test directory to store the versions of buildroot and the Linux kernel? The contents of the files would be as follows:
BUILDROOT_VERSION.txt:
TAG_OR_BRANCH=2024.05.2
LINUX_VERSION.txt:
TAG_OR_BRANCH=v6.6.y
A possible change to build-artifact.yaml:
name: Build artifact
on:
push:
branches:
- master
workflow_dispatch:
jobs:
detect-file-change:
runs-on: ubuntu-22.04
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
submodules: 'true'
- name: Test file change
id: test-file-change
uses: tj-actions/changed-files@v45
with:
fetch_additional_submodule_history: 'true'
files: |
mk/artifact.mk
tests/ansibench/**
tests/rv8-bench/**
tests/doom/**
tests/quake/**
tests/scimark2/**
tests/*.c
+ - name: Test file change of system images
+ id: test-system-imgs-change
+ uses: tj-actions/changed-files@v45
+ with:
+ files: |
+ tests/BUILDROOT_VERSION.txt
+ tests/LINUX_VERSION.txt
- name: Set alias
id: has_changed_files
run: |
if [[ ${{ steps.test-file-change.outputs.any_modified }} == true ]]; then
echo "has_changed_files=true" >> $GITHUB_OUTPUT
else
echo "has_changed_files=false" >> $GITHUB_OUTPUT
fi
+ if [[ ${{ steps.test-system-imgs-change.outputs.any_modified }} == true ]]; then
+ echo "has_changed_system_imgs=true" >> $GITHUB_OUTPUT
+ else
+ echo "has_changed_system_imgs=false" >> $GITHUB_OUTPUT
+ fi
outputs:
has_changed_files: ${{ steps.has_changed_files.outputs.has_changed_files }}
+ has_changed_system_imgs: ${{ steps.has_changed_files.outputs.has_changed_system_imgs }}
+ build-system-artifact:
+ needs: [detect-file-change]
+ if: ${{ needs.detect-file-change.outputs.has_changed_system_imgs == 'true' || github.event_name == 'workflow_dispatch' }}
+ runs-on: ubuntu-22.04
+ steps:
+ - name: Checkout repository
+ uses: actions/checkout@v4
+ with:
+ submodules: 'true'
+ - name: Install dependencies
+ run: |
+ sudo apt-get update -q -y
+ sudo apt-get upgrade -q -y
+ sudo apt-get install -q -y build-essential git
+ - name: Build system images
+ run: |
+ make artifact ENABLE_PREBUILT=0 ENABLE_SYSTEM=1
+ ./tools/build-linux-image.sh
+ mkdir -p /tmp/rv32emu-system-prebuilt
+ mv build/Image /tmp/rv32emu-system-prebuilt
+ mv build/rootfs.cpio /tmp/rv32emu-system-prebuilt
+ - name: Create tarball
+ run: |
+ cd /tmp
+ tar -zcvf rv32emu-system-prebuilt.tar.gz rv32emu-system-prebuilt
+ - name: Create GitHub Release
+ env:
+ GH_TOKEN: ${{ secrets.RV32EMU_PREBUILT_TOKEN }}
+ run: |
+ RELEASE_TAG=$(date +'%Y.%m.%d')
+ cd /tmp
+ gh release create $RELEASE_TAG \
+ --repo sysprog21/rv32emu-prebuilt \
+ --title "$RELEASE_TAG""-nightly"
+ gh release upload $RELEASE_TAG \
+ rv32emu-system-prebuilt.tar.gz \
+ sha1sum-system \
+ --repo sysprog21/rv32emu-prebuilt
build-artifact:
needs: [detect-file-change]
if: ${{ needs.detect-file-change.outputs.has_changed_files == 'true' || github.event_name == 'workflow_dispatch' }}
runs-on: ubuntu-22.04
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
submodules: 'true'
- name: Install dependencies
run: |
sudo apt-get update -q -y
sudo apt-get upgrade -q -y
sudo apt-get install -q -y gcc-multilib g++-multilib
sudo apt-get install -q -y opam build-essential libgmp-dev z3 pkg-config zlib1g-dev
.ci/riscv-toolchain-install.sh
echo "$PWD/toolchain/bin" >> $GITHUB_PATH
- name: Build binaries
run: |
make artifact ENABLE_PREBUILT=0
mkdir -p /tmp/rv32emu-prebuilt
mv build/sha1sum-linux-x86-softfp /tmp
mv build/sha1sum-riscv32 /tmp
mv build/linux-x86-softfp build/riscv32 /tmp/rv32emu-prebuilt
- name: Build Sail model
run: |
cd /tmp
opam init -y --disable-sandboxing
opam switch create ocaml-base-compiler.4.06.1
opam install sail -y
eval $(opam config env)
git clone https://github.com/riscv/sail-riscv.git
cd sail-riscv
git checkout 9547a30bf84572c458476591b569a95f5232c1c7
ARCH=RV32 make -j
mkdir -p /tmp/rv32emu-prebuilt/sail_cSim
mv c_emulator/riscv_sim_RV32 /tmp/rv32emu-prebuilt/sail_cSim
- name: Create tarball
run: |
cd /tmp
tar -zcvf rv32emu-prebuilt.tar.gz rv32emu-prebuilt
- name: Create GitHub Release
env:
GH_TOKEN: ${{ secrets.RV32EMU_PREBUILT_TOKEN }}
run: |
RELEASE_TAG=$(date +'%Y.%m.%d')
cd /tmp
gh release create $RELEASE_TAG \
--repo sysprog21/rv32emu-prebuilt \
--title "$RELEASE_TAG""-nightly"
gh release upload $RELEASE_TAG \
rv32emu-prebuilt.tar.gz \
sha1sum-linux-x86-softfp \
sha1sum-riscv32 \
--repo sysprog21/rv32emu-prebuilt
The reason for separating the CI file detection rules is that building buildroot and the Linux kernel takes time (more than an hour on a GitHub runner). Therefore, updates to small ELF executables should not trigger a rebuild of buildroot and the Linux kernel.
Note: sha1sum-system should be precalculated and uploaded to rv32emu-prebuilt.
You can create a file containing the necessary version settings in the .ci/ directory.
Agree. Can you specify the explicit rules to trigger the builds for Linux kernel and/or rootfs?
Got it.
Yes, I will include the CI trigger rules in this PR.
Use the released Linux image once it becomes available in rv32emu-prebuilt.
Action items:
- Send a pull request to semu to bump it to Linux v6.6.y, which is the latest longterm kernel. Make sure SMP configurations work as well; if they do not, report it on semu. Once semu integrates Linux v6.6.y, rework the above build script here.
Let's stick with Linux v6.1.y in this PR and bump to v6.6.y in a new PR afterwards.
Clone the branch:
$ git clone https://github.com/ChinYikMing/rv32emu.git -b feat/bring-up-linux --depth 1
Enter the repository directory:
$ cd rv32emu
Fetch prebuilt Linux image and run:
$ make system ENABLE_SYSTEM=1 -j8
To exit VM:
CTRL + a + x
Prebuilt Linux images are available now. Please give them a try. make check and other CI jobs are currently broken because the ELF prebuilt release tag does not yet carry the "-ELF" suffix; this needs to be confirmed with @vacantron.
I saw the following message repeated:
[ 0.076716] remote fence extension is not available in SBI v0.3
Can you clarify this?
By the way, I attempted to run vi (an applet provided by BusyBox), and the emulator crashed.
[ 0.318814] Oops [#1]
[ 0.318816] Modules linked in:
[ 0.318818] CPU: 0 PID: 64 Comm: vi Not tainted 6.1.116 #1
[ 0.318822] Hardware name: rv32emu (DT)
[ 0.318825] epc : strncpy_from_user+0x6c/0x190
[ 0.318829] ra : getname_flags+0x74/0x194
[ 0.318833] epc : c01fb6a8 ra : c00e95a8 sp : c0b07ea0
[ 0.318836] gp : c04da828 tp : c0abb600 t0 : 00000ff0
[ 0.318840] t1 : fefefeff t2 : 6917b420 s0 : c0b07eb0
[ 0.318843] s1 : c0851000 a0 : 00000000 a1 : 00000000
[ 0.318847] a2 : 00000ff0 a3 : 00000000 a4 : 00000000
[ 0.318850] a5 : 00000ff0 a6 : 00000022 a7 : c0851010
[ 0.318854] s2 : c0b07f38 s3 : 00000000 s4 : 00000000
[ 0.318857] s5 : c04db698 s6 : 00000000 s7 : 00000000
[ 0.318860] s8 : 00001000 s9 : 00000002 s10: 00000014
[ 0.318864] s11: ffffffff t3 : 80808080 t4 : 00040000
[ 0.318867] t5 : 00000005 t6 : 00000ff0
[ 0.318870] status: 00040120 badaddr: 00000000 cause: 0000000d
[ 0.318874] [<c01fb6a8>] strncpy_from_user+0x6c/0x190
[ 0.318878] [<c00e95a8>] getname_flags+0x74/0x194
[ 0.318883] [<c00e9718>] getname+0x1c/0x2c
[ 0.318887] [<c00d7f18>] do_sys_openat2+0x4c/0xf0
[ 0.318891] [<c00d80b8>] do_sys_open+0x40/0x58
[ 0.318895] [<c00d8130>] sys_openat+0x24/0x34
[ 0.318899] [<c0002464>] ret_from_syscall+0x0/0x4
[ 0.318903] ---[ end trace 0000000000000000 ]---
[ 0.318935] sh[62]: unhandled signal 11 code 0x1 at 0x00000040 in busybox[69016000+b6000]
[ 0.318942] CPU: 0 PID: 62 Comm: sh Tainted: G D 6.1.116 #1
[ 0.318947] Hardware name: rv32emu (DT)
[ 0.318949] epc : 00000040 ra : 00000040 sp : 9d4df530
[ 0.318953] gp : 690cdd14 tp : 9575d2c0 t0 : 0000000a
[ 0.318956] t1 : 6901d28c t2 : 00000001 s0 : 00000002
[ 0.318960] s1 : ffffffff a0 : fffffff2 a1 : 9d4df520
[ 0.318963] a2 : 9d4df5a0 a3 : 00000006 a4 : 9d4df8e8
[ 0.318967] a5 : 00000011 a6 : 00040000 a7 : 0000005f
[ 0.318970] s2 : 9d4df9dc s3 : 00000000 s4 : 690ce1a0
[ 0.318974] s5 : 00000001 s6 : 690ce1a0 s7 : 690cda60
[ 0.318977] s8 : 0000007f s9 : 00000001 s10: 9d4df9dc
[ 0.318980] s11: 00000004 t3 : 9568afc8 t4 : 00000080
[ 0.318984] t5 : 00000009 t6 : 690b1de8
[ 0.318987] status: 00000020 badaddr: 00000040 cause: 0000000c
I built the Linux kernel with an SMP-enabled configuration, so the kernel probes for the remote fence SBI extension, which is used to flush caches on other cores, but there is currently no corresponding SBI implementation. Two ways to suppress this:
Nonetheless, the remote fence SBI is an essential future feature for accurately simulating SMP behavior. Also, note that the repeated message appears in semu as well.
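As a rough illustration of the probing described above, the sketch below shows how an SBI dispatcher could answer a probe for the remote fence (RFENCE) extension. The extension and function IDs and the error code come from the RISC-V SBI specification. The names sbi_ret_t, rfence_implemented, sbi_probe_extension, and sbi_dispatch are hypothetical and are not rv32emu's actual code. Treating a remote fence as a no-op is only reasonable under the assumption of a single emulated hart.

```c
#include <assert.h>
#include <stdint.h>

/* Extension and function IDs from the RISC-V SBI specification. */
#define SBI_EID_BASE 0x10u
#define SBI_EID_RFENCE 0x52464E43u
#define SBI_BASE_PROBE_EXTENSION 3u
#define SBI_SUCCESS 0
#define SBI_ERR_NOT_SUPPORTED (-2)

/* The RFENCE extension ID is the ASCII string "RFNC". */
static_assert(SBI_EID_RFENCE == (('R' << 24) | ('F' << 16) | ('N' << 8) | 'C'),
              "RFENCE EID must spell RFNC");

/* Hypothetical return pair: SBI calls return an error in a0, a value in a1. */
typedef struct {
    int32_t error;
    int32_t value;
} sbi_ret_t;

/* Hypothetical switch: set to 1 only once remote fence calls are handled. */
static int rfence_implemented = 0;

/* Answer a probe: the value is 0 when the extension is not available. */
static sbi_ret_t sbi_probe_extension(uint32_t eid)
{
    sbi_ret_t ret = {SBI_SUCCESS, 0};
    if (eid == SBI_EID_BASE)
        ret.value = 1;
    else if (eid == SBI_EID_RFENCE)
        ret.value = rfence_implemented;
    return ret;
}

/* Dispatch an ecall: a7 carries the extension ID, a6 the function ID. */
static sbi_ret_t sbi_dispatch(uint32_t eid, uint32_t fid, uint32_t arg0)
{
    if (eid == SBI_EID_BASE && fid == SBI_BASE_PROBE_EXTENSION)
        return sbi_probe_extension(arg0);
    if (eid == SBI_EID_RFENCE && rfence_implemented) {
        /* Assumption: one hart only, so there is no remote hart to fence. */
        return (sbi_ret_t){SBI_SUCCESS, 0};
    }
    return (sbi_ret_t){SBI_ERR_NOT_SUPPORTED, 0};
}
```

With rfence_implemented left at 0, the probe reports the extension as absent, which corresponds to the situation described above. Setting it to 1 only makes sense once the fence calls are actually handled.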
I have encountered the same issue. However, vi xxx (where xxx is an arbitrary filename) works normally. Trying to figure out the root cause.
The latest rv32emu-prebuilt release tag now has the -ELF suffix, so all CI tests pass.
After this PR is merged, new releases of the test benches will automatically get the -ELF suffix.
Why the uppercase -ELF suffix?
I think it is just the usual convention when referring to the ELF format, but please let me know if you prefer something different.
Hi! I have observed random segmentation faults, kernel panics, and crashes. Roughly one in every five or six runs ends in one of these failures. The tests were conducted on commit ab8b756.
The command I used is:
make system ENABLE_SYSTEM=1 -j8
For multiple tests afterward, I used:
build/rv32emu -k build/linux-image/Image -i build/linux-image/rootfs.cpio -b build/minimal.dtb
Below is one of the kernel panic cases:
[ 0.014183] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ 0.014197] Oops [#1]
[ 0.014203] Modules linked in:
[ 0.014210] CPU: 0 PID: 1 Comm: swapper Not tainted 6.1.116 #2
[ 0.014223] Hardware name: rv32emu (DT)
[ 0.014230] epc : __rb_rotate_set_parents+0x0/0x58
[ 0.014242] ra : rb_insert_color+0xc4/0x154
[ 0.014254] epc : c0313b54 ra : c031401c sp : c0861cb0
[ 0.014265] gp : c0476320 tp : c0844000 t0 : c09c9f20
[ 0.014277] t1 : 00000000 t2 : d7a9a567 s0 : c0861cc0
[ 0.014287] s1 : c09c9ec8 a0 : c09c9dd0 a1 : c09c9ed8
[ 0.014298] a2 : c09c9d94 a3 : c09c9ed8 a4 : 00000003
[ 0.014309] a5 : 00000000 a6 : 00000016 a7 : c035b560
[ 0.014320] s2 : 00000000 s3 : c0828034 s4 : c09c9d68
[ 0.014330] s5 : c047600c s6 : 00000000 s7 : 00000000
[ 0.014341] s8 : 00000008 s9 : 00000000 s10: 00000000
[ 0.014352] s11: 00000000 t3 : 00000004 t4 : 00000014
[ 0.014361] t5 : ed55a009 t6 : c09b57e6
[ 0.014369] status: 00000120 badaddr: 00000008 cause: 0000000d
[ 0.014381] [<c0313b54>] __rb_rotate_set_parents+0x0/0x58
[ 0.014394] [<c031401c>] rb_insert_color+0xc4/0x154
[ 0.014408] [<c010a224>] kernfs_link_sibling+0x54/0xf4
[ 0.014421] [<c010b46c>] kernfs_add_one+0x88/0x14c
[ 0.014434] [<c010d110>] __kernfs_create_file+0xb4/0xec
[ 0.014448] [<c010df08>] sysfs_add_file_mode_ns+0xd4/0x124
[ 0.014462] [<c010dfd8>] sysfs_create_file_ns+0x80/0x84
[ 0.014475] [<c01f9a3c>] device_create_file+0x8c/0xac
[ 0.014490] [<c01fd0dc>] device_add+0x41c/0x67c
[ 0.014501] [<c01fd360>] device_register+0x24/0x38
[ 0.014514] [<c01d1174>] tty_register_device_attr+0x174/0x210
[ 0.014528] [<c01d122c>] tty_register_device+0x1c/0x2c
[ 0.014542] [<c01d13a8>] tty_register_driver+0x16c/0x1d0
[ 0.014555] [<c033cf64>] pty_init+0x164/0x3d0
[ 0.014567] [<c000110c>] do_one_initcall+0x6c/0x260
[ 0.014579] [<c032c0ac>] kernel_init_freeable+0x20c/0x210
[ 0.014592] [<c0325a6c>] kernel_init+0x24/0x118
[ 0.014605] [<c00023d0>] ret_from_exception+0x0/0x1c
[ 0.014618] ---[ end trace 0000000000000000 ]---
[ 0.014627] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.014640] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
Below is another crash example:
[ 0.026278] Oops - Oops - load address misaligned [#1]
[ 0.026289] Modules linked in:
[ 0.026296] CPU: 0 PID: 6 Comm: kworker/u2:0 Not tainted 6.1.116 #2
[ 0.026310] Hardware name: rv32emu (DT)
[ 0.026318] Workqueue: events_unbound async_run_entry_fn
[ 0.026333] epc : jbd2_journal_dirty_metadata+0x28/0x290
[ 0.026346] ra : __ext4_handle_dirty_metadata+0x90/0x204
[ 0.026359] epc : c0161438 ra : c0114284 sp : c086dc70
[ 0.026370] gp : c0476320 tp : c0845b80 t0 : c0a1b048
[ 0.026382] t1 : 00000003 t2 : 8147ac9e s0 : c086dca0
[ 0.026393] s1 : c086dd68 a0 : 339dc50d a1 : c086dd68
[ 0.026405] a2 : 339dc50d a3 : 00000000 a4 : 61a20000
[ 0.026416] a5 : c082f0d1 a6 : 7c11977b a7 : 3be9185e
[ 0.026427] s2 : 00000000 s3 : 339dc50d s4 : 00000000
[ 0.026438] s5 : c042be78 s6 : c035c3f8 s7 : 000003a0
[ 0.026449] s8 : 00000001 s9 : 0000000b s10: c089d05f
[ 0.026460] s11: 00000000 t3 : c0880014 t4 : c0c18e84
[ 0.026471] t5 : 2771c19e t6 : c088001c
[ 0.026480] status: 00000120 badaddr: 339dc50d cause: 00000004
[ 0.026492] [<c0161438>] jbd2_journal_dirty_metadata+0x28/0x290
[ 0.026506] [<c0114284>] __ext4_handle_dirty_metadata+0x90/0x204
[ 0.026521] [<c012cda4>] ext4_getblk+0x290/0x2a4
[ 0.026534] [<c00b9f94>] path_lookupat+0x60/0x154
[ 0.026547] [<c00bab08>] filename_lookup+0xa0/0xf8
[ 0.026560] [<c00baba0>] kern_path+0x40/0x68
[ 0.026572] [<c03392b4>] init_chown+0x3c/0xa8
[ 0.026585] [<c032cf10>] do_symlink+0x74/0xac
[ 0.026598] [<c032cf88>] write_buffer+0x40/0x64
[ 0.026611] [<c032d85c>] unpack_to_rootfs+0x298/0x2e4
[ 0.026625] [<c032df54>] do_populate_rootfs+0x6c/0xd4
[ 0.026639] [<c002630c>] async_run_entry_fn+0x3c/0xc4
[ 0.026654] [<c001d500>] process_one_work+0x188/0x20c
[ 0.026667] [<c001da04>] worker_thread+0x20c/0x268
[ 0.026680] [<c002341c>] kthread+0xc0/0xc4
[ 0.026693] [<c00023d0>] ret_from_exception+0x0/0x1c
[ 0.026706] ---[ end trace 0000000000000000 ]---
[ 0.033158] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 0.033467] printk: console [ttyS0] disabled
[ 0.033483] f4000000.serial: ttyS0 at MMIO 0xf4000000 (irq = 1, base_baud = 312500) is a 16550
[ 0.033504] printk: console [ttyS0] enabled
[ 0.033504] printk: console [ttyS0] enabled
[ 0.033521] printk: bootconsole [ns16550] disabled
[ 0.033521] printk: bootconsole [ns16550] disabled
[ 0.033808] clk: Disabling unused clock
Below is another segmentation fault example:
[ 0.065393] Freeing unused kernel image (initmem) memory: 160K
[ 0.065408] Kernel memory protection not selected by kernel config.
[ 0.065422] Run /init as init process
[ 0.074177] ln[29]: unhandled signal 11 code 0x1 at 0x9b779c64 in ld-linux-riscv32-ilp32.so.1[95729000+27000]
[ 0.074207] CPU: 0 PID: 29 Comm: ln Not tainted 6.1.116 #2
[ 0.074222] Hardware name: rv32emu (DT)
[ 0.074232] epc : 9573ec38 ra : 9573e030 sp : 9d230dd0
[ 0.074246] gp : 6915cd14 tp : 957782c0 t0 : 0000000a
[ 0.074259] t1 : 9d230df0 t2 : 00000000 s0 : 9d230e50
[ 0.074274] s1 : 95729ab8 a0 : 00009ed6 a1 : 0000009e
[ 0.074287] a2 : 95729ad0 a3 : 0000000a a4 : 9b779c63
[ 0.074301] a5 : 9ed66737 a6 : 2f2f2f2f a7 : 00000001
[ 0.074315] s2 : 9d25bbe0 s3 : 00000001 s4 : 95752008
[ 0.074328] s5 : 95729000 s6 : 9d25bc7c s7 : 95729000
[ 0.074342] s8 : 95751008 s9 : 95729ac4 s10: 957293ac
[ 0.074356] s11: 0000fff1 t3 : 009ed667 t4 : fffffffc
[ 0.074370] t5 : 00000035 t6 : 0000000b
[ 0.074381] status: 00000020 badaddr: 9b779c64 cause: 0000000f
Segmentation fault (core dumped)
make: *** [mk/system.mk:27: system] Error 139
I have identified several issues here.
A segmentation fault occurs in the mmu_write_b function, specifically in get_ppn_and_offset, where the value of pte can be 0x0. Since pte is dereferenced in get_ppn_and_offset, this causes a segmentation fault, which is also the reason for the "Unable to handle kernel NULL pointer dereference at virtual address 00000040" message.
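A minimal sketch of how the NULL case could be guarded, assuming a two-level Sv32 walk over guest RAM. The names guest_mem_t, walk_sv32, and translate are hypothetical and are not rv32emu's actual functions. Several checks required by the privileged specification (permission bits, misaligned megapages, A/D bits) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_V 0x1u   /* Sv32 valid bit */
#define PTE_RWX 0xEu /* R, W, X bits; all clear means pointer to next level */

/* Hypothetical view of guest RAM as an array of 4 KiB pages. */
typedef struct {
    uint32_t *ram;    /* guest RAM as 32-bit words, physical address 0 first */
    uint32_t n_pages; /* RAM size in 4 KiB pages */
} guest_mem_t;

/* Walk the Sv32 table rooted at root_ppn. Returns the leaf PTE and stores
 * its level (1 = 4 MiB megapage, 0 = 4 KiB page), or returns NULL when the
 * address has no valid mapping. */
static uint32_t *walk_sv32(const guest_mem_t *m, uint32_t root_ppn,
                           uint32_t vaddr, int *leaf_level)
{
    const uint32_t vpn[2] = {(vaddr >> 12) & 0x3FFu, (vaddr >> 22) & 0x3FFu};
    uint32_t ppn = root_ppn;

    for (int level = 1; level >= 0; level--) {
        if (ppn >= m->n_pages)
            return NULL; /* table lies outside guest RAM */
        uint32_t *pte = &m->ram[ppn * 1024u + vpn[level]];
        if (!(*pte & PTE_V))
            return NULL; /* unmapped: the caller must not dereference */
        if (*pte & PTE_RWX) {
            *leaf_level = level;
            return pte;
        }
        ppn = *pte >> 10; /* follow the pointer to the next-level table */
    }
    return NULL; /* a pointer PTE at the last level is malformed */
}

/* Returns 0 and fills *paddr on success, or -1 when the access should
 * become a page fault instead of touching host memory. */
static int translate(const guest_mem_t *m, uint32_t root_ppn, uint32_t vaddr,
                     uint32_t *paddr)
{
    int level = 0;
    uint32_t *pte = walk_sv32(m, root_ppn, vaddr, &level);
    if (!pte)
        return -1;

    assert(*pte & PTE_V); /* guaranteed by walk_sv32 */
    const uint32_t offset_mask = level ? 0x3FFFFFu : 0xFFFu;
    /* Physical addresses are truncated to 32 bits in this sketch. */
    *paddr = (((*pte >> 10) << 12) & ~offset_mask) | (vaddr & offset_mask);
    return 0;
}
```

The point of the design is that the walk reports "no mapping" through its return value, so the caller decides to raise a fault before any PTE pointer is dereferenced.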
Also, the assert(insn) in the block_translate function fails sporadically, resulting in the message "Unable to handle kernel access to user memory without uaccess routines at virtual address."
Additionally, the program randomly enters an unresponsive state. In such cases, it gets stuck in the following code, and the behavior looks like it gets into an infinite loop:
/* BNE: Branch if Not Equal */
RVOP(
    bne,
    { BRANCH_FUNC(uint32_t, ==); },
    GEN({
        rald2, rs1, rs2;
        cmp, VR1, VR0;
        break;
        setjmpoff;
        jcc, 0x85;
        cond, branch_untaken;
        jmp, pc, 4;
        end;
        ldimm, TMP, pc, 4;
        st, S32, TMP, PC;
        exit;
        jmpoff;
        cond, branch_taken;
        jmp, pc, imm;
        end;
        ldimm, TMP, pc, imm;
        st, S32, TMP, PC;
        exit;
    }))
When stuck in this code, the local variables PC and cycle increase in a regular pattern.
Below is the log at the time of the segmentation fault:
[ 0.249663] printk: bootconsole [ns16550] disabled
[ 0.250717] clk: Disabling unused clocks
[ 0.251025] Freeing unused kernel image (initmem) memory: 160K
[ 0.251072] Kernel memory protection not selected by kernel config.
[ 0.251111] Run /init as init process
[ 0.263080] mount[22]: unhandled signal 11 code 0x1 at 0x9b7f6c64 in ld-linux-riscv32-ilp32.so.1[957a6000+27000]
[ 0.263175] CPU: 0 PID: 22 Comm: mount Not tainted 6.1.116 #2
[ 0.263227] Hardware name: rv32emu (DT)
[ 0.263259] epc : 957bbc38 ra : 957bb030 sp : 9d4b3de0
[ 0.263306] gp : 690f1d14 tp : 957282c0 t0 : 0000000a
[ 0.263351] t1 : 9d4b3e00 t2 : 00000000 s0 : 9d4b3e60
[ 0.263397] s1 : 957a6ab8 a0 : 00009ede a1 : 0000009e
[ 0.263442] a2 : 957a6ad0 a3 : 0000000a a4 : 9b7f6c63
[ 0.263487] a5 : 9ede3737 a6 : 2f2f2f2f a7 : 00000001
[ 0.263533] s2 : 9d41bbf0 s3 : 00000001 s4 : 957cf008
[ 0.263578] s5 : 957a6000 s6 : 9d41bc7c s7 : 957a6000
[ 0.263624] s8 : 957ce008 s9 : 957a6ac4 s10: 957a63ac
[ 0.263671] s11: 0000fff1 t3 : 009ede37 t4 : fffffffc
[ 0.263717] t5 : 00000035 t6 : 0000000b
[ 0.263752] status: 00000020 badaddr: 9b7f6c64 cause: 0000000f
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1213562==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x555555598401 bp 0x7ffff38af500 sp 0x7fffffffd780 T0)
==1213562==The signal is caused by a READ memory access.
==1213562==Hint: address points to the zero page.
#0 0x555555598401 in mmu_write_b src/system.c:392
#1 0x5555555750fe in do_sb src/rv32_template.c:639
#2 0x5555555628f9 in rv_step src/emulate.c:1075
#3 0x5555555628f9 in rv_run src/riscv.c:500
#4 0x5555555628f9 in main src/main.c:279
#5 0x7ffff722a1c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#6 0x7ffff722a28a in __libc_start_main_impl ../csu/libc-start.c:360
#7 0x5555555663a4 in _start (/home/mes/MesRepo/Mes-rv32emu/rv32emu/build/rv32emu+0x123a4) (BuildId: e0992c4435c27bffa4166ed19d915866b583f6fc)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV src/system.c:392 in mmu_write_b
==1213562==ABORTING
@Mes0903 Hi, thanks for the extensive testing, much appreciated! The get_ppn_and_offset function works correctly only if the PTE is valid at the time it is used (I might add assertions to ensure the PTE's validity). That assumption does not hold in your test case.
Upon investigation, some page faults are successfully detected and handled by the do_page_fault function in the kernel. Ideally, this trap handler remaps the PTE if it is absent or performs other VMA-related checks. If something goes wrong, a user-space process might, for example, receive a SIGSEGV and terminate, while a kernel thread could enter a dead state (refer to die_kernel_fault). Tracing do_page_fault in greater detail could help.
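One way to picture the assertion and the fault path mentioned above, as a sketch under stated assumptions. The page-fault cause codes (12, 13, 15) are from the RISC-V privileged specification and correspond to the cause values 0xc, 0xd, and 0xf seen in the logs. The names hart_t, access_t, pte_ppn, and check_pte_or_trap are hypothetical, and sstatus updates and trap delegation are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_V 0x1u /* Sv32 valid bit */

/* Page-fault exception codes from the RISC-V privileged specification. */
enum {
    CAUSE_INSN_PAGE_FAULT = 12,
    CAUSE_LOAD_PAGE_FAULT = 13,
    CAUSE_STORE_PAGE_FAULT = 15,
};

typedef enum { ACCESS_FETCH, ACCESS_LOAD, ACCESS_STORE } access_t;

/* Hypothetical, reduced supervisor trap state of one hart. */
typedef struct {
    uint32_t pc, sepc, scause, stval, stvec;
} hart_t;

static uint32_t page_fault_cause(access_t access)
{
    switch (access) {
    case ACCESS_FETCH:
        return CAUSE_INSN_PAGE_FAULT;
    case ACCESS_LOAD:
        return CAUSE_LOAD_PAGE_FAULT;
    default:
        return CAUSE_STORE_PAGE_FAULT;
    }
}

/* Extract the PPN; the assertion documents the validity precondition. */
static uint32_t pte_ppn(const uint32_t *pte)
{
    assert(pte && (*pte & PTE_V));
    return *pte >> 10;
}

/* Returns 0 if the PTE is usable. Otherwise records a page fault so the
 * guest kernel's handler can run, and returns 1. */
static int check_pte_or_trap(hart_t *h, const uint32_t *pte, uint32_t vaddr,
                             access_t access)
{
    if (pte && (*pte & PTE_V))
        return 0;
    h->sepc = h->pc;                   /* faulting instruction */
    h->scause = page_fault_cause(access);
    h->stval = vaddr;                  /* reported as badaddr by Linux */
    h->pc = h->stvec & ~0x3u;          /* exceptions always enter at BASE */
    return 1;
}
```

Calling check_pte_or_trap before pte_ppn keeps the assertion as a last line of defense rather than the primary check, so a missing mapping reaches the guest as a page fault instead of crashing the host process.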