sysprog21 / rv32emu

Compact and Efficient RISC-V RV32I[MAFC] emulator

Bring up Linux kernel #508

Open ChinYikMing opened 1 month ago

ChinYikMing commented 1 month ago

Clone the branch:

$ git clone https://github.com/ChinYikMing/rv32emu.git -b feat/bring-up-linux --depth 1

Enter the repository directory:

$ cd rv32emu

Fetch the prebuilt Linux image and run it:

$ make system ENABLE_SYSTEM=1 -j8

To exit VM:

CTRL + a + x
jserv commented 1 month ago

Can you exploit the prebuilt image files used by semu?

ChinYikMing commented 1 month ago

Can you exploit the prebuilt image files used by semu?

Yes, intended. Ultimately, the Image in current build directory will be removed.

jserv commented 1 month ago

Please update the description of this pull request with some preliminary instructions so that others can build the system emulator and launch the Linux kernel.

jserv commented 4 weeks ago

Consider using a recent clang for static analysis in the CI pipeline (perhaps in a separate pull request):

--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -139,16 +139,17 @@ jobs:
     - name: set up scan-build
       run: |
             sudo apt-get update -q -y
-            sudo apt-get install -q -y clang clang-tools libsdl2-dev libsdl2-mixer-dev
+            sudo apt-get install -q -y libsdl2-dev libsdl2-mixer-dev
             wget https://apt.llvm.org/llvm.sh
             chmod +x ./llvm.sh
             sudo ./llvm.sh 18
+            sudo apt-get install -q -y clang-18 clang-tools-18
       shell: bash
     - name: run scan-build without JIT
-      run: make distclean && scan-build -v -o ~/scan-build --status-bugs --use-cc=clang --force-analyze-debug-code --show-description -analyzer-config stable-report-filename=true -enable-checker valist,nullability make ENABLE_EXT_F=0 ENABLE_SDL=0 ENABLE_JIT=0
+      run: make distclean && scan-build-18 -v -o ~/scan-build --status-bugs --use-cc=clang-18 --force-analyze-debug-code --show-description -analyzer-config stable-report-filename=true -enable-checker valist,nullability make ENABLE_EXT_F=0 ENABLE_SDL=0 ENABLE_JIT=0
     - name: run scan-build with JIT
       run: |
-          make ENABLE_JIT=1 distclean && scan-build -v -o ~/scan-build --status-bugs --use-cc=clang --force-analyze-debug-code --show-description -analyzer-config stable-report-filename=true -enable-checker valist,nullability make ENABLE_EXT_F=0 ENABLE_SDL=0 ENABLE_JIT=1
+          make ENABLE_JIT=1 distclean && scan-build-18 -v -o ~/scan-build --status-bugs --use-cc=clang-18 --force-analyze-debug-code --show-description -analyzer-config stable-report-filename=true -enable-checker valist,nullability make ENABLE_EXT_F=0 ENABLE_SDL=0 ENABLE_JIT=1

   compliance-test:
     needs: [detect-code-related-file-changes]
ChinYikMing commented 3 weeks ago

Can you exploit the prebuilt image files used by semu?

Yes, intended. Ultimately, the Image in current build directory will be removed.

I have successfully booted the Linux kernel v6.6.59 LTS in a certain branch, although some changes need to be made (e.g., additional SBI extension implementation). I'm considering whether we should maintain a separate blob object, such as the kernel Image, specifically for rv32emu, distinct from the one in semu.

jserv commented 3 weeks ago

I'm considering whether we should maintain a separate blob object, such as the kernel Image, specifically for rv32emu, distinct from the one in semu.

You can contribute build scripts for both the Linux kernel image and rootfs, initiate the builds, and store the resulting binary blobs in rv32emu-prebuilt.

jserv commented 3 weeks ago

Build breakage after running make ENABLE_SYSTEM=1:

make: *** No rule to make target `src/devices/minimal.dts', needed by `build/minimal.dtb'.  Stop.
ChinYikMing commented 3 weeks ago

Build breakage after running make ENABLE_SYSTEM=1:

make: *** No rule to make target `src/devices/minimal.dts', needed by `build/minimal.dtb'.  Stop.

I forgot to push src/devices/minimal.dts. It is fixed now.

jserv commented 3 weeks ago

The build with ENABLE_SYSTEM has been tested on both GNU/Linux and macOS.

However, when I rebuilt with ENABLE_SYSTEM=1 and ENABLE_JIT=1, a segmentation fault occurred. @vacantron, can you check this?

jserv commented 3 weeks ago

Action items:

vacantron commented 2 weeks ago

However, when I rebuilt with ENABLE_SYSTEM=1 and ENABLE_JIT=1, a segmentation fault occurred. @vacantron, can you check this?

This problem is related to #511. Alternatively, we can simply commit src/rv32_jit.c as a temporary workaround (but not in this PR?).

jserv commented 2 weeks ago

This problem is related to #511. Alternatively, we can simply commit src/rv32_jit.c as a temporary workaround (but not in this PR?).

Given the time required to refine the JIT compilation from the current template-based code generator, it would be more practical to focus on modifying the existing T1C first. These changes will help identify potential issues related to system emulation, such as memory allocation errors or other faults.

ChinYikMing commented 2 weeks ago

@jserv How about adding two files in build or test directory to store the versions of buildroot and the Linux kernel? The contents of the files would be as follows:

BUILDROOT_VERSION.txt :

TAG_OR_BRANCH=2024.05.2

LINUX_VERSION.txt :

TAG_OR_BRANCH=v6.6.y

A possible change to build-artifact.yaml:

name: Build artifact

on:
  push:
    branches:
      - master
  workflow_dispatch:

jobs:
  detect-file-change:
    runs-on: ubuntu-22.04
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          submodules: 'true'
      - name: Test file change
        id: test-file-change
        uses: tj-actions/changed-files@v45
        with:
          fetch_additional_submodule_history: 'true'
          files: |
            mk/artifact.mk
            tests/ansibench/**
            tests/rv8-bench/**
            tests/doom/**
            tests/quake/**
            tests/scimark2/**
            tests/*.c
+     - name: Test file change of system images
+       id: test-system-imgs-change
+       uses: tj-actions/changed-files@v45
+       with:
+         files: |
+           tests/BUILDROOT_VERSION.txt
+           tests/LINUX_VERSION.txt
      - name: Set alias
        id: has_changed_files
        run: |
          if [[ ${{ steps.test-file-change.outputs.any_modified }} == true ]]; then
            echo "has_changed_files=true" >> $GITHUB_OUTPUT
          else
            echo "has_changed_files=false" >> $GITHUB_OUTPUT
          fi
+         if [[ ${{ steps.test-system-imgs-change.outputs.any_modified }} == true ]]; then
+           echo "has_changed_system_imgs=true" >> $GITHUB_OUTPUT
+         else
+           echo "has_changed_system_imgs=false" >> $GITHUB_OUTPUT
+         fi
    outputs:
      has_changed_files: ${{ steps.has_changed_files.outputs.has_changed_files }}
+     has_changed_system_imgs: ${{ steps.has_changed_files.outputs.has_changed_system_imgs }}

+ build-system-artifact:
+   needs: [detect-file-change]
+   if: ${{ needs.detect-file-change.outputs.has_changed_system_imgs == 'true' || github.event_name == 'workflow_dispatch' }}
+   runs-on: ubuntu-22.04
+   steps:
+     - name: Checkout repository
+       uses: actions/checkout@v4
+       with:
+         submodules: 'true'
+     - name: Install dependencies
+       run: |
+         sudo apt-get update -q -y
+         sudo apt-get upgrade -q -y
+         sudo apt-get install -q -y build-essential git
+     - name: Build system images
+       run: |
+         make artifact ENABLE_PREBUILT=0 ENABLE_SYSTEM=1
+         ./tools/build-linux-image.sh
+         mkdir -p /tmp/rv32emu-system-prebuilt
+         mv build/Image /tmp/rv32emu-system-prebuilt
+         mv build/rootfs.cpio /tmp/rv32emu-system-prebuilt
+     - name: Create tarball
+       run: |
+         cd /tmp
+         tar -zcvf rv32emu-system-prebuilt.tar.gz rv32emu-system-prebuilt
+     - name: Create GitHub Release
+       env:
+         GH_TOKEN: ${{ secrets.RV32EMU_PREBUILT_TOKEN }}
+       run: |
+         RELEASE_TAG=$(date +'%Y.%m.%d')
+         cd /tmp
+         gh release create $RELEASE_TAG \
+           --repo sysprog21/rv32emu-prebuilt \
+           --title "$RELEASE_TAG""-nightly"
+         gh release upload $RELEASE_TAG \
+           rv32emu-system-prebuilt.tar.gz \
+           sha1sum-system \
+           --repo sysprog21/rv32emu-prebuilt

  build-artifact:
    needs: [detect-file-change]
    if: ${{ needs.detect-file-change.outputs.has_changed_files == 'true' || github.event_name == 'workflow_dispatch' }}
    runs-on: ubuntu-22.04
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          submodules: 'true'
      - name: Install dependencies
        run: |
          sudo apt-get update -q -y
          sudo apt-get upgrade -q -y
          sudo apt-get install -q -y gcc-multilib g++-multilib
          sudo apt-get install -q -y opam build-essential libgmp-dev z3 pkg-config zlib1g-dev
          .ci/riscv-toolchain-install.sh
          echo "$PWD/toolchain/bin" >> $GITHUB_PATH
      - name: Build binaries
        run: |
          make artifact ENABLE_PREBUILT=0
          mkdir -p /tmp/rv32emu-prebuilt
          mv build/sha1sum-linux-x86-softfp /tmp
          mv build/sha1sum-riscv32 /tmp
          mv build/linux-x86-softfp build/riscv32 /tmp/rv32emu-prebuilt
      - name: Build Sail model
        run: |
          cd /tmp
          opam init -y --disable-sandboxing
          opam switch create ocaml-base-compiler.4.06.1
          opam install sail -y
          eval $(opam config env)
          git clone https://github.com/riscv/sail-riscv.git
          cd sail-riscv
          git checkout 9547a30bf84572c458476591b569a95f5232c1c7
          ARCH=RV32 make -j
          mkdir -p /tmp/rv32emu-prebuilt/sail_cSim
          mv c_emulator/riscv_sim_RV32 /tmp/rv32emu-prebuilt/sail_cSim
      - name: Create tarball
        run: |
          cd /tmp
          tar -zcvf rv32emu-prebuilt.tar.gz rv32emu-prebuilt
      - name: Create GitHub Release
        env:
          GH_TOKEN: ${{ secrets.RV32EMU_PREBUILT_TOKEN }}
        run: |
          RELEASE_TAG=$(date +'%Y.%m.%d')
          cd /tmp
          gh release create $RELEASE_TAG \
            --repo sysprog21/rv32emu-prebuilt \
            --title "$RELEASE_TAG""-nightly"
          gh release upload $RELEASE_TAG \
            rv32emu-prebuilt.tar.gz \
            sha1sum-linux-x86-softfp \
            sha1sum-riscv32 \
            --repo sysprog21/rv32emu-prebuilt

The reason for separating the CI file-detection rules is that building buildroot and the Linux kernel takes a long time (more than an hour on a GitHub runner). Therefore, updates to small ELF executables should not trigger a rebuild of buildroot and the Linux kernel.

Note: sha1sum-system should be precalculated and uploaded to rv32emu-prebuilt.

jserv commented 2 weeks ago

@jserv How about adding two files in build or test directory to store the versions of buildroot and the Linux kernel? The contents of the files would be as follows:

You can create a file containing the necessary version setting in directory .ci/.

The reason for separating the CI file-detection rules is that building buildroot and the Linux kernel takes a long time (more than an hour on a GitHub runner). Therefore, updates to small ELF executables should not trigger a rebuild of buildroot and the Linux kernel.

Agree. Can you specify the explicit rules to trigger the builds for Linux kernel and/or rootfs?

ChinYikMing commented 2 weeks ago

@jserv How about adding two files in build or test directory to store the versions of buildroot and the Linux kernel? The contents of the files would be as follows:

You can create a file containing the necessary version setting in directory .ci/.

Got it.

The reason for separating the CI file-detection rules is that building buildroot and the Linux kernel takes a long time (more than an hour on a GitHub runner). Therefore, updates to small ELF executables should not trigger a rebuild of buildroot and the Linux kernel.

Agree. Can you specify the explicit rules to trigger the builds for Linux kernel and/or rootfs?

Yes, I will include the CI trigger rules in this PR.

ChinYikMing commented 2 weeks ago

Can you exploit the prebuilt image files used by semu?

Yes, intended. Ultimately, the Image in current build directory will be removed.

Use the released Linux image once it becomes available in rv32emu-prebuilt.

ChinYikMing commented 2 weeks ago

Action items:

  • Send pull request to semu for bumping to Linux v6.6.y, which is the latest longterm kernel. You have to make sure SMP configurations work as well. If not, report on semu. Once semu integrates Linux v6.6.y, rework the above build script here.

Let's stick with Linux v6.1.y in this PR and bump to v6.6.y in a follow-up PR.

ChinYikMing commented 2 weeks ago

Prebuilt Linux images are available now. Please give them a try. Note that make check and other CI jobs are currently broken because the release tag of the prebuilt ELF files has not yet been given the "-ELF" suffix; this needs to be confirmed with @vacantron.

jserv commented 2 weeks ago

Prebuilt Linux images are available now. Please give them a try.

I saw the following message repeated:

[    0.076716] remote fence extension is not available in SBI v0.3

Can you clarify this?

By the way, I attempted to run vi (an applet provided by Busybox), and the emulator crashed.

[    0.318814] Oops [#1]
[    0.318816] Modules linked in:
[    0.318818] CPU: 0 PID: 64 Comm: vi Not tainted 6.1.116 #1
[    0.318822] Hardware name: rv32emu (DT)
[    0.318825] epc : strncpy_from_user+0x6c/0x190
[    0.318829]  ra : getname_flags+0x74/0x194
[    0.318833] epc : c01fb6a8 ra : c00e95a8 sp : c0b07ea0
[    0.318836]  gp : c04da828 tp : c0abb600 t0 : 00000ff0
[    0.318840]  t1 : fefefeff t2 : 6917b420 s0 : c0b07eb0
[    0.318843]  s1 : c0851000 a0 : 00000000 a1 : 00000000
[    0.318847]  a2 : 00000ff0 a3 : 00000000 a4 : 00000000
[    0.318850]  a5 : 00000ff0 a6 : 00000022 a7 : c0851010
[    0.318854]  s2 : c0b07f38 s3 : 00000000 s4 : 00000000
[    0.318857]  s5 : c04db698 s6 : 00000000 s7 : 00000000
[    0.318860]  s8 : 00001000 s9 : 00000002 s10: 00000014
[    0.318864]  s11: ffffffff t3 : 80808080 t4 : 00040000
[    0.318867]  t5 : 00000005 t6 : 00000ff0
[    0.318870] status: 00040120 badaddr: 00000000 cause: 0000000d
[    0.318874] [<c01fb6a8>] strncpy_from_user+0x6c/0x190
[    0.318878] [<c00e95a8>] getname_flags+0x74/0x194
[    0.318883] [<c00e9718>] getname+0x1c/0x2c
[    0.318887] [<c00d7f18>] do_sys_openat2+0x4c/0xf0
[    0.318891] [<c00d80b8>] do_sys_open+0x40/0x58
[    0.318895] [<c00d8130>] sys_openat+0x24/0x34
[    0.318899] [<c0002464>] ret_from_syscall+0x0/0x4
[    0.318903] ---[ end trace 0000000000000000 ]---
[    0.318935] sh[62]: unhandled signal 11 code 0x1 at 0x00000040 in busybox[69016000+b6000]
[    0.318942] CPU: 0 PID: 62 Comm: sh Tainted: G      D            6.1.116 #1
[    0.318947] Hardware name: rv32emu (DT)
[    0.318949] epc : 00000040 ra : 00000040 sp : 9d4df530
[    0.318953]  gp : 690cdd14 tp : 9575d2c0 t0 : 0000000a
[    0.318956]  t1 : 6901d28c t2 : 00000001 s0 : 00000002
[    0.318960]  s1 : ffffffff a0 : fffffff2 a1 : 9d4df520
[    0.318963]  a2 : 9d4df5a0 a3 : 00000006 a4 : 9d4df8e8
[    0.318967]  a5 : 00000011 a6 : 00040000 a7 : 0000005f
[    0.318970]  s2 : 9d4df9dc s3 : 00000000 s4 : 690ce1a0
[    0.318974]  s5 : 00000001 s6 : 690ce1a0 s7 : 690cda60
[    0.318977]  s8 : 0000007f s9 : 00000001 s10: 9d4df9dc
[    0.318980]  s11: 00000004 t3 : 9568afc8 t4 : 00000080
[    0.318984]  t5 : 00000009 t6 : 690b1de8
[    0.318987] status: 00000020 badaddr: 00000040 cause: 0000000c
ChinYikMing commented 2 weeks ago

Prebuilt Linux images are available now. Please give them a try.

I saw the following message repeated:

[    0.076716] remote fence extension is not available in SBI v0.3

Can you clarify this?

I built the Linux kernel with an SMP-enabled configuration, so the kernel probes for the remote fence SBI extension, which is used to flush caches on other cores, but rv32emu currently has no corresponding SBI implementation. There are two ways to suppress the message:

  1. Implement a dummy remote fence SBI.
  2. Disable the SMP configuration (this one is easier).

Nonetheless, the remote fence SBI is an essential future feature for accurately simulating SMP behavior. Also, note that the repeated message appears in semu as well.
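
Option 1 could look roughly like the sketch below. The extension ID, function IDs, and error codes are taken from the RISC-V SBI specification; the sbi_ret_t type and the sbi_ecall_sketch function are hypothetical names used only for illustration and are not rv32emu's actual SBI code. The sketch assumes a single hart and assumes that the warning is printed because the probe call reports the extension as absent.

```c
#include <assert.h>
#include <stdint.h>

/* Constants from the RISC-V SBI specification. */
#define SBI_EID_BASE 0x10u
#define SBI_BASE_PROBE_EXTENSION 3u
#define SBI_EID_RFENCE 0x52464E43u /* ASCII "RFNC" */
#define SBI_RFENCE_REMOTE_SFENCE_VMA_ASID 2u
#define SBI_SUCCESS 0
#define SBI_ERR_NOT_SUPPORTED (-2)

/* Hypothetical type: SBI calls return an (error, value) pair in a0/a1. */
typedef struct {
    int32_t error;
    int32_t value;
} sbi_ret_t;

/* Hypothetical dispatcher; eid comes from a7, fid from a6, arg0 from a0. */
sbi_ret_t sbi_ecall_sketch(uint32_t eid, uint32_t fid, uint32_t arg0)
{
    if (eid == SBI_EID_BASE && fid == SBI_BASE_PROBE_EXTENSION) {
        /* Report RFENCE as present; every other extension as absent. */
        sbi_ret_t ret = {SBI_SUCCESS, arg0 == SBI_EID_RFENCE ? 1 : 0};
        return ret;
    }
    if (eid == SBI_EID_RFENCE && fid <= SBI_RFENCE_REMOTE_SFENCE_VMA_ASID) {
        /* fence.i and sfence.vma variants: with one hart there is no
         * remote target. A real emulator would still have to invalidate
         * any translation cache it keeps for the local hart here. */
        sbi_ret_t ret = {SBI_SUCCESS, 0};
        return ret;
    }
    /* Hypervisor fences and all other calls stay unimplemented. */
    sbi_ret_t ret = {SBI_ERR_NOT_SUPPORTED, 0};
    return ret;
}

/* Usage example: what a guest kernel would observe. */
void sbi_sketch_demo(void)
{
    assert(sbi_ecall_sketch(SBI_EID_BASE, SBI_BASE_PROBE_EXTENSION,
                            SBI_EID_RFENCE).value == 1);
    assert(sbi_ecall_sketch(SBI_EID_RFENCE, 1, 0).error == SBI_SUCCESS);
}
```

Such a dummy only silences the message. It does not model cross-hart behavior, which is why a real implementation remains necessary for SMP.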

By the way, I attempted to run vi (an applet provided by Busybox), and the emulator crashed.

I have encountered the same issue. However, vi works normally when it is given a filename, as in vi xxx (where xxx is an arbitrary filename). I will try to figure out the root cause.
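
As a reading aid for the traces in this thread, the cause field holds the RISC-V scause exception code. The sketch below maps the codes to their names from the RISC-V privileged specification; scause_name is a hypothetical helper, not part of rv32emu or the kernel. Under that mapping, the first Oops (cause 0xd, badaddr 0) is a load page fault at address 0, which would be consistent with a null filename pointer when vi is started without an argument, and the follow-up fault in sh (cause 0xc) is an instruction page fault at 0x40.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: translate an scause value into a readable name.
 * Exception codes follow the RISC-V privileged specification. */
const char *scause_name(uint32_t cause)
{
    if (cause & 0x80000000u) /* top bit set: interrupt, not exception */
        return "interrupt";
    switch (cause) {
    case 0:  return "instruction address misaligned";
    case 1:  return "instruction access fault";
    case 2:  return "illegal instruction";
    case 3:  return "breakpoint";
    case 4:  return "load address misaligned";
    case 5:  return "load access fault";
    case 6:  return "store/AMO address misaligned";
    case 7:  return "store/AMO access fault";
    case 8:  return "environment call from U-mode";
    case 9:  return "environment call from S-mode";
    case 12: return "instruction page fault";
    case 13: return "load page fault";
    case 15: return "store/AMO page fault";
    default: return "reserved or unknown";
    }
}

/* Usage example with the cause values seen in the logs. */
void scause_demo(void)
{
    assert(strcmp(scause_name(0x0000000d), "load page fault") == 0);
    assert(strcmp(scause_name(0x0000000c), "instruction page fault") == 0);
    assert(strcmp(scause_name(0x0000000f), "store/AMO page fault") == 0);
    assert(strcmp(scause_name(0x00000004), "load address misaligned") == 0);
}
```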

ChinYikMing commented 2 weeks ago

The -ELF suffix has been added to the latest rv32emu-prebuilt release tag, so all CI tests now pass.

After this PR is merged, new releases of the test benches will automatically get the -ELF suffix.

jserv commented 2 weeks ago

The -ELF suffix has been added to the latest rv32emu-prebuilt release tag, so all CI tests now pass.

Why uppercase -ELF suffix?

ChinYikMing commented 2 weeks ago

The -ELF suffix has been added to the latest rv32emu-prebuilt release tag, so all CI tests now pass.

Why uppercase -ELF suffix?

I think it is just the typical naming convention when referring to the ELF format, but please let me know if you prefer something different.

Mes0903 commented 2 weeks ago

Hi! I have observed random segmentation faults, kernel panics, and crashes: roughly one out of every five or six runs ends in one of these failures. The tests were conducted on commit ab8b756.

The command I used is:

make system ENABLE_SYSTEM=1 -j8

For the subsequent test runs, I used:

build/rv32emu -k build/linux-image/Image -i build/linux-image/rootfs.cpio -b build/minimal.dtb

Below is one of the kernel panic cases:

[    0.014183] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[    0.014197] Oops [#1]
[    0.014203] Modules linked in:
[    0.014210] CPU: 0 PID: 1 Comm: swapper Not tainted 6.1.116 #2
[    0.014223] Hardware name: rv32emu (DT)
[    0.014230] epc : __rb_rotate_set_parents+0x0/0x58
[    0.014242]  ra : rb_insert_color+0xc4/0x154
[    0.014254] epc : c0313b54 ra : c031401c sp : c0861cb0
[    0.014265]  gp : c0476320 tp : c0844000 t0 : c09c9f20
[    0.014277]  t1 : 00000000 t2 : d7a9a567 s0 : c0861cc0
[    0.014287]  s1 : c09c9ec8 a0 : c09c9dd0 a1 : c09c9ed8
[    0.014298]  a2 : c09c9d94 a3 : c09c9ed8 a4 : 00000003
[    0.014309]  a5 : 00000000 a6 : 00000016 a7 : c035b560
[    0.014320]  s2 : 00000000 s3 : c0828034 s4 : c09c9d68
[    0.014330]  s5 : c047600c s6 : 00000000 s7 : 00000000
[    0.014341]  s8 : 00000008 s9 : 00000000 s10: 00000000
[    0.014352]  s11: 00000000 t3 : 00000004 t4 : 00000014
[    0.014361]  t5 : ed55a009 t6 : c09b57e6
[    0.014369] status: 00000120 badaddr: 00000008 cause: 0000000d
[    0.014381] [<c0313b54>] __rb_rotate_set_parents+0x0/0x58
[    0.014394] [<c031401c>] rb_insert_color+0xc4/0x154
[    0.014408] [<c010a224>] kernfs_link_sibling+0x54/0xf4
[    0.014421] [<c010b46c>] kernfs_add_one+0x88/0x14c
[    0.014434] [<c010d110>] __kernfs_create_file+0xb4/0xec
[    0.014448] [<c010df08>] sysfs_add_file_mode_ns+0xd4/0x124
[    0.014462] [<c010dfd8>] sysfs_create_file_ns+0x80/0x84
[    0.014475] [<c01f9a3c>] device_create_file+0x8c/0xac
[    0.014490] [<c01fd0dc>] device_add+0x41c/0x67c
[    0.014501] [<c01fd360>] device_register+0x24/0x38
[    0.014514] [<c01d1174>] tty_register_device_attr+0x174/0x210
[    0.014528] [<c01d122c>] tty_register_device+0x1c/0x2c
[    0.014542] [<c01d13a8>] tty_register_driver+0x16c/0x1d0
[    0.014555] [<c033cf64>] pty_init+0x164/0x3d0
[    0.014567] [<c000110c>] do_one_initcall+0x6c/0x260
[    0.014579] [<c032c0ac>] kernel_init_freeable+0x20c/0x210
[    0.014592] [<c0325a6c>] kernel_init+0x24/0x118
[    0.014605] [<c00023d0>] ret_from_exception+0x0/0x1c
[    0.014618] ---[ end trace 0000000000000000 ]---
[    0.014627] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.014640] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Below is another crash example:

[    0.026278] Oops - Oops - load address misaligned [#1]
[    0.026289] Modules linked in:
[    0.026296] CPU: 0 PID: 6 Comm: kworker/u2:0 Not tainted 6.1.116 #2
[    0.026310] Hardware name: rv32emu (DT)
[    0.026318] Workqueue: events_unbound async_run_entry_fn
[    0.026333] epc : jbd2_journal_dirty_metadata+0x28/0x290
[    0.026346]  ra : __ext4_handle_dirty_metadata+0x90/0x204
[    0.026359] epc : c0161438 ra : c0114284 sp : c086dc70
[    0.026370]  gp : c0476320 tp : c0845b80 t0 : c0a1b048
[    0.026382]  t1 : 00000003 t2 : 8147ac9e s0 : c086dca0
[    0.026393]  s1 : c086dd68 a0 : 339dc50d a1 : c086dd68
[    0.026405]  a2 : 339dc50d a3 : 00000000 a4 : 61a20000
[    0.026416]  a5 : c082f0d1 a6 : 7c11977b a7 : 3be9185e
[    0.026427]  s2 : 00000000 s3 : 339dc50d s4 : 00000000
[    0.026438]  s5 : c042be78 s6 : c035c3f8 s7 : 000003a0
[    0.026449]  s8 : 00000001 s9 : 0000000b s10: c089d05f
[    0.026460]  s11: 00000000 t3 : c0880014 t4 : c0c18e84
[    0.026471]  t5 : 2771c19e t6 : c088001c
[    0.026480] status: 00000120 badaddr: 339dc50d cause: 00000004
[    0.026492] [<c0161438>] jbd2_journal_dirty_metadata+0x28/0x290
[    0.026506] [<c0114284>] __ext4_handle_dirty_metadata+0x90/0x204
[    0.026521] [<c012cda4>] ext4_getblk+0x290/0x2a4
[    0.026534] [<c00b9f94>] path_lookupat+0x60/0x154
[    0.026547] [<c00bab08>] filename_lookup+0xa0/0xf8
[    0.026560] [<c00baba0>] kern_path+0x40/0x68
[    0.026572] [<c03392b4>] init_chown+0x3c/0xa8
[    0.026585] [<c032cf10>] do_symlink+0x74/0xac
[    0.026598] [<c032cf88>] write_buffer+0x40/0x64
[    0.026611] [<c032d85c>] unpack_to_rootfs+0x298/0x2e4
[    0.026625] [<c032df54>] do_populate_rootfs+0x6c/0xd4
[    0.026639] [<c002630c>] async_run_entry_fn+0x3c/0xc4
[    0.026654] [<c001d500>] process_one_work+0x188/0x20c
[    0.026667] [<c001da04>] worker_thread+0x20c/0x268
[    0.026680] [<c002341c>] kthread+0xc0/0xc4
[    0.026693] [<c00023d0>] ret_from_exception+0x0/0x1c
[    0.026706] ---[ end trace 0000000000000000 ]---
[    0.033158] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.033467] printk: console [ttyS0] disabled
[    0.033483] f4000000.serial: ttyS0 at MMIO 0xf4000000 (irq = 1, base_baud = 312500) is a 16550
[    0.033504] printk: console [ttyS0] enabled
[    0.033504] printk: console [ttyS0] enabled
[    0.033521] printk: bootconsole [ns16550] disabled
[    0.033521] printk: bootconsole [ns16550] disabled
[    0.033808] clk: Disabling unused clock

Below is another segmentation fault example:

[    0.065393] Freeing unused kernel image (initmem) memory: 160K
[    0.065408] Kernel memory protection not selected by kernel config.
[    0.065422] Run /init as init process
[    0.074177] ln[29]: unhandled signal 11 code 0x1 at 0x9b779c64 in ld-linux-riscv32-ilp32.so.1[95729000+27000]
[    0.074207] CPU: 0 PID: 29 Comm: ln Not tainted 6.1.116 #2
[    0.074222] Hardware name: rv32emu (DT)
[    0.074232] epc : 9573ec38 ra : 9573e030 sp : 9d230dd0
[    0.074246]  gp : 6915cd14 tp : 957782c0 t0 : 0000000a
[    0.074259]  t1 : 9d230df0 t2 : 00000000 s0 : 9d230e50
[    0.074274]  s1 : 95729ab8 a0 : 00009ed6 a1 : 0000009e
[    0.074287]  a2 : 95729ad0 a3 : 0000000a a4 : 9b779c63
[    0.074301]  a5 : 9ed66737 a6 : 2f2f2f2f a7 : 00000001
[    0.074315]  s2 : 9d25bbe0 s3 : 00000001 s4 : 95752008
[    0.074328]  s5 : 95729000 s6 : 9d25bc7c s7 : 95729000
[    0.074342]  s8 : 95751008 s9 : 95729ac4 s10: 957293ac
[    0.074356]  s11: 0000fff1 t3 : 009ed667 t4 : fffffffc
[    0.074370]  t5 : 00000035 t6 : 0000000b
[    0.074381] status: 00000020 badaddr: 9b779c64 cause: 0000000f
Segmentation fault (core dumped)
make: *** [mk/system.mk:27: system] Error 139
Mes0903 commented 1 week ago

I have identified several issues here.

A segmentation fault occurs in the mmu_write_b function, specifically in get_ppn_and_offset, where the value of pte can be 0x0. Since pte is dereferenced in get_ppn_and_offset, this causes a segmentation fault, which is also the reason for the "Unable to handle kernel NULL pointer dereference at virtual address 00000040" message.

Also, the assert(insn) in the block_translate function fails sporadically, resulting in the message "Unable to handle kernel access to user memory without uaccess routines at virtual address."

Additionally, the program randomly enters an unresponsive state. In such cases, it gets stuck in the following code, and the behavior looks like it gets into an infinite loop:

/* BNE: Branch if Not Equal */
RVOP(
    bne,
    { BRANCH_FUNC(uint32_t, ==); },
    GEN({
        rald2, rs1, rs2;
        cmp, VR1, VR0;
        break;
        setjmpoff;
        jcc, 0x85;
        cond, branch_untaken;
        jmp, pc, 4;
        end;
        ldimm, TMP, pc, 4;
        st, S32, TMP, PC;
        exit;
        jmpoff;
        cond, branch_taken;
        jmp, pc, imm;
        end;
        ldimm, TMP, pc, imm;
        st, S32, TMP, PC;
        exit;
    }))

When stuck in this code, the local variables PC and cycle increase in a regular pattern.

Below is the log at the time of the segmentation fault:

[    0.249663] printk: bootconsole [ns16550] disabled
[    0.250717] clk: Disabling unused clocks
[    0.251025] Freeing unused kernel image (initmem) memory: 160K
[    0.251072] Kernel memory protection not selected by kernel config.
[    0.251111] Run /init as init process
[    0.263080] mount[22]: unhandled signal 11 code 0x1 at 0x9b7f6c64 in ld-linux-riscv32-ilp32.so.1[957a6000+27000]
[    0.263175] CPU: 0 PID: 22 Comm: mount Not tainted 6.1.116 #2
[    0.263227] Hardware name: rv32emu (DT)
[    0.263259] epc : 957bbc38 ra : 957bb030 sp : 9d4b3de0
[    0.263306]  gp : 690f1d14 tp : 957282c0 t0 : 0000000a
[    0.263351]  t1 : 9d4b3e00 t2 : 00000000 s0 : 9d4b3e60
[    0.263397]  s1 : 957a6ab8 a0 : 00009ede a1 : 0000009e
[    0.263442]  a2 : 957a6ad0 a3 : 0000000a a4 : 9b7f6c63
[    0.263487]  a5 : 9ede3737 a6 : 2f2f2f2f a7 : 00000001
[    0.263533]  s2 : 9d41bbf0 s3 : 00000001 s4 : 957cf008
[    0.263578]  s5 : 957a6000 s6 : 9d41bc7c s7 : 957a6000
[    0.263624]  s8 : 957ce008 s9 : 957a6ac4 s10: 957a63ac
[    0.263671]  s11: 0000fff1 t3 : 009ede37 t4 : fffffffc
[    0.263717]  t5 : 00000035 t6 : 0000000b
[    0.263752] status: 00000020 badaddr: 9b7f6c64 cause: 0000000f
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1213562==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x555555598401 bp 0x7ffff38af500 sp 0x7fffffffd780 T0)
==1213562==The signal is caused by a READ memory access.
==1213562==Hint: address points to the zero page.
    #0 0x555555598401 in mmu_write_b src/system.c:392
    #1 0x5555555750fe in do_sb src/rv32_template.c:639
    #2 0x5555555628f9 in rv_step src/emulate.c:1075
    #3 0x5555555628f9 in rv_run src/riscv.c:500
    #4 0x5555555628f9 in main src/main.c:279
    #5 0x7ffff722a1c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #6 0x7ffff722a28a in __libc_start_main_impl ../csu/libc-start.c:360
    #7 0x5555555663a4 in _start (/home/mes/MesRepo/Mes-rv32emu/rv32emu/build/rv32emu+0x123a4) (BuildId: e0992c4435c27bffa4166ed19d915866b583f6fc)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV src/system.c:392 in mmu_write_b
==1213562==ABORTING
ChinYikMing commented 1 week ago

@Mes0903 Hi, thanks for the thorough testing, I appreciate it! The get_ppn_and_offset function works correctly only if the PTE is valid at the time it is used (I might add assertions to ensure the PTE's validity). However, this assumption does not hold in your test case.

Upon investigation, some page faults are successfully detected and handled by the do_page_fault function in the kernel. Ideally, this trap handler remaps the PTE if it is absent or performs other VMA-related checks. If something goes wrong, a user-space process might, for example, receive a SIGSEGV and terminate, while a kernel thread could enter a dead state (refer to die_kernel_fault). Tracing do_page_fault in greater detail could help.
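
To illustrate the validity check mentioned above, here is a minimal sketch of an Sv32 page-table walk that never dereferences a missing or invalid PTE and instead reports failure, so the caller can raise a page fault for the guest. It is not rv32emu's get_ppn_and_offset: guest_mem_t, pte_at, and sv32_translate are hypothetical names, guest physical memory is assumed to be a flat word array starting at address 0, and permission, U, A, and D bit checks are left out. The PTE layout and walk order follow the RISC-V privileged specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sv32 PTE bits and geometry from the RISC-V privileged specification. */
#define PTE_V 0x1u
#define PTE_R 0x2u
#define PTE_W 0x4u
#define PTE_X 0x8u
#define PAGE_SHIFT 12
#define PTES_PER_TABLE 1024u

/* Hypothetical guest memory model: a flat array of 32-bit words. */
typedef struct {
    uint32_t *words;
    size_t n_words;
} guest_mem_t;

/* Return the PTE slot, or NULL if it lies outside guest memory. */
static uint32_t *pte_at(const guest_mem_t *m, uint32_t table_ppn, uint32_t idx)
{
    uint64_t word = ((uint64_t) table_ppn << PAGE_SHIFT) / sizeof(uint32_t) + idx;
    return word < m->n_words ? &m->words[word] : NULL;
}

/* Translate va; on success store the physical address (truncated to 32
 * bits) in *pa. A false result means "raise a page fault". */
bool sv32_translate(const guest_mem_t *m, uint32_t root_ppn, uint32_t va,
                    uint32_t *pa)
{
    const uint32_t vpn[2] = {(va >> 12) & 0x3ffu, (va >> 22) & 0x3ffu};
    uint32_t ppn = root_ppn;
    for (int level = 1; level >= 0; level--) {
        const uint32_t *pte = pte_at(m, ppn, vpn[level]);
        /* Missing, invalid, or reserved (W without R) entry: fault. */
        if (!pte || !(*pte & PTE_V) || ((*pte & PTE_W) && !(*pte & PTE_R)))
            return false;
        if (*pte & (PTE_R | PTE_X)) { /* leaf entry */
            uint32_t leaf_ppn = *pte >> 10;
            if (level == 1) {
                if (leaf_ppn & 0x3ffu)
                    return false; /* misaligned 4 MiB megapage */
                *pa = (leaf_ppn << PAGE_SHIFT) | (va & 0x3fffffu);
            } else {
                *pa = (leaf_ppn << PAGE_SHIFT) | (va & 0xfffu);
            }
            return true;
        }
        ppn = *pte >> 10; /* pointer to the next-level table */
    }
    return false; /* level-0 entry was not a leaf */
}

/* Usage example: one mapped page and one unmapped address. */
void sv32_demo(void)
{
    static uint32_t words[3 * PTES_PER_TABLE]; /* pages 0..2, zeroed */
    guest_mem_t mem = {words, 3 * PTES_PER_TABLE};
    words[1] = (1u << 10) | PTE_V; /* root[1] -> table in page 1 */
    words[PTES_PER_TABLE] = (2u << 10) | PTE_V | PTE_R | PTE_W; /* -> page 2 */
    uint32_t pa = 0;
    assert(sv32_translate(&mem, 0, 0x00400abcu, &pa) && pa == 0x2abcu);
    assert(!sv32_translate(&mem, 0, 0x00800000u, &pa)); /* unmapped */
}
```

Returning a failure code rather than asserting keeps the emulator alive when the guest touches an unmapped page, which is the normal way for the kernel's do_page_fault path to be entered.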