paritytech / polkavm

A fast and secure RISC-V based virtual machine
Apache License 2.0
247 stars 55 forks source link

Support upstream toolchain #178

Open koute opened 3 weeks ago

koute commented 3 weeks ago

We're currently using our own fork of rustc to build guest binaries, but in theory it should be possible to now use a purely upstream build of rustc since rv32e support got merged into LLVM.

Copy-paste this to guest-programs/riscv32emac-unknown-none-polkavm.json:

{
  "arch": "riscv32",
  "cpu": "generic-rv32",
  "crt-objects-fallback": "false",
  "data-layout": "e-m:e-p:32:32-i64:64-n32-S32",
  "eh-frame-header": false,
  "emit-debug-gdb-scripts": false,
  "features": "+e,+m,+a,+c,+lui-addi-fusion,+fast-unaligned-access,+xtheadcondmov",
  "linker": "rust-lld",
  "linker-flavor": "ld.lld",
  "llvm-abiname": "ilp32e",
  "llvm-target": "riscv32",
  "max-atomic-width": 32,
  "panic-strategy": "abort",
  "relocation-model": "pie",
  "target-pointer-width": "32",
  "singlethread": true,
  "pre-link-args": {
    "ld": [
      "--emit-relocs",
      "--unique"
    ]
  },
  "env": "polkavm"
}

Then go into guest-programs/build-benchmarks.sh and apply this:

diff --git a/guest-programs/build-benchmarks.sh b/guest-programs/build-benchmarks.sh
index e5435c5..966b28b 100755
--- a/guest-programs/build-benchmarks.sh
+++ b/guest-programs/build-benchmarks.sh
@@ -77,9 +77,9 @@ function build_benchmark() {

     if [ "${RV32E_TOOLCHAIN:-}" != "" ]; then
         echo "> Building: '$1' (polkavm)"
-        RUSTFLAGS="-C target-feature=+lui-addi-fusion,+fast-unaligned-access,+xtheadcondmov,+c -C relocation-model=pie -C link-arg=--emit-relocs -C link-arg=--unique $extra_flags" rustup run $RV32E_TOOLCHAIN cargo build -q --release --bin $1 -p $1
+        RUSTFLAGS="$extra_flags" rustup run nightly-2024-07-10 cargo build -Z build-std=core,alloc --target "$current_dir/riscv32emac-unknown-none-polkavm.json" -q --release --bin $1 -p $1
         cd ..
-        cargo run -q -p polkatool link --run-only-if-newer guest-programs/target/riscv32ema-unknown-none-elf/release/$1 -o guest-programs/target/riscv32ema-unknown-none-elf/release/$1.polkavm
+        cargo run -q -p polkatool link --run-only-if-newer guest-programs/target/riscv32emac-unknown-none-polkavm/release/$1 -o guest-programs/target/riscv32emac-unknown-none-polkavm/release/$1.polkavm
         cd $current_dir
     fi

Then run build-benchmarks.sh (you'll have to install the nightly-2024-07-10 toolchain with rustup). We expect the polkatool link call to work. Unfortunately you will get an error:

ERROR: failed to link "guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-minimal": found an export without a relocation for a pointer to the metadata at <section #10+1>

The relevant code is here:

https://github.com/koute/polkavm/blob/v0.11.0/crates/polkavm-derive-impl/src/export.rs#L201

https://github.com/koute/polkavm/blob/v0.11.0/crates/polkavm-linker/src/program_from_elf.rs#L1147

Basically, for every exported function/symbol we emit an entry in the .polkavm_exports section, and each entry has two pointers: one pointer to the function we want to export, and another pointer to the metadata (which is a static variable). Since we compile the code as position independent we expect the compiler to emit a relocation for both of these pointers (our linker works exclusively on (section_number, offset) pairs instead of raw addresses). For some reason the compiler doesn't emit them in this case.

Check whether the compiler really doesn't emit those relocations or whether we just have a bug. If it doesn't emit them figure out why, and make it emit them. The end result should be that the polkatool link can correctly link the example program.

jarkkojs commented 3 weeks ago

With the changes mentioned above, relocations are as follows:

$ rz-bin -R guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-minimal
ERROR: Cannot determine entrypoint, using 0x00011142.
[Relocs]
vaddr      paddr      type   name                                                      
---------------------------------------------------------------------------------------
0x0001115a 0x0000015a ADD_64 bench_minimal::STATE::hcb6048495de15bc7 + 0x000131bc
0x0001115e 0x0000015e ADD_64 .Lpcrel_hi2 + 0x000131bc
0x0001116e 0x0000016e ADD_64 bench_minimal::TABLE::h588d4e21d7963555 + 0x000131bc
0x00011172 0x00000172 ADD_64 .Lpcrel_hi3 + 0x000131bc
0x00012184 0x00000184 ADD_64 bench_minimal::dummy_even::haf8b4b7c11edfc65 + 0x000131bc
0x00012188 0x00000188 ADD_64 bench_minimal::dummy_odd::he450f3d8dd78b56a + 0x000131bc
0x00013195 0x00000195 ADD_64 .Lanon.7a17a70c2e207411a1aa2f240afa160c.0 + 0x000131bc
0x000131a4 0x000001a4 ADD_64 .Lanon.7a17a70c2e207411a1aa2f240afa160c.1 + 0x000131bc
0x000131ac 0x000001ac ADD_64 bench_minimal::dummy_even::haf8b4b7c11edfc65 + 0x000131bc

I cannot compare because without changes:

$ guest-programs/build-benchmarks.sh
> Building: 'bench-minimal' (polkavm)
/home/jarkko/.rustup/toolchains/riscv32em-nightly-2024-01-05-r0-x86_64-unknown-linux-gnu/bin/cargo: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory

Let's try:

1. `distrobox create -i ubuntu:20.04 && distrobox enter ubuntu-20-04`
2. `sudo apt install git libssl-dev`
3. `guest-programs/build-benchmarks.sh`

That results in HUGE amount of relocations (no worth pasting here): https://gist.github.com/jarkkojs/c767e09c115f5086908607e5c4ff8cb3

Comparison shows that the binary compiled with proposed changes has only exports with a symbol associated. If I interpret this correctly, none of the PolkaVM exports emit an associated relocation entry.

jarkkojs commented 3 weeks ago

Got further by:

diff --git a/guest-programs/riscv32emac-unknown-none-polkavm.json b/guest-programs/riscv32emac-unknown-none-polkavm.json
index 519bc67..bbd54cd 100644
--- a/guest-programs/riscv32emac-unknown-none-polkavm.json
+++ b/guest-programs/riscv32emac-unknown-none-polkavm.json
@@ -18,7 +18,8 @@
   "pre-link-args": {
     "ld": [
       "--emit-relocs",
-      "--unique"
+      "--unique",
+      "--relocatable"
     ]
   },
   "env": "polkavm"

Now:

ERROR: failed to link "guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-pinky": failed to process DWARF: found overlapping inline subroutines in a parent inline subroutine
~

So these are bit hard to grep (if e.g. 2>&1 | tee log.txt) and also could not see where it failed so I did polkavm-linker: report inline fields on overlap, and now I get:

ERROR: failed to link "guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-pinky": DWARF: overlap at depth=3, fn=Some("cap") and decl_location=Some(/home/jarkko/.cargo/git/checkouts/pinky-deb3f017ba6411b6/5a9d4ff/nes/src/rp2c02.rs:1311)
athei commented 3 weeks ago

Comparison shows that the binary compiled with proposed changes has only exports with a symbol associated.

In comparison to the elf built with our custom toolchain?

jarkkojs commented 3 weeks ago

Comparison shows that the binary compiled with proposed changes has only exports with a symbol associated.

In comparison to the elf built with our custom toolchain?

  1. The dump in my comment: 1:1 changes in the description applied.
  2. The dump in gists: master as it is, which translates to the custom toolchain.

Without --relocatable only functions get relocation entries.

jarkkojs commented 3 weeks ago

Got further by:

diff --git a/guest-programs/riscv32emac-unknown-none-polkavm.json b/guest-programs/riscv32emac-unknown-none-polkavm.json
index 519bc67..bbd54cd 100644
--- a/guest-programs/riscv32emac-unknown-none-polkavm.json
+++ b/guest-programs/riscv32emac-unknown-none-polkavm.json
@@ -18,7 +18,8 @@
   "pre-link-args": {
     "ld": [
       "--emit-relocs",
-      "--unique"
+      "--unique",
+      "--relocatable"
     ]
   },
   "env": "polkavm"

Now:

ERROR: failed to link "guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-pinky": failed to process DWARF: found overlapping inline subroutines in a parent inline subroutine
~

So these are bit hard to grep (if e.g. 2>&1 | tee log.txt) and also could not see where it failed so I did polkavm-linker: report inline fields on overlap, and now I get:

ERROR: failed to link "guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-pinky": DWARF: overlap at depth=3, fn=Some("cap") and decl_location=Some(/home/jarkko/.cargo/git/checkouts/pinky-deb3f017ba6411b6/5a9d4ff/nes/src/rp2c02.rs:1311)

I could verify that DWARF is correct:

$ rz-bin -d guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-pinky
[Dwarf]
    1      DW_TAG_compile_unit       [has children] (0x0)
    DW_AT_producer                 DW_FORM_strp
    DW_AT_language                 DW_FORM_data2
    DW_AT_name                     DW_FORM_strp
    DW_AT_stmt_list                DW_FORM_sec_offset
    DW_AT_comp_dir                 DW_FORM_strp
    DW_AT_low_pc                   DW_FORM_addr
    DW_AT_ranges                   DW_FORM_sec_offset
    2      DW_TAG_variable           [no children] (0x13)

Also other tools that process DWARF have no issues parsing.

From this we can deduce that there is a bug in dwarf.rs with high probability.

athei commented 3 weeks ago

From this we can deduce that there is a bug in dwarf.rs with high probability.

Seems plausible to me. I think our custom toolchain uses a much older version of rustc so it is not unreasonable to expect that stuff changed which broke our parsing.

jarkkojs commented 2 weeks ago

From this we can deduce that there is a bug in dwarf.rs with high probability.

Seems plausible to me. I think our custom toolchain uses a much older version of rustc so it is not unreasonable to expect that stuff changed which broke our parsing.

Yeah when I synced with Jan he also said that it is entirely possible. And I'd expect other tools to fail too if it was the DWARF data.

jarkkojs commented 1 week ago

Adding some information about the following failure with benchmark-sieve:

[DEBUG polkavm_linker::dwarf] 0004920c -------------->DW_TAG_inlined_subroutine (DebugInfoOffset(299532))
ERROR: failed to link "guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-prime-sieve": failed to process DWAR
F: found a unit for offset=0 but couldn't compute a relative offset                                    

Walk-entry trace was temporarily turned into log::debug!(), other tests were commented out and command RUST_LOG="polkavm_linker::dwarf=debug" guest-programs/build-benchmarks.sh was run.

Failing entry:

<0x491fc>: Abbrev Number: 51   (DW_TAG_inlined_subroutine)
        DW_AT_abstract_origin [DW_FORM_ref4]    : <0x4cab0>
        DW_AT_low_pc [DW_FORM_addr]     : 0x0
        DW_AT_high_pc [DW_FORM_data4]   : 0
        DW_AT_call_file [DW_FORM_data1] : 6
        DW_AT_call_line [DW_FORM_data1] : 232
        DW_AT_call_column [DW_FORM_data1]       : 38

Abstract origin:

<0x4cab0>: Abbrev Number: 36   (DW_TAG_subprogram)
        DW_AT_linkage_name [DW_FORM_strp]       : (indirect string, .debug_str+0x0): PRIME_VALIDATOR_DATA
        DW_AT_name [DW_FORM_strp]       : (indirect string, .debug_str+0x0): PRIME_VALIDATOR_DATA
        DW_AT_decl_file [DW_FORM_data1] : 27
        DW_AT_decl_line [DW_FORM_data1] : 247
        DW_AT_type [DW_FORM_ref4]       : <0xe9d>
        DW_AT_inline [DW_FORM_data1]    : 1

@koute:

  1. How/where can I find the file matching index 27?
  2. Sanity checking: DW_TAG_subprogram refers to the definition of the program and DW_TAG_inlined_subroutine refers to the inlined instance of that definition, i.e. the location where the subprogram is open-coded, if I interpreted DWARF4 specification correctly.
jarkkojs commented 1 week ago

I.e. topology of inlining goes something like this according to the spec:

  1. DW_TAG_subprogram: contains data to connect tag to the source file locations of a function implemention.
  2. DAW_TAG_inlined_subroutine: open coded (inlined) subprogram. Probably can be also for inlined subprogram (by optimizer or something) without DW_AT_inline set to 1. Contains offset in debug_info in the attribute DW_AT_abstract_origin referring to the subprogram that is inlined.
  3. Out-of-line inlined subroutines have entry points i.e. are like clones of functions. They are tagged as DW_TAG_subprogram but have DW_AT_abstract_origin containing the reference to the root subprogram.
jarkkojs commented 1 week ago

Looking at DW_AT_declfile and DW_AT_line and using data file name and directory tables the inlined functions is at /home/jarkko/.rustup/toolchains/nightly-2024-07-10-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:247:

    #[inline]
    fn allocate_zeroed(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
        self.alloc_impl(layout, true)
    }

Checked that target_unit is actually rustlib and paths contains also alloc.rs.

jarkkojs commented 1 week ago

According to test_debug_info_offset() unit test in read/unit.rs of gimli crate:

assert_eq!(DebugInfoOffset(0).to_unit_offset(&unit), None);

Based on those I made following test change:

diff --git a/crates/polkavm-linker/src/dwarf.rs b/crates/polkavm-linker/src/dwarf.rs
index 26e645f..22ad9dd 100644
--- a/crates/polkavm-linker/src/dwarf.rs
+++ b/crates/polkavm-linker/src/dwarf.rs
@@ -27,7 +27,12 @@ where
         }
         Err(index) => &units[index - 1],
     };
-    let unit_offset = target_offset.to_unit_offset(&target_unit.raw_unit.header).ok_or_else(|| {
+
+    let unit_header = target_unit.raw_unit.header.clone();
+    log::info!("target offset: {:?}", target_offset);
+    log::info!("unit header offset: {:?}", unit_header.offset());
+
+    let unit_offset = target_offset.to_unit_offset(&unit_header).ok_or_else(|| {
         ProgramFromElfError::other(format!(
             "failed to process DWARF: found a unit for offset={:x} but couldn't compute a relative offset",
             target_offset.0.into_u64()
@@ -1366,7 +1371,7 @@ where
             return Ok(Default::default());
         };

-        log::trace!(
+        log::debug!(
             "{:08x} {:->depth$}{name} ({node_offset:?})",
             node_entry.offset().0.into_u64(),
             ">",

This gives me the following output:

...
[INFO  polkavm_linker::dwarf] target offset: DebugInfoOffset(314032)
[INFO  polkavm_linker::dwarf] unit header offset: DebugInfoOffset(DebugInfoOffset(0))
[INFO  polkavm_linker::dwarf] target offset: DebugInfoOffset(315175)
[INFO  polkavm_linker::dwarf] unit header offset: DebugInfoOffset(DebugInfoOffset(0))
[INFO  polkavm_linker::dwarf] target offset: DebugInfoOffset(0)
[INFO  polkavm_linker::dwarf] unit header offset: DebugInfoOffset(DebugInfoOffset(0))

So for some reason resolve_abstract_origin gives offset zero as the abstract origin, which does not match what I saw in dump (0x4cab0). So for some reason for the above subprogram the tag DW_TAG_subprogram is not correctly parsed given that the entry can be found with other tools.

jarkkojs commented 1 week ago

Ya, so not much commentary needed up until I have final PR ready. There's just dozen of minor differences in DWARF data generated between Parity's toolchain and official rustc that we hit, which can be also witnessed by observing the data with dwarfdump, readelf and rz-bin.