Open koute opened 3 weeks ago
With the changes mentioned above, relocations are as follows:
$ rz-bin -R guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-minimal
ERROR: Cannot determine entrypoint, using 0x00011142.
[Relocs]
vaddr paddr type name
---------------------------------------------------------------------------------------
0x0001115a 0x0000015a ADD_64 bench_minimal::STATE::hcb6048495de15bc7 + 0x000131bc
0x0001115e 0x0000015e ADD_64 .Lpcrel_hi2 + 0x000131bc
0x0001116e 0x0000016e ADD_64 bench_minimal::TABLE::h588d4e21d7963555 + 0x000131bc
0x00011172 0x00000172 ADD_64 .Lpcrel_hi3 + 0x000131bc
0x00012184 0x00000184 ADD_64 bench_minimal::dummy_even::haf8b4b7c11edfc65 + 0x000131bc
0x00012188 0x00000188 ADD_64 bench_minimal::dummy_odd::he450f3d8dd78b56a + 0x000131bc
0x00013195 0x00000195 ADD_64 .Lanon.7a17a70c2e207411a1aa2f240afa160c.0 + 0x000131bc
0x000131a4 0x000001a4 ADD_64 .Lanon.7a17a70c2e207411a1aa2f240afa160c.1 + 0x000131bc
0x000131ac 0x000001ac ADD_64 bench_minimal::dummy_even::haf8b4b7c11edfc65 + 0x000131bc
I cannot compare because without changes:
$ guest-programs/build-benchmarks.sh
> Building: 'bench-minimal' (polkavm)
/home/jarkko/.rustup/toolchains/riscv32em-nightly-2024-01-05-r0-x86_64-unknown-linux-gnu/bin/cargo: error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory
Let's try:
1. `distrobox create -i ubuntu:20.04 && distrobox enter ubuntu-20-04`
2. `sudo apt install git libssl-dev`
3. `guest-programs/build-benchmarks.sh`
That results in HUGE amount of relocations (no worth pasting here): https://gist.github.com/jarkkojs/c767e09c115f5086908607e5c4ff8cb3
Comparison shows that the binary compiled with proposed changes has only exports with a symbol associated. If I interpret this correctly, none of the PolkaVM exports emit an associated relocation entry.
Got further by:
diff --git a/guest-programs/riscv32emac-unknown-none-polkavm.json b/guest-programs/riscv32emac-unknown-none-polkavm.json
index 519bc67..bbd54cd 100644
--- a/guest-programs/riscv32emac-unknown-none-polkavm.json
+++ b/guest-programs/riscv32emac-unknown-none-polkavm.json
@@ -18,7 +18,8 @@
"pre-link-args": {
"ld": [
"--emit-relocs",
- "--unique"
+ "--unique",
+ "--relocatable"
]
},
"env": "polkavm"
Now:
ERROR: failed to link "guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-pinky": failed to process DWARF: found overlapping inline subroutines in a parent inline subroutine
~
So these are bit hard to grep (if e.g. 2>&1 | tee log.txt
) and also could not see where it failed so I did polkavm-linker: report inline fields on overlap
, and now I get:
ERROR: failed to link "guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-pinky": DWARF: overlap at depth=3, fn=Some("cap") and decl_location=Some(/home/jarkko/.cargo/git/checkouts/pinky-deb3f017ba6411b6/5a9d4ff/nes/src/rp2c02.rs:1311)
Comparison shows that the binary compiled with proposed changes has only exports with a symbol associated.
In comparison to the elf built with our custom toolchain?
Comparison shows that the binary compiled with proposed changes has only exports with a symbol associated.
In comparison to the elf built with our custom toolchain?
Without --relocatable
only functions get relocation entries.
Got further by:
diff --git a/guest-programs/riscv32emac-unknown-none-polkavm.json b/guest-programs/riscv32emac-unknown-none-polkavm.json index 519bc67..bbd54cd 100644 --- a/guest-programs/riscv32emac-unknown-none-polkavm.json +++ b/guest-programs/riscv32emac-unknown-none-polkavm.json @@ -18,7 +18,8 @@ "pre-link-args": { "ld": [ "--emit-relocs", - "--unique" + "--unique", + "--relocatable" ] }, "env": "polkavm"
Now:
ERROR: failed to link "guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-pinky": failed to process DWARF: found overlapping inline subroutines in a parent inline subroutine ~
So these are bit hard to grep (if e.g.
2>&1 | tee log.txt
) and also could not see where it failed so I didpolkavm-linker: report inline fields on overlap
, and now I get:ERROR: failed to link "guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-pinky": DWARF: overlap at depth=3, fn=Some("cap") and decl_location=Some(/home/jarkko/.cargo/git/checkouts/pinky-deb3f017ba6411b6/5a9d4ff/nes/src/rp2c02.rs:1311)
I could verify that DWARF is correct:
$ rz-bin -d guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-pinky
[Dwarf]
1 DW_TAG_compile_unit [has children] (0x0)
DW_AT_producer DW_FORM_strp
DW_AT_language DW_FORM_data2
DW_AT_name DW_FORM_strp
DW_AT_stmt_list DW_FORM_sec_offset
DW_AT_comp_dir DW_FORM_strp
DW_AT_low_pc DW_FORM_addr
DW_AT_ranges DW_FORM_sec_offset
2 DW_TAG_variable [no children] (0x13)
Also other tools that process DWARF have no issues parsing.
From this we can deduce that there is a bug in dwarf.rs
with high probability.
From this we can deduce that there is a bug in dwarf.rs with high probability.
Seems plausible to me. I think our custom toolchain uses a much older version of rustc so it is not unreasonable to expect that stuff changed which broke our parsing.
From this we can deduce that there is a bug in dwarf.rs with high probability.
Seems plausible to me. I think our custom toolchain uses a much older version of rustc so it is not unreasonable to expect that stuff changed which broke our parsing.
Yeah when I synced with Jan he also said that it is entirely possible. And I'd expect other tools to fail too if it was the DWARF data.
Adding some information about the following failure with benchmark-sieve:
[DEBUG polkavm_linker::dwarf] 0004920c -------------->DW_TAG_inlined_subroutine (DebugInfoOffset(299532))
ERROR: failed to link "guest-programs/target/riscv32emac-unknown-none-polkavm/release/bench-prime-sieve": failed to process DWAR
F: found a unit for offset=0 but couldn't compute a relative offset
Walk-entry trace was temporarily turned into log::debug!()
, other tests were commented out and command RUST_LOG="polkavm_linker::dwarf=debug" guest-programs/build-benchmarks.sh
was run.
Failing entry:
<0x491fc>: Abbrev Number: 51 (DW_TAG_inlined_subroutine)
DW_AT_abstract_origin [DW_FORM_ref4] : <0x4cab0>
DW_AT_low_pc [DW_FORM_addr] : 0x0
DW_AT_high_pc [DW_FORM_data4] : 0
DW_AT_call_file [DW_FORM_data1] : 6
DW_AT_call_line [DW_FORM_data1] : 232
DW_AT_call_column [DW_FORM_data1] : 38
Abstract origin:
<0x4cab0>: Abbrev Number: 36 (DW_TAG_subprogram)
DW_AT_linkage_name [DW_FORM_strp] : (indirect string, .debug_str+0x0): PRIME_VALIDATOR_DATA
DW_AT_name [DW_FORM_strp] : (indirect string, .debug_str+0x0): PRIME_VALIDATOR_DATA
DW_AT_decl_file [DW_FORM_data1] : 27
DW_AT_decl_line [DW_FORM_data1] : 247
DW_AT_type [DW_FORM_ref4] : <0xe9d>
DW_AT_inline [DW_FORM_data1] : 1
@koute:
DW_TAG_subprogram
refers to the definition of the program and DW_TAG_inlined_subroutine
refers to the inlined instance of that definition, i.e. the location where the subprogram is open-coded, if I interpreted DWARF4 specification correctly.I.e. topology of inlining goes something like this according to the spec:
Looking at DW_AT_declfile and DW_AT_line and using data file name and directory tables the inlined functions is at /home/jarkko/.rustup/toolchains/nightly-2024-07-10-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:247
:
#[inline]
fn allocate_zeroed(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
self.alloc_impl(layout, true)
}
Checked that target_unit is actually rustlib and paths contains also alloc.rs
.
According to test_debug_info_offset()
unit test in read/unit.rs
of gimli
crate:
assert_eq!(DebugInfoOffset(0).to_unit_offset(&unit), None);
Based on those I made following test change:
diff --git a/crates/polkavm-linker/src/dwarf.rs b/crates/polkavm-linker/src/dwarf.rs
index 26e645f..22ad9dd 100644
--- a/crates/polkavm-linker/src/dwarf.rs
+++ b/crates/polkavm-linker/src/dwarf.rs
@@ -27,7 +27,12 @@ where
}
Err(index) => &units[index - 1],
};
- let unit_offset = target_offset.to_unit_offset(&target_unit.raw_unit.header).ok_or_else(|| {
+
+ let unit_header = target_unit.raw_unit.header.clone();
+ log::info!("target offset: {:?}", target_offset);
+ log::info!("unit header offset: {:?}", unit_header.offset());
+
+ let unit_offset = target_offset.to_unit_offset(&unit_header).ok_or_else(|| {
ProgramFromElfError::other(format!(
"failed to process DWARF: found a unit for offset={:x} but couldn't compute a relative offset",
target_offset.0.into_u64()
@@ -1366,7 +1371,7 @@ where
return Ok(Default::default());
};
- log::trace!(
+ log::debug!(
"{:08x} {:->depth$}{name} ({node_offset:?})",
node_entry.offset().0.into_u64(),
">",
This gives me the following output:
...
[INFO polkavm_linker::dwarf] target offset: DebugInfoOffset(314032)
[INFO polkavm_linker::dwarf] unit header offset: DebugInfoOffset(DebugInfoOffset(0))
[INFO polkavm_linker::dwarf] target offset: DebugInfoOffset(315175)
[INFO polkavm_linker::dwarf] unit header offset: DebugInfoOffset(DebugInfoOffset(0))
[INFO polkavm_linker::dwarf] target offset: DebugInfoOffset(0)
[INFO polkavm_linker::dwarf] unit header offset: DebugInfoOffset(DebugInfoOffset(0))
So for some reason resolve_abstract_origin
gives offset zero as the abstract origin, which does not match what I saw in dump (0x4cab0). So for some reason for the above subprogram the tag DW_TAG_subprogram is not correctly parsed given that the entry can be found with other tools.
Ya, so not much commentary needed up until I have final PR ready. There's just dozen of minor differences in DWARF data generated between Parity's toolchain and official rustc that we hit, which can be also witnessed by observing the data with dwarfdump, readelf and rz-bin.
We're currently using our own fork of rustc to build guest binaries, but in theory it should be possible to now use a purely upstream build of rustc since rv32e support got merged into LLVM.
Copy-paste this to
guest-programs/riscv32emac-unknown-none-polkavm.json
:Then go into
guest-programs/build-benchmarks.sh
and apply this:Then run
build-benchmarks.sh
(you'll have to install thenightly-2024-07-10
toolchain with rustup). We expect thepolkatool link
call to work. Unfortunately you will get an error:The relevant code is here:
https://github.com/koute/polkavm/blob/v0.11.0/crates/polkavm-derive-impl/src/export.rs#L201
https://github.com/koute/polkavm/blob/v0.11.0/crates/polkavm-linker/src/program_from_elf.rs#L1147
Basically, for every exported function/symbol we emit an entry in the
.polkavm_exports
section, and each entry has two pointers: one pointer to the function we want to export, and another pointer to the metadata (which is a static variable). Since we compile the code as position independent we expect the compiler to emit a relocation for both of these pointers (our linker works exclusively on(section_number, offset)
pairs instead of raw addresses). For some reason the compiler doesn't emit them in this case.Check whether the compiler really doesn't emit those relocations or whether we just have a bug. If it doesn't emit them figure out why, and make it emit them. The end result should be that the
polkatool link
can correctly link the example program.