Open koutheir opened 2 years ago
Sounds good to me, hopefully it’s straightforward fix? Good to get this stuff right now. Are there any other fields that would have similar semantics?
hopefully it’s straightforward fix?
The fix for jmprel
is not complicated, but it is a breaking change.
Are there any other fields that would have similar semantics?
Yes, there are a lot, as shown in Figure 2-7. Dynamic Array Tags, d_tag
in the Tool Interface Standard (TIS), Executable and Linking Format (ELF) Specification, Version 1.2 and Dynamic Linking.
The following tags are optional in an executable:
DT_BIND_NOW
DT_DEBUG
DT_FINI
DT_INIT
DT_JMPREL
DT_NEEDED
DT_PLTGOT
DT_PLTREL
DT_PLTRELSZ
DT_RPATH
DT_TEXTREL
The following tags are optional in a shared object:
DT_BIND_NOW
DT_FINI
DT_INIT
DT_JMPREL
DT_NEEDED
DT_PLTGOT
DT_PLTREL
DT_PLTRELSZ
DT_REL
DT_RELA
DT_RELAENT
DT_RELASZ
DT_RELENT
DT_RELSZ
DT_SONAME
DT_SYMBOLIC
DT_TEXTREL
The following tags may be optional (they are unspecified):
DT_LOPROC
to DT_HIPROC
(0x70000000
-0x7fffffff
).In addition to the optional vs. mandatory nature, there are dependencies between some tags:
DT_NULL
.DT_JMPREL
is present, then DT_PLTREL
and DT_PLTRELSZ
become mandatory.DT_RELA
is present, then DT_RELASZ
and DT_RELAENT
become mandatory.DT_REL
is present, then DT_RELSZ
and DT_RELENT
become mandatory.Therefore, the following fields in DynamicInfo
require changes:
rela
, relasz
, relaent
, relacount
, rel
, relsz
, relent
, relcount
, pltgot
, pltrelsz
, pltrel
, jmprel
, init
, fini
, soname
, textrel
: they are all optional, with interdependencies.hash
: this is mandatory and must not be optional.I may have missed some other tags, and I came into the conclusion that:
I have a question, though: why are we transforming the values of some tags from virtual addresses into file offsets?
@m4b, the following is an idea about the new DynamicInfo
(here called DynamicTags64
), which now has zero public fields. Please review it before I attempt to implement it.
impl DynamicTags64 {
pub fn parse() -> Self {}
pub fn tags_iter(&self) -> impl Iterator<Item = (NonZeroU64, u64)> {}
// DT_NEEDED
pub fn needed_libraries_iter(&self) -> impl Iterator<Item = &CStr> {}
// DT_JMPREL, DT_PLTRELSZ, DT_PLTREL
pub fn procedure_linkage_table_relocation_info(&self) -> Option<ProcedureLinkageTableRelocations64> {}
// DT_PLTGOT
pub fn global_offset_table_address(&self) -> Option<u64> {}
// DT_HASH
pub fn symbol_hash_table_address(&self) -> u64 {}
// DT_STRTAB, DT_STRSZ
pub fn string_table_info(&self) -> StringTable64 {}
// DT_SYMTAB, DT_SYMENT
pub fn symbol_table_info(&self) -> SymbolTable64 {}
// DT_RELA, DT_RELASZ, DT_RELAENT
pub fn explicit_relocation_table_info(&self) -> Option<RelocationTable64> {}
// DT_REL, DT_RELSZ, DT_RELENT
pub fn implicit_relocation_table_info(&self) -> Option<RelocationTable64> {}
// DT_INIT
pub fn initialization_function_address(&self) -> Option<u64> {}
// DT_FINI
pub fn termination_function_address(&self) -> Option<u64> {}
// DT_SONAME
pub fn shared_object_name(&self) -> Option<&CStr> {}
// DT_RPATH, DT_RUNPATH
pub fn library_search_path_info(&self) -> LibrarySearchPaths {}
// DT_SYMBOLIC
pub fn local_symbol_resolution_starts_from_shared_object(&self) -> bool {}
// DT_DEBUG
pub fn debug_value(&self) -> Option<u64> {}
// DT_TEXTREL
pub fn relocations_can_modify_read_only_segments(&self) -> bool {}
// DT_BIND_NOW
pub fn process_all_relocations_early(&self) -> bool {}
// DT_INIT_ARRAY, DT_INIT_ARRAYSZ
pub fn initialization_functions_iter(&self) -> impl Iterator<Item = u64> {}
// DT_FINI_ARRAY, DT_FINI_ARRAYSZ
pub fn termination_functions_iter(&self) -> impl Iterator<Item = u64> {}
// DT_INIT
pub fn flags(&self) -> Option<u64> {}
// DT_PREINIT_ARRAY, DT_PREINIT_ARRAYSZ
pub fn pre_initialization_functions_iter(&self) -> impl Iterator<Item = u64> {}
}
where:
pub struct ProcedureLinkageTableRelocations64 {
pub address: u64,
pub total_size: u64,
pub relocations_type: u64,
}
pub struct StringTable64 {
pub address: u64,
pub total_size: u64,
}
pub struct SymbolTable64 {
pub address: u64,
pub entry_size: u64,
}
pub struct RelocationTable64 {
pub address: u64,
pub total_size: u64,
pub entry_size: u64,
}
pub struct LibrarySearchPaths<'elf> {
pub rpath: DynamicTagsLibrarySearchPathIter<'elf>,
pub runpath: DynamicTagsLibrarySearchPathIter<'elf>,
}
struct DynamicTagsIter64<'elf> {}
impl<'elf> Iterator for DynamicTagsIter64<'elf> {
type Item = (NonZeroU64, u64);
fn next(&mut self) -> Option<Self::Item> {}
}
struct DynamicTagsNeededIter64<'elf> {}
impl<'elf> Iterator for DynamicTagsNeededIter64<'elf> {
type Item = &'elf CStr;
fn next(&mut self) -> Option<Self::Item> {}
}
pub struct DynamicTagsLibrarySearchPathIter<'elf> {}
impl<'elf> Iterator for DynamicTagsLibrarySearchPathIter<'elf> {
type Item = &'elf CStr;
fn next(&mut self) -> Option<Self::Item> {}
}
hash
: this is mandatory and must not be optional.
While this is correct according to the sysv ABI, recent version of gnu ld have stopped emitting DT_HASH
in favor of only including DT_GNU_HASH
. hash
should be considered optional in Goblin, or else it will be impossible to parse files built with modern toolchains.
I'm unsure of what "recent" means here, but at the very least programs built on Ubuntu 22.04 do not include a DT_HASH
entry
@koutheir thanks for taking time to write this up and the api proposal.
So this would be a moderate breaking change, and tentatively i'm ok with it.
Some initial thoughts after reading your proposal:
textrel -> relocations_can_modify_read_only_segments
a. while this is a quite good description actually of what the tag means, i'm worried there is a discrepancy between other parts of goblin/elf which tend to skew towards faithfully using field names/type names as close as possible to the elf headers/etc., and this new proposal which makes it more "rusty" + more descriptive names. This could be strange for users expecting to look for something like textrel and not knowing there is a function name that describes what textrel is, if that makes any sense?Anyway, those are my initial concerns, mostly:
I generally don't like "Java bean" style getters...
Can you explain what you mean?
while this is a quite good description actually of what the tag means, i'm worried there is a discrepancy between other parts of goblin/elf which tend to skew towards faithfully using field names/type names as close as possible to the elf headers/etc.,
I'm not sure I understand the approach taken in goblin. I effectively observed this reuse of ELF abbreviations in the goblin interface, reflecting the ELF structures underneath, but then I also see issues such as #345 where assumptions are made just to give the illusion that strings are UTF-8 character sequences inside the ELF string table. I think we need to stick to a specific level of abstraction and remain consistent. Please let me know which level of abstraction I should assume.
This could be strange for users expecting to look for something like textrel and not knowing there is a function name that describes what textrel is...
The approach I take in my crates, in order to fix this issue, is to use the #[doc(alias = "...")]
attribute on functions, in order to:
For example, in my selinux
crate, you can find the function SecurityContext.current()
by searching for the term current
(which is what describes the feature in general) or by looking for the underlying libselinux
function name getcon
(when that information is known beforehand). This allows the crate API to use meaningful names, while allowing accurate and fast finding of information through searching.
returning $size instead of usize probably also makes sense...
What is $size is this context?
users should decide to cast if they want or not.
I agree, but then again, which level of abstraction am I to assume. Should I give the actual information, even if it means handing out u64
from ELF64 binaries, or should I assume (for the user) that sizes and offsets and the like won't exceed usize
and truncate values?
re your question why some are turned into vm offsets, i think this was primarily for use in dryad at the time, which used those values explicitly, or it was used elsewhere for indexing (many of those fields are indexes so usize was more convenient)
This is problematic, because there is nothing in the documentation or API that indicates that these values are file offsets, instead of (what ELF documents) virtual addresses. I don't expect people to read the crate's source code to figure out why they are getting the wrong number in these fields.
goblin tries to minimize breaking changes at this stage (even though not 1.0, which I've wanted for a while, really I should just fix up some minor things and slap on 1.0 and that's it)
I'm very grateful for goblin, and the effort that went into it, and allowing the community to enjoy it. However, I feel the API still needs breaking changes.
the api looks good, though it seems like it doesn't fit well with the rest of structs goblin provides
I agree, and I value consistency. Because of that, I think we need to either (1) adapt the interface to make it more like the rest of the crate, or (2) adapt the crate and make it more like this interface. I vote for (2).
In ELF, the
DT_JMPREL
dynamic array tag is optional, and can be present (holding an address value) or absent. The value of that tag is exposed asgoblin::elf::dynamic::DynamicInfo.jmprel
, which is of typeusize
.DynamicInfo.jmprel
is set to zero ifDT_JMPREL
is absent from the ELF file.Therefore, based on the definition of
goblin::elf::dynamic::DynamicInfo
, there is no way to differentiate the absence ofDT_JMPREL
from its presence with the value zero, because zero can be a valid address, especially in privileged mode and in embedded systems.I think
DT_JMPREL
should be exposed byDynamicInfo
as optional information. We should make the type ofDynamicInfo.jmprel
anOption<usize>
instead.