Closed: jessehui closed this 3 years ago
Cool! Yeah, this seems like a great compromise to get a lazy parsing approach; I like it! Reducing some of the boilerplate into common functions would be good, as you said, but I understand that this is just an initial sketch/PR?
What API were you thinking of for filling in the rest of the structures, e.g., section headers, etc.?
And thanks for the PR! :)
Thank you for your reply. Glad you are interested. I pushed a new version that reduces the repeated code, makes the API more general, and adds a unit test which can also serve as an example.
> What api were you thinking for filling in the rest of the structures, e.g., section headers, etc.?
With goblin's well-designed APIs for each part (program headers, section headers, ...), users can read in the contents they need and parse that specific part with the corresponding `parse` function. Then they can assign the result to the dummy `Elf` struct returned from `lazy_parse`. I think the unit test can be seen as an example (here).
We will do similar parsing for the libOS; we only care about the PT_INTERP and PT_LOAD segments.
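The lazy flow described above (read only the header, locate the program header table, then pick out only the PT_LOAD/PT_INTERP entries) can be sketched with plain byte decoding. This is a minimal sketch using only std: the offsets follow the ELF64 little-endian layout, goblin's typed `Header`/`ProgramHeader` parsers would replace the manual decoding in real code, and `phdr_table_location`/`loadable_types` are hypothetical helper names, not goblin API.

```rust
const PT_LOAD: u32 = 1;
const PT_INTERP: u32 = 3;

fn u16le(b: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([b[off], b[off + 1]])
}

fn u32le(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes(b[off..off + 4].try_into().unwrap())
}

fn u64le(b: &[u8], off: usize) -> u64 {
    u64::from_le_bytes(b[off..off + 8].try_into().unwrap())
}

/// From only the 64-byte ELF64 header, locate the program header table:
/// returns (e_phoff, e_phentsize, e_phnum).
fn phdr_table_location(header: &[u8]) -> (u64, u16, u16) {
    assert_eq!(&header[..4], b"\x7fELF", "not an ELF file");
    (u64le(header, 0x20), u16le(header, 0x36), u16le(header, 0x38))
}

/// From only the raw program header table bytes, keep the p_type of each
/// PT_LOAD or PT_INTERP entry; nothing else in the file is touched.
fn loadable_types(table: &[u8], phentsize: usize, phnum: usize) -> Vec<u32> {
    (0..phnum)
        .map(|i| u32le(table, i * phentsize)) // p_type is the first phdr field
        .filter(|t| *t == PT_LOAD || *t == PT_INTERP)
        .collect()
}
```

The point is that each step only needs the bytes for that step, so the kernel never has to read the whole ELF file into memory.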
You may start the review. Thank you!
I second this.
In my specific use case, I just need to parse the ELF file header in order to get `e_ident[EI_CLASS]` and `e_machine`.
It would be a lot nicer if there were a straightforward way to do that (https://github.com/m4b/goblin/pull/254/files#diff-696bbf0d8b3f75177e0fa931a45c3e85057e2078598ee1cf8040393730b30633R206), instead of including scroll as a direct dependency.
It seems that I can't fill in all parts without `gnu_hash_len` and/or `hash_len`...
However, I think if users would like to go this far, maybe it is better to call `parse` instead of `lazy_parse`, as it will take a lot more effort to parse almost everything.
But for users who just need some specific sections or segments, `lazy_parse` can bring much convenience. I can write an example to extract the interpreter segment lazily (without reading the whole ELF file). What do you think?
An example getting the interpreter would be great! Re gnu hash, let's leave it private for now, that way we don't commit to an API; if someone needs this functionality, we can always make a PR/take a PR and review how best to expose it.
```rust
let mut num_syms = if let Some(gnu_hash) = dyn_info.gnu_hash {
    gnu_hash_len(bytes, gnu_hash as usize, ctx)?
} else if let Some(hash) = dyn_info.hash {
    hash_len(bytes, hash as usize, header.e_machine, ctx)?
} else {
    0
};
let max_reloc_sym = dynrelas.iter()
    .chain(dynrels.iter())
    .chain(pltrelocs.iter())
    .fold(0, |num, reloc| cmp::max(num, reloc.r_sym));
if max_reloc_sym != 0 {
    num_syms = cmp::max(num_syms, max_reloc_sym + 1);
}
dynsyms = Symtab::parse(bytes, dyn_info.symtab, num_syms, ctx)?;
```
As far as I can see, this is the only code that uses it; we could just add a helper function that does all this and takes whichever minimal arguments are required, seems like `dyn_info`, `ctx`, `bytes`, `dynrelas`, and `header`
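A hypothetical shape for the core of that helper, keeping only the counting logic so it stays self-contained: the `gnu_hash_len`/`hash_len` and `Symtab::parse` calls from the snippet are left out, and `dynsym_count` is a made-up name, not goblin API.

```rust
/// Number of dynamic symbols to parse: the count implied by the (gnu_)hash
/// section, bumped up if any dynamic relocation references a higher symbol
/// index (r_sym) than the hash tables account for.
fn dynsym_count(hash_len: usize, reloc_syms: impl Iterator<Item = usize>) -> usize {
    // Highest symbol index referenced by any dynamic relocation.
    let max_reloc_sym = reloc_syms.fold(0, std::cmp::max);
    if max_reloc_sym != 0 {
        // Indices are 0-based, so index N implies at least N + 1 symbols.
        hash_len.max(max_reloc_sym + 1)
    } else {
        hash_len
    }
}
```

The caller would feed it the hash-derived length and the chained `r_sym` iterator over `dynrelas`/`dynrels`/`pltrelocs`, which is exactly the fold in the snippet above.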
:shrug:
For now I think it's better to leave it private, as someone can always do as you say and parse the whole thing :)
I can add this change onto the 0.3 branch and do (another :D ) release, 0.3.4 if you like?
@m4b Thank you for your review. Yeah sure, I was just going to ask whether there will be a release soon with this feature that can be used directly from crates.io?
You got it, 0.3.4 is out!
Great! Thank you very much!
We are developing a libOS kernel (https://github.com/occlum/occlum) and would like to load ELF files as lazily as possible. However, goblin currently does all the reading and parsing in memory, so we must read the whole ELF file into kernel memory (we can't mmap the file into memory due to some constraints).
Thus, I came up with this method to load lazily: read the ELF header only -> parse the header -> get the program header table -> only read in the PT_LOAD and PT_INTERP segments.
By adding these two functions, I can achieve this. The implementation can be found here. I would like to see this merged to master so that people with similar requirements can do the same. But I am not sure if it is a common need; if it is, I can spend some time reducing the repeated code.
Comments are welcome. Thank you.