m4b / goblin

An impish, cross-platform binary parsing crate, written in Rust
MIT License
1.2k stars 160 forks source link

elf: symbol version utility APIs #282

Open johannst opened 3 years ago

johannst commented 3 years ago

After landing PR #280, which provides parsing of elf symbol version information, I wanted to discuss about higher-level utility APIs that would cover common use-cases.

In general I see the following use-cases:

I prototyped a bit here https://github.com/johannst/goblin/compare/johannst:master...johannst:elf-gnu-symbol-versioning-utils, but note that the find_version algorithm needs some

Please let me know what you think about providing such utilities and if you see other use-cases.

philipc commented 3 years ago

The main requirement is to look up the version information for a given version index. Instead of using iteration like in find_version, you may want to build a Vec, since I expect ELF producers will assign sequential version indices, and at least the glibc dynamic loader relies on this. e.g. I did this in another project.

I'm not sure of the value of providing VersionedLib. e.g. PyO3 maybe wouldn't use this because it currently uses a HashSet?

johannst commented 3 years ago

Instead of using iteration like in find_version, you may want to build a Vec ...

I wonder if it would make sense to offer both, in case you want to lookup a single version similar to the link in the second bullet item above, In that case you don't have to build the full Vec.

I don't see how the glibc dynamic loader relies on sequential version indices in the lines you linked. It seems to compute the max index and allocate an array with the corresponding size (like your implementation does). Or did you refer the "rely" on the allocation size, meaning you hope there is no bad version index?

I'm not sure of the value of providing VersionedLib. e.g. PyO3 maybe wouldn't use this because it currently uses a HashSet?

I manly choose a Vec to maintain the order, but I don't insist on using a Vec. We could also try to make the output generic over FromIterator, but using it might be not so convenient because of the nested iteration (need to tinker a bit, this just came to my mind while typing).

If computing this information is quiet special in every usage, then we should keep it to the user and not provide an utility here.

philipc commented 3 years ago

I wonder if it would make sense to offer both, in case you want to lookup a single version similar to the link in the second bullet item above, In that case you don't have to build the full Vec.

Sure. For that use, it doesn't make sense to me that you would want to check both verdefs and verneeds for the same symbol. You should already know which it might be in. So it might be more useful as separate methods on VerdefSection and VerneedSection to find a given Versym.

It seems to compute the max index and allocate an array with the corresponding size (like your implementation does). Or did you refer the "rely" on the allocation size, meaning you hope there is no bad version index?

Yeah, "rely" is probably too strong. If the version indices are random values in the 15-bit range then the algorithm is going to use a larger amount of memory.

If computing this information is quiet special in every usage, then we should keep it to the user and not provide an utility here.

It's hard to know based on one case. I don't have strong opinions about what to do here.

johannst commented 3 years ago

You should already know which it might be in. So it might be more useful as separate methods on VerdefSection and VerneedSection to find a given Versym.

Indeed, in that case you should already know in which section to look for. Having a lookup on each section sounds good to me. Additionally providing a Vec based approach similar to your VersionTable implementation should cover the different lookup cases.

I could continue to prototype a bit, but I also understand if you only want to add this sort of utility APIs if there is a "real" user.

It's hard to know based on one case. I don't have strong opinions about what to do here.

Maybe best is to wait and see some more uses, instead of providing something that's not useful to many :]