Open matthijskooijman opened 4 years ago
Thanks! I'll try to go through your post one by one.
BTW, about docs, as you are both interested in using such a tool & improving it, you are talking about both. But it'd be much better to address these separately; so that developer docs/discussion is separate from user docs. And I should confess that I've tried much more to provide a good API for the end user, than to clean up the internal implementation (and so, naming of the internal stuff might be worse than that of user API).
If we're messing with object files and linker flags, do we still need -Wl,--wrap? Couldn't some fiddling with --defsym and/or --redefine-sym achieve the same thing, but with simpler code? I haven't thought this through yet, though.
Well, currently the only object file which is modified (or --defsym is used for) is the one(s) with WRAP_FUNCTION...() calls, usually just wrap.cpp. No other files are touched. And I really prefer to limit its use as much as possible. Ideally, I prefer to do most things in C++ as much as possible.
Actually, one of the things I'm thinking to look into is to see if I can generate a header file which includes final names rather than using --redefine-sym/--defsym; so that we don't bother with object files at this step even more.
... weak symbols..
About weak symbols, well we are discussion in issue #2, but briefly: I don't like it to be used to replace --wrap, but if we can come up with a clean solution to use it for functions in the same TU, it can be looked into seriously.
I don't really like that the bindfakes* tool that is run during compilation actually links against the main lib (corelib) code, since that might cause application/lib code to be ran during build (i.e. global constructors). I guess this is needed to allow including wrap.cpp in the link, but maybe: ...
Hmm... it was certainly needed at some point, and I never thought that it might be a problem. Yeah, it might be possible to fix. I'll try to look into it soon, and if it seems possible, either fix it or create an issue for it. I prefer to not try to inspect it externally; and I hope it is possible.
Terminology is a bit messy here and there. Terms like "Fake", "Wrapper", "Wrapper object", "Prototype" seem to be used slightly inconsistent and not entirely self-explanatory (but maybe that's just my lack of understanding). I believe some improvement is possibly here, though I don't understand things well enough to propose specific changes yet.
Yeah, that is very likely. Specially since the code has been gone through a number of changes which affected the naming. But I suggest to discuss user facing API and internal API (most of which is in internal namespace). User API is much more important, and I hope it looks good. But if you have any suggestions, it'd be great. Internal API, well you are probably right. I'll try to read it again with a fresh eye, but I certainly like to see how someone else thinks and if there are suggestions.
But it'd be much better to address these separately; so that developer docs/discussion is separate from user docs. And I should confess that I've tried much more to provide a good API for the end user, than to clean up the internal implementation (and so, naming of the internal stuff might be worse than that of user API).
Yeah, this is a good point, but I'm still in the "trying to understand it all" phase, where anything helps :-p
Well, currently the only object file which is modified (or --defsym is used for) is the one(s) with WRAP_FUNCTION...() calls, usually just wrap.cpp. No other files are touched. And I really prefer to limit its use as much as possible. Ideally, I prefer to do most things in C++ as much as possible.
Good point, I think I'd agree there (except that maybe, to allow same-source-file-calls, all object files need to be modified as suggested in #2, but it would be good if that stays a separate and optional step).
Actually, one of the things I'm thinking to look into is to see if I can generate a header file which includes final names rather than using --redefine-sym/--defsym; so that we don't bother with object files at this step even more.
Oh, that's an interesting thought indeed. So you'd compile wrap.cpp
once to gather info, write out a .h file with the needed naming and then compile it again to actually declare the needed stuff. I like that thought.
I prefer to not try to inspect it externally; and I hope it is possible.
Why not? To me, doing things external means you could put things in an external, generic tool, and never have to actually run any "user" code as part of the build process (not even just the wrap.cpp
, which could be ok, though).
(PS You're missing some empty lines after quotes in your previous message, so some of your replies are shown as quotes now)
Thinking on this for just a little more, I wonder about the generation of names. I suspect that the central challenge here is to convert a name and/or reference to some C++ function into the actual linker filename that it will be using, right?
SampleClass::CallThis
to e.g. WRAP_FUNCTION()
and then convert that to __wrap__ZNK8FakeTest11SampleClass8CallThisEv
so you can define the wrapper function by that name.objcopy
or --defsym
to rename the symbol to the right name.Looking at the code, I see that it also inspects the original (corelib) object files for the names in there. Is this needed because it is impossible to fully predict the mangled names based on the original names? So you need to (somewhat fuzzily) cross-match against the actual names to figure out what actual names have been chosen?
If so, then maybe it would be sufficient to just inspect the wrap.o
after all, since that actually also contains the referenced symbols as undefined references:
nm sample/libwrap_lib.a |grep CallThisEi
U __real__ZN8FakeTest12SampleClass28CallThisEi
0000000000000000 W __wrap__ZN8FakeTest12SampleClass28CallThisEi
U _ZN8FakeTest12SampleClass28CallThisEi
(note that this is post-renaming, so __real_
and __wrap_
are already renamed, but the original U symbol was already present before the renaming).
Or is there something else (other than symbol names) that you're using from the corelib object files?
Note that this approach might somewhat complicate my original suggestion of not linking against the corelib for the command-to-be-executed at build time, since the above only works when it does reference corelib, which then also (I think) requires linking against corelib. So then you'd get three versions of the wrap.o file:
Using an external tool to inspect wrap.o
, rather than executing it, could maybe remove the need for 2., though. Or maybe some making these weak undefined references (I think that's a thing) could allow linking the command-to-be-executed without actually linking to corelib, so 1. can be used instead of 2.? Or maybe some "ignore undefined references" linker option for that command-to-be-executed only (though that's likely to mask other, serious, linker errors).
Actually, one of the things I'm thinking to look into is to see if I can generate a header file which includes final names rather than using --redefine-sym/--defsym;
I wonder if, rather than recompiling wrap.cpp
with a generated header, you could somehow add another .o file into the link that handles the renaming. I've read things about "indirect symbols", which seem to be essentially symbols pointing to other symbols, that might serve this purpose, but I'm not sure how to generate these yet (maybe adding a linker script as one of the "objects" to link could work, though that is essentially what --defsym
also does, I think, or maybe something with gcc's alias
attribute, though that probably requires knowing the exact type of the alias).
If you happen to remember why --defsym
didn't always work, that might be useful input here as well :-p
Thinking on this for just a little more, I wonder about the generation of names. I suspect that the central challenge here is to convert a name and/or reference to some C++ function into the actual linker filename that it will be using, right?
* Ideally, you'd just pass a fullly-qualified name (e.g. `SampleClass::CallThis` to e.g. `WRAP_FUNCTION()` and then convert that to `__wrap__ZNK8FakeTest11SampleClass8CallThisEv` so you can define the wrapper function by that name. * Maybe this conversion can be implemented using C++ templates and constexpr functions, but you need to _name_ your function using this result, which AFAIK cannot be done in C++. So, you'll need to build the _name_ of this wrapper using the preprocessor, which pretty certainly will not be able to replicate the mangling needed. * So, instead, you generate an arbitrary but unique name, store the original qualified name as a string, and execute the generated code to access the original name so you can generate the mangled name and use `objcopy` or `--defsym` to rename the symbol to the right name.
Yes, that's right (Sorry for no documentation :( ). If there were a method to generate mangled names at compile time, all things would be much simpler. The whole bind_fakes process is to match mangled names of functions to actual functions and provide their names to wrap.o. Or if ld supported C++ directly so we didn't need mangled names...
Looking at the code, I see that it also inspects the original (corelib) object files for the names in there. Is this needed because it is impossible to fully predict the mangled names based on the original names? So you need to (somewhat fuzzily) cross-match against the actual names to figure out what actual names have been chosen?
If so, then maybe it would be sufficient to just inspect the
wrap.o
after all, since that actually also contains the referenced symbols as undefined references: ...Or is there something else (other than symbol names) that you're using from the corelib object files?
Well, my poor documentation again :( even at the commit level. I should dig to see if I've said something in commit messages, but what I remember is that while wrap.o includes most of the symbols, sometimes it doesn't include all of them. I'm pretty sure that I've tried inspecting just wrap.o before. I must document even my failed attempts :(
Maybe it was needed for mingw. I've taken some steps to support mingw too, but it not complete IIRC! For example, the --leading-underscore
option is specifically for how mingw mangles function names. I'm not sure this part is related to mingw though. Well, probably it should be tested again. I've a number of projects in which I've used powerfake, so I'll try them soon and see if I can see when wrap.o doesn't provide needed symbols.
Note that this approach might somewhat complicate my original suggestion of not linking against the corelib for the command-to-be-executed at build time, since the above only works when it does reference corelib, which then also (I think) requires linking against corelib. So then you'd get three versions of the wrap.o file:
1. With references to corelib, but with temporary names. Intended to be inspected and modified. 2. Without references to corelib, still with temporory names. Intended to be ran. 3. With references to corelib but with the final names. This is 1. with modifications, or 1. recompiled with the header-with-names you suggested.
Using an external tool to inspect
wrap.o
, rather than executing it, could maybe remove the need for 2., though. Or maybe some making these weak undefined references (I think that's a thing) could allow linking the command-to-be-executed without actually linking to corelib, so 1. can be used instead of 2.? Or maybe some "ignore undefined references" linker option for that command-to-be-executed only (though that's likely to mask other, serious, linker errors).
I can't comment on this right now. Should spend some time for it. I'm thinking it might be possible for wrap.cpp to not actually reference functions in other libraries when compiled for bind_fakes (Needs two separate compilation of wrap.cpp, but I think that is OK). But Im not sure if a completely generic binary is possible. Because we should match temporary names with actual names and mangled names. Should look into it, maybe we can find some solution. BTW, I'm mainly thinking about not linking with corelib, rather than being generic; although a generic tool might be better specially if it doesn't increase dependency on external tools very much.
Actually, one of the things I'm thinking to look into is to see if I can generate a header file which includes final names rather than using --redefine-sym/--defsym;
I wonder if, rather than recompiling
wrap.cpp
with a generated header, you could somehow add another .o file into the link that handles the renaming. I've read things about "indirect symbols", which seem to be essentially symbols pointing to other symbols, that might serve this purpose, but I'm not sure how to generate these yet (maybe adding a linker script as one of the "objects" to link could work, though that is essentially what--defsym
also does, I think, or maybe something with gcc'salias
attribute, though that probably requires knowing the exact type of the alias).If you happen to remember why
--defsym
didn't always work, that might be useful input here as well :-p
That'd be great too. Yeah, I also thought about link script, but when --defsym didn't work, I also get unhappy. (Well, I also remember experimenting with link scripts... but not documented as expected :D ) About --defsym, well I don't know why it didn't work! What I don't remember is when it didn't work. I'll try to experiment again with it and document the results for future :)
In the latest version, it is now possible to build bind_fakes as an generic & independent tool when using "standalone mode". However, it is still experimental, and in some situations it might need users help to be able to match symbols with actual functions, since the name available at compile time doesn't always match the demangled name of a type.
Hey @hedayat, nice codebase you made here :-)
I'm looking to use Powerfake in a project of mine, it looks like an ideal way to do testing (or in my case, also simulation of code that normally runs on a microcontroller) without requiring invasive changes in the original codebase.
Since the project lacks documentation, I dove in a bit to figure out how things work, which could maybe lead to a contribution of documentation. However, while digging, I found some things that could maybe be improved too. I've listed them below, along with my notes about how things work (which are more notes than proper documentation, but might be useful for others already).
Improvement ideas
If we're messing with object files and linker flags, do we still need
-Wl,--wrap
? Couldn't some fiddling with--defsym
and/or--redefine-sym
achieve the same thing, but with simpler code? I haven't thought this through yet, though.Currently, function calls in the same compilation unit as the definition are not wrapped. It seems that using
objcopy --globalize-symbol
and--weaken-symbol
can be used to work around this limitation: https://stackoverflow.com/a/46103766/740048 (though this seems to use weak symbols instead of--wrap
, so is essentially the same as the previous point). To also keep access to the__real_
symbol, it might need to be manually renamed: https://stackoverflow.com/a/60604263/740048. All this is nice for simple examples, but might end up a lot more complicated on a real-world project, especially if the original codebase also usesweak
functions (which, I think, might be handled more gracefully using--wrap
).I don't really like that the
bind_fakes_*
tool that is run during compilation actually links against the main lib (corelib) code, since that might cause application/lib code to be ran during build (i.e. global constructors). I guess this is needed to allow includingwrap.cpp
in the link, but maybe:wrap.cpp
can be compiled twice, once where it only uses the name and type of the wrapped function, without also storing its address, so it can be linked intobind_fakes_*
without also having to link corelib?wrap.o
can be inspected externally using e.g.objdump
ornm
, rather than being included in the link? This would mean that the "generate linker flags" binary could be a more generic tool (not tied to a particular project), which could also clean up the build process and make it a lot easier to embed into another project. This might need some additional symbols to be generated (i.e. global string variables instead of / in addition to the current arguments to theWrapper
constructor) to make it easier to inspect the object file.Terminology is a bit messy here and there. Terms like "Fake", "Wrapper", "Wrapper object", "Prototype" seem to be used slightly inconsistent and not entirely self-explanatory (but maybe that's just my lack of understanding). I believe some improvement is possibly here, though I don't understand things well enough to propose specific changes yet.
Notes for documentation
A project using powerfake consists of three parts:
The original code. This remains standalone, without a dependency on the other parts.
In the sample, this is
functions.cpp
andSampleClass.cpp
. In the sample, this is calledcorelib
, in thebind_fakes
cmake function, this is calledtest_lib
.The wrapper code. This part defines which functions must be wrapped and defines appropriate wrapper objects. This depends on the original code.
In the sample, this is
wrap.cpp
. In the sample, this is calledwrap_lib
, in thebind_fakes
cmake function, this is calledwrapper_funcs_lib
.The testing code. This defines the actual tests, and uses the wrapper code to selectively replace functions in the original code when needed. This depends on both other parts.
In the sample, this is
faked.cpp
. In the sample, this is calledsamples
, in thebind_fakes
cmake function, this is calledtarget_name
.Each of these three parts are initially compiled independently.
Build process
The build produces these:
libpowerfake.a
(containspowerfake.cpp
)libpw_bindfakes.a
(containspowerfake.cpp
,NMSymbolReader.cpp
,SymbolAliasMap.cpp
,Reader.cpp
andbind_fakes.cpp
, with-DBIND_FAKES
)libcorelib.a
(containsfunctions.cpp
,SampleClass.cpp
)libwrap_lib.a
(containswrap.cpp
)bind_fakes_samples
(containsdummy.cpp
,libpw_bindfakes.a
,libwrap_lib.a
(twice?),libcorelib.a
,libpowerfake.a
(needed?))powerfake.link_flags
, produced by runningbind_faked_samples libcorelib.a libwrap_lib.a
samples
(containsfaked.cpp
,libwrap_lib.a
,libcorelib.a
,libpowerfake.a
, linked usingpowerfake.link_flags
).bind_fakes_samples
:nm -po libcorelib.a
, or with--passive
reads from the passed files directly (which should benm
output rather than .a files then).--leading-underscore
, leading_
are stripped from all symbol names.nm
and does some more processing on them, and then matches them to the names passed to the e.g.WRAP_FUNCTION
macros (also after some processing). These latter names are taken from the map kept by theWrapperBase
class, which is available since libwrap and corelib are both linked into this executable.-Wl,--wrap=...
option with the (mangled) name of the original symbol topowerfake.link_flags
.libwrap_lib.a
__real_...
to match what ld expects. Similarly for each wrapper (should be defined) symbol in there. This either usesobjcopy --redefine-symbol
to actually modify the .o file, or adds-Wl,--defsym
topowerfake.link_flags
to let the linker do this at link time.C++ Code
WRAP_FUNCTION(...)
defines a "wrapper object" and (when!BIND_FAKES
) a "wrapper function" too (though it seemswrap.cpp
is always compiled without-DBIND_FAKES
, so the#indef BIND_FAKES
inpowerfake.h
is probably useless).The "wrapper object" is an instance of
PowerFake::internal::Wrapper
, namedPowerFakeWrap_alias_<lineno>
(first part should be modified when using different files).The type of the function is wrapped in a
PowerFake::internal::FuncType
for some reason.This object essentially just (optionally) stores a function pointer to relace a particular function. The
PowerFake::internal::WrapperBase
additionally keeps a registry of all instantiated Wrapper objects, which can be looked up based on type and address, which is used by MakeFake to return a Fake object for the right Wrapper.WrapperBase also has a "WrappedFunctions" prototype registry, but that seems to be only used (and filled) when
BIND_FAKES
(though it is always declared, for no real reason it seems).The "wrapper function" is a function to replace the original function using
-Wl,--wrap
. It basically just looks up the "wrapper object", if it is callable / set, calls that, otherwise calls the original function. This uses a temporary name for the wrapper function and real function, since the__wrap_
and__real_
prefixes must be applied to the mangled names, not the original names, so this happens in a post-processing step later. Also, this wrapper function is declared inside a class template with two versions of which only one is instantiated, seemingly to support both regular function and member functions.