martin-olivier / dylib

C++ cross-platform wrapper around dynamic loading of shared libraries (dll, so, dylib)
https://conan.io/center/recipes/dylib
MIT License

Support non-extern-C symbols #27

Open eyalroz opened 2 years ago

eyalroz commented 2 years ago

This is a C++ library for working with shared objects, but it only supports unmangled, C-style functions. That means it doesn't serve its primary function. The library must support any C++ function one can load from a shared object. Naturally, this is ABI-specific, but that's either for the user to configure and build accordingly, or potentially a case for multi-ABI support. The latter is much more complicated, and would be a feature request in itself, but function symbols should definitely be looked up by their mangled name, if they're not extern-C.

martin-olivier commented 2 years ago

Current status

Hello, I'm currently working on that. The goal is to add a feature to dylib that can load C++ symbols.

Linux and MacOS

Variables: I can now access a mangled variable within a namespace:

dylib lib("lib.so");
auto ver = lib.get_variable<double>("driver::infos::version");

Functions: To be able to mangle functions within a namespace and/or in an overload situation, I need access to each function's parameter types. But currently, the template argument you need to specify to get_function is the following:

dylib lib("lib.so");

// get_function<T> for T = [module *(const char *)]
auto mod = lib.get_function<module *(const char *)>("driver::factory");

To be able to iterate over variadic template arguments, I temporarily replaced the current syntax with the following one:

// old syntax
// get_function<T>
auto mod = lib.get_function<module *(const char *)>("driver::factory");

// temporary new syntax
// get_function<Ret, Args...>
auto mod = lib.get_function<module *, const char *>("driver::factory");

Do you know if there is a way to "decompose" a function template argument to get its return value as Ret and its arguments as Args... ?

Windows

TODO

martin-olivier commented 2 years ago

Update

Linux and MacOS

Variables: I can now access a mangled variable within a namespace:

dylib lib("lib");
auto ver = lib.get_variable<double>("driver::infos::version");

Functions: I can now access a mangled function within a namespace with any types of arguments:

dylib lib("lib");

auto mod = lib.get_function<module *, const char *>("driver::factory");
auto set_inst = lib.get_function<void, module &&>("driver::instance::set");
auto print = lib.get_function<void, std::ostream &, const std::string &>("driver::tools::print");

Windows

TODO (Next step)

Question

Do you know if there is a way to "decompose" a function template argument to get its return value as Ret and its arguments as Args... ?

eyalroz commented 2 years ago

Do you know if there is a way to "decompose" a function template argument to get its return value as Ret and its arguments as Args... ?

Well, std::result_of for the return type; and you can use this hack for the parameters.
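
A minimal sketch of the usual partial-specialization technique for "decomposing" a function type, assuming C++11: T is matched against Ret(Args...), and the pieces are forwarded to a variadic implementation so that the original get_function<T> spelling could be kept. The names function_traits, fn_ptr, get_function_variadic and forward_function are hypothetical, not dylib's API, and the variadic stand-in returns a null pointer instead of actually looking up a symbol.

```cpp
#include <cstddef>
#include <string>
#include <tuple>

// Primary template, left undefined: only plain function types are supported.
template <typename T>
struct function_traits;

// Partial specialization that "decomposes" Ret(Args...) into its pieces.
template <typename Ret, typename... Args>
struct function_traits<Ret(Args...)> {
    using return_type = Ret;
    using args_tuple = std::tuple<Args...>;
    static constexpr std::size_t arity = sizeof...(Args);
};

template <typename Ret, typename... Args>
using fn_ptr = Ret (*)(Args...);

// Hypothetical stand-in for the variadic get_function<Ret, Args...>.
// A real version would mangle `name` using Args... and look the symbol up;
// this sketch only returns a null pointer of the right type.
template <typename Ret, typename... Args>
fn_ptr<Ret, Args...> get_function_variadic(const std::string &name) {
    (void)name;
    return nullptr;
}

// Unpacks a function type and forwards its parts to the variadic version.
template <typename T>
struct forward_function;

template <typename Ret, typename... Args>
struct forward_function<Ret(Args...)> {
    static fn_ptr<Ret, Args...> get(const std::string &name) {
        return get_function_variadic<Ret, Args...>(name);
    }
};

// Keeps the original get_function<T> spelling, e.g. T = module *(const char *).
template <typename T>
T *get_function(const std::string &name) {
    return forward_function<T>::get(name);
}
```

With this shape, get_function<module *(const char *)>("driver::factory") would reach get_function_variadic<module *, const char *>, where each parameter type is available for mangling.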

But are you sure you're not going about this the wrong way? I mean, take the function's proper type, then apply name mangling (not yourself - there's an ABI library for that), then look for the symbol.

martin-olivier commented 2 years ago

But are you sure you're not going about this the wrong way? I mean, take the function's proper type, then apply name mangling (not yourself - there's an ABI library for that), then look for the symbol.

This is what I'm doing, but I'm not sure there is an ABI library to mangle names (I'm currently using typeid(T).name() to do the mangling).

There is this abi function to demangle a symbol, but I didn't see anything for re-mangling:

char *demangledName = abi::__cxa_demangle(av[i], NULL, NULL, &status);
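
Since abi::__cxa_demangle only goes one way, one practical use for it is checking a hand-written mangler by round-tripping its output through the demangler. A minimal sketch, assuming a GCC or Clang toolchain (the <cxxabi.h> header comes from libstdc++ / libc++abi); the demangle wrapper and free_deleter names are hypothetical. When no output buffer is passed, the result is allocated with malloc, so it is released with free.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Frees buffers returned by abi::__cxa_demangle (allocated with malloc).
struct free_deleter {
    void operator()(char *p) const { std::free(p); }
};

// Demangles an Itanium-ABI symbol (GCC/Clang on Linux and macOS).
// Returns an empty string when demangling fails.
std::string demangle(const char *symbol) {
    int status = 0;
    std::unique_ptr<char, free_deleter> out(
        abi::__cxa_demangle(symbol, nullptr, nullptr, &status));
    if (status != 0 || !out)
        return std::string();
    return std::string(out.get());
}
```

For example, feeding it "_ZN6driver7factoryEPKc" gives back "driver::factory(char const*)", which makes it easy to check whether a generated symbol means what was intended.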
eyalroz commented 2 years ago

Ah, right, abi:: is just for demangling. typeid(T).name() doesn't need an extra library; but then, it doesn't mangle names in the sense of getting you the symbol name to look for in an object.
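
To illustrate the distinction on an Itanium-ABI compiler (GCC or Clang; MSVC uses a different scheme): typeid(T).name() encodes a type, while the symbol the linker sees encodes a named entity. The function foo below is hypothetical and exists only for the comparison.

```cpp
#include <string>
#include <typeinfo>

// Hypothetical function, used only for the comparison below.
int foo(double x) { return static_cast<int>(x); }

// typeid(...).name() returns the encoding of a *type*. For int(double) this is
// "FidE": F ... E around the return type (i) followed by the parameters (d).
std::string type_encoding_of_foo() { return typeid(decltype(foo)).name(); }

// The linker symbol for foo is a different string: "_Z" + <length><name> +
// parameter types, with no return type for a non-template function.
const std::string foo_symbol = "_Z3food";
```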

eyalroz commented 2 years ago

Also, this may be relevant for Windows.

martin-olivier commented 2 years ago

typeid(T).name() doesn't need an extra library; but then, it doesn't mangle names in the sense of getting you the symbol name to look for in an object.

You are right. To get there, I wrote the following code, which produces the correct mangled symbol name of a function in all situations (except pointers and namespaces) on Unix:

#include <string>
#include <type_traits>
#include <typeinfo>

class dylib {
    // Mangles a list of parameter types by concatenating their encodings.
    template <typename T, typename U, typename... Args>
    static std::string TemplateMangle()
    {
        return TemplateMangle<T>() + TemplateMangle<U, Args...>();
    }

    // Mangles a single parameter type. typeid() drops references (and
    // top-level const), so "R" (lvalue ref) / "O" (rvalue ref), plus "K" for
    // a const referee, are prepended by hand.
    template <typename T>
    static std::string TemplateMangle()
    {
        std::string t = typeid(T).name();
        if (std::is_lvalue_reference<T>::value) {
            std::string tmp = "R";
            if (std::is_const<typename std::remove_reference<T>::type>::value)
                tmp += 'K';
            t = tmp + t;
        }
        else if (std::is_rvalue_reference<T>::value) {
            std::string tmp = "O";
            if (std::is_const<typename std::remove_reference<T>::type>::value)
                tmp += 'K';
            t = tmp + t;
        }
        return t;
    }

    // The return type is not part of a non-template function's symbol.
    template<typename ReturnType, typename Arg1, typename ...Args>
    static std::string mangle_function(const std::string &name) {
        return "_Z" + std::to_string(name.size()) + name + TemplateMangle<Arg1, Args...>();
    }

    // A function without parameters is mangled as taking "v" (void).
    template<typename ReturnType>
    static std::string mangle_function(const std::string &name) {
        return "_Z" + std::to_string(name.size()) + name + typeid(void).name();
    }
};
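
A sketch of how the same typeid-based approach might be extended to namespaced functions, assuming the Itanium C++ ABI (GCC/Clang on Linux and macOS): the scope components become length-prefixed source names wrapped in N ... E. The helper names and the vector-of-components interface are assumptions for illustration. It deliberately omits the ABI's substitution rules, so a repeated non-builtin parameter type comes out wrong: f(const char *, const char *) should be _Z1fPKcS0_, but this sketch would produce _Z1fPKcPKc. Templates, member functions and abbreviations such as St are not handled either.

```cpp
#include <string>
#include <type_traits>
#include <typeinfo>
#include <vector>

// Encodes one parameter type. typeid() already covers builtins, classes and
// pointers (e.g. "PKc" for const char *), but it strips references and
// top-level const, so "R"/"O" (plus "K" for a const referee) is added here.
template <typename T>
std::string mangle_param() {
    using Referee = typename std::remove_reference<T>::type;
    std::string prefix;
    if (std::is_reference<T>::value) {
        prefix = std::is_lvalue_reference<T>::value ? "R" : "O";
        if (std::is_const<Referee>::value)
            prefix += 'K';
    }
    return prefix + typeid(Referee).name();
}

// Encodes the parameter list; an empty list is spelled "v" (void).
template <typename... Args>
std::string mangle_params() {
    if (sizeof...(Args) == 0)
        return "v";
    std::string out;
    // C++11 pack expansion inside a braced list, evaluated left to right.
    int expand[] = {0, (out += mangle_param<Args>(), 0)...};
    (void)expand;
    return out;
}

// Hypothetical helper: symbol of a non-template free function, given its
// scope components such as {"driver", "factory"}. More than one component
// is a nested name and is wrapped in N ... E.
template <typename... Args>
std::string mangle_function(const std::vector<std::string> &parts) {
    std::string name;
    for (const auto &p : parts)
        name += std::to_string(p.size()) + p;
    if (parts.size() > 1)
        name = "N" + name + "E";
    return "_Z" + name + mangle_params<Args...>();
}
```

For example, mangle_function<const char *>({"driver", "factory"}) gives "_ZN6driver7factoryEPKc".
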
eyalroz commented 2 years ago

Let me first note I've asked about this at StackOverflow.

Now, for your implementation.

martin-olivier commented 2 years ago

- I think all of this code should be made constexpr, since it's all information that we know at compile time.
- The T and U in one of your function variants are ambiguous. Give them more specific names?
- I suggest we don't use std::strings, but rather a string_view (or a char* and size_t pair in C++11) as the target buffer. We have rather expensive string concatenations in our code, that's true, but that only happens when handling errors. ... Actually, we may want to have a "poor man's span" structure with just those two fields.
- Same point about the inputs. So, something like: template <...> mangle_function(dylib::detail::poor_span mangled_name, dylib::detail::poor_span function_name)
- TemplateMangle: what exactly does it mangle? It seems like it mangles the name of a type, right? Then better call it mangle_type(). Or perhaps just mangle().
- Don't use the same string literal in multiple places.

You're right, but for now I prefer to focus on making the proof of concept work.
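
On the review's buffer suggestion: a minimal C++11 sketch of a "poor man's span" (pointer plus size) used as a caller-provided output buffer, so that building a symbol does not allocate. The namespace sketch, the struct layout and append_source_name are hypothetical; the review only proposes the idea, and this version is not constexpr.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

namespace sketch {  // hypothetical; not part of dylib

// "Poor man's span" (C++11-friendly): a pointer plus a size.
struct poor_span {
    char *data;
    std::size_t size;
};

// Appends an Itanium <source-name> (decimal length, then the identifier) to
// `out` at offset `pos` and NUL-terminates it. Returns the new offset, or 0
// if the text plus the terminator would not fit. No heap allocation.
inline std::size_t append_source_name(poor_span out, std::size_t pos,
                                      const char *id, std::size_t id_len) {
    char digits[24];
    int n = std::snprintf(digits, sizeof digits, "%zu", id_len);
    if (n <= 0)
        return 0;
    const std::size_t ndigits = static_cast<std::size_t>(n);
    if (pos + ndigits + id_len + 1 > out.size)
        return 0;
    std::memcpy(out.data + pos, digits, ndigits);
    std::memcpy(out.data + pos + ndigits, id, id_len);
    pos += ndigits + id_len;
    out.data[pos] = '\0';
    return pos;
}

}  // namespace sketch
```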

Have you checked this against the Itanium ABI document to make sure it's valid? That should also get you going with namespaces and pointers.

Yes, I'm using this document to implement the feature.

What about mangling a variable?

The following code mangles namespaced variables on Unix:

#include <string>
#include <vector>

class dylib {
private:
    // Splits `str` on any of the characters in `delimiters`: each char is a
    // separate delimiter, so passing "::" means "split on ':'".
    static std::vector<std::string> string_to_vector(const std::string &str, const char *delimiters) {
        std::vector<std::string> tokens;
        std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
        std::string::size_type pos = str.find_first_of(delimiters, lastPos);
        while (std::string::npos != pos || std::string::npos != lastPos) {
            tokens.push_back(str.substr(lastPos, pos - lastPos));
            lastPos = str.find_first_not_of(delimiters, pos);
            pos = str.find_first_of(delimiters, lastPos);
        }
        return tokens;
    }

    // Mangles e.g. "driver::infos::version" into "_ZN6driver5infos7versionE";
    // a name without "::" is returned unchanged.
    static std::string mangle_variable(const std::string &name) {
        if (name.find("::") == std::string::npos)
            return name;
        auto ns_list = string_to_vector(name, "::");
        if (ns_list.size() == 1)
            return ns_list.front();
        std::string mangled = "_ZN";
        for (auto &ns : ns_list)
            mangled += std::to_string(ns.size()) + ns;
        return mangled + 'E';
    }
};

eyalroz commented 2 years ago

I think you may be misusing the delimiters parameter... it takes several chars, each of which is a delimiter.
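
A minimal sketch of splitting on the whole "::" token instead of on each ':' character, so that a stray single colon is not treated as a scope separator. split_scope is a hypothetical helper; mangle_variable follows the same Itanium nested-name rule as the code above (N, length-prefixed components, E) and, like it, does not special-case the std namespace (St).

```cpp
#include <string>
#include <vector>

// Splits on the two-character "::" token, not on each ':' separately.
std::vector<std::string> split_scope(const std::string &name) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = name.find("::", start);
        parts.push_back(name.substr(start, pos - start));
        if (pos == std::string::npos)
            break;
        start = pos + 2;
    }
    return parts;
}

std::string mangle_variable(const std::string &name) {
    std::vector<std::string> parts = split_scope(name);
    // A leading "::" (explicit global scope) yields an empty first component.
    if (!parts.empty() && parts.front().empty())
        parts.erase(parts.begin());
    // Variables at global scope keep their plain, unmangled name.
    if (parts.size() < 2)
        return parts.empty() ? std::string() : parts.front();
    std::string mangled = "_ZN";
    for (const auto &p : parts)
        mangled += std::to_string(p.size()) + p;
    return mangled + 'E';
}
```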

martin-olivier commented 2 years ago

I'm going to release 2.0.0 without this remangling feature, since I don't have much time to work on it at the moment.

eyalroz commented 2 years ago

@martin-olivier : There's always version 3.0...

eyalroz commented 2 years ago

Here are some outstanding SO questions about doing this:

eyalroz commented 2 years ago

Good news - here's MSVC mangling code for you:

https://godbolt.org/z/nnW19qzYE

Right now, that code requires C++20, but with a little work you can bring it down to C++11 and integrate it into your own code.

ericoporto commented 4 months ago

Good news - here's MSVC mangling code for you:

https://godbolt.org/z/nnW19qzYE

Right now, that code requires C++20, but with a little work you can bring it down to C++11 and integrate it into your own code.

Hey, did anyone do that "little work"? I kind of need it to build with some old compilers. :/

stellarpower commented 2 days ago

Don't know if this is of any help. When demangling the other way, I always use Boost. There is also a nice static type_info that uses a string_view and the __PRETTY_FUNCTION__ macro to extract the unmangled names. There are a few around and I can't remember which one I used, but here is one of them. Either way, I'd probably buy rather than build, and I would have thought a compiler or support-library builtin must be able to mangle names. When I last looked at libsupc++, many years ago, I think I saw one.
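
A minimal sketch of the __PRETTY_FUNCTION__ trick, assuming GCC or Clang (MSVC would need __FUNCSIG__ and different parsing). It yields a human-readable type name, which helps for display or for demangling-style checks but not for computing a linker symbol. The type_name function and its parsing are assumptions based on the usual signature formats of those compilers, not a particular library.

```cpp
#include <string>

// Extracts a readable spelling of T from the compiler-generated signature.
// GCC:   "std::string type_name() [with T = int; std::string = ...]"
// Clang: "std::string type_name() [T = int]"
template <typename T>
std::string type_name() {
    const std::string sig = __PRETTY_FUNCTION__;
    const std::string key = "T = ";
    std::string::size_type begin = sig.find(key);
    if (begin == std::string::npos)
        return std::string();
    begin += key.size();
    // The type ends at ';' (GCC lists further aliases) or at the closing ']'.
    std::string::size_type end = sig.find_first_of(";]", begin);
    return sig.substr(begin, end - begin);
}
```

Exact spellings differ between compilers for compound types (for example, spacing in pointer types), so results are best treated as display strings.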

Personally, though, I'd rather use the compiler and register something into my library. I appreciate that this only works for some use cases, where you can modify the sources and you're generating something more like a plugin, rather than very late binding of an arbitrary function. Let the compiler do that work, and pull symbols in through something more like a factory, using my own key.
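
A minimal sketch of that registration approach, assuming the plugin sources can be modified: the plugin exports a single extern "C" entry point, and every C++ callable is looked up by a key the plugin chose, so no mangled names are needed. The registry class, the plugin_registry entry point and the "driver::scale" key are hypothetical.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical plugin-side registry: callables are stored under keys the
// plugin chooses, so the host never has to compute a mangled C++ name.
class registry {
public:
    using callback = std::function<double(double)>;

    void add(const std::string &key, callback fn) { table_[key] = std::move(fn); }

    // Returns nullptr when nothing was registered under `key`.
    const callback *find(const std::string &key) const {
        auto it = table_.find(key);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, callback> table_;
};

// The single entry point a host would resolve by its unmangled name; the
// name itself is an assumption. Registration runs once, on first call.
extern "C" registry *plugin_registry() {
    static registry reg;
    static const bool registered = [] {
        reg.add("driver::scale", [](double x) { return 2.0 * x; });
        return true;
    }();
    (void)registered;
    return &reg;
}
```

A host could then resolve plugin_registry once through the shared-library loader and do every further lookup through find().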