wlav / cppyy

Hide exposure to internally loaded namespaces #190

Closed kbladin closed 10 months ago

kbladin commented 10 months ago

Possible dupe of https://github.com/wlav/cppyy/issues/170, but I want to be a bit more specific.

All usage of cppyy seems to rely on the global cppyy.gbl namespace. I wonder if it would be possible to hide a given C++ namespace from the global one so as to not expose it to outside users of the Python library which internally use cppyy. Would it require its own interpreter?

Basically, if I write a Python library, I want to encapsulate all usage of cppyy and C++ behind a more pythonic API, without the C++ APIs I'm loading leaking out to the global namespace, and possibly without the users of the Python library necessarily knowing that cppyy is used in the background. I want to be able to selectively choose which parts of the C++ API I expose.

Is this possible?

wlav commented 10 months ago

It's not really possible to remove access through cppyy.gbl.YourNamespace. ... as that's fundamental to how cppyy works: headers and symbols are exposed globally to the Cling interpreter. If the code is header-only, you can wrap it in some different namespace before including the headers, but if anything needs to be linked, the linked symbols have to match the headers as written.
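
A minimal sketch of the header-only wrapping idea, assuming a hypothetical header `mylib.hpp` and a hypothetical namespace name `_mylib_detail` (neither comes from this thread). The helper only builds the C++ source text. Standard headers the library relies on are included first, outside the namespace, so that their include guards keep them from being pulled into the wrapper.

```python
def wrapped_include(header, namespace, std_headers=()):
    """Build C++ source that includes `header` inside `namespace`.

    `header` and `namespace` are illustrative placeholders; standard
    headers are emitted first so they stay in the global scope.
    """
    lines = [f"#include <{h}>" for h in std_headers]
    lines.append(f"namespace {namespace} {{")
    lines.append(f'#include "{header}"')
    lines.append("}")
    return "\n".join(lines) + "\n"


source = wrapped_include("mylib.hpp", "_mylib_detail", std_headers=["iostream"])
print(source)

# With cppyy installed, the text would then be handed to Cling:
#   import cppyy
#   cppyy.cppdef(source)
```

This only makes sense for header-only code; anything with compiled symbols would no longer match once it is wrapped.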

However, in terms of a more pythonic library, there's nothing stopping you from never exposing anything from cppyy in any user-facing Python module. All cppyy thingies are first-class Python objects, so you can do all the usual Python thingies with them.

For example, instead of handing out cppyy.gbl.SomeNamespace do e.g. MyPreferredName = cppyy.gbl.SomeNamespace and expose that. Or create a new object and set members to only those C++ pieces you want to expose. Or reimplement cppyy.gbl.SomeNamespace.__getattribute__ to filter things out. Etc., etc.

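import cppyy

# Define a small C++ namespace through Cling.
cppyy.cppdef("""\
#include <iostream>
namespace SomeNamespace {
void func1() { std::cout << "hello" << std::endl; }
void func2() { std::cout << "bye" << std::endl; }
}""")

# Hand out the namespace under a name of your choosing.
mypackage = cppyy.gbl.SomeNamespace

# Attribute filter: only func1 is let through; anything else looks absent.
def filterattr(self, attr):
    print("called!")
    if attr != 'func1':
        raise AttributeError(attr)
    return object.__getattribute__(mypackage, attr)

# Install the filter on the namespace's type.
type(mypackage).__getattribute__ = filterattr

mypackage.func1()   # allowed by the filter
mypackage.func2()   # expected to raise AttributeError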
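
The second option above ("create a new object and set members to only those C++ pieces you want to expose") could look roughly like the sketch below. To keep it runnable without cppyy, `backend` is a plain-Python stand-in for `cppyy.gbl.SomeNamespace`, and `make_facade` is a hypothetical helper name, not part of cppyy.

```python
from types import SimpleNamespace

# Stand-in for cppyy.gbl.SomeNamespace so the sketch runs without cppyy.
# In real use this would be: backend = cppyy.gbl.SomeNamespace
backend = SimpleNamespace(
    func1=lambda: "hello",
    func2=lambda: "bye",
)


def make_facade(namespace, exported):
    """Return a new object holding only the named attributes of `namespace`."""
    return SimpleNamespace(**{name: getattr(namespace, name) for name in exported})


# Only func1 is copied over; func2 never becomes part of the public object.
mypackage = make_facade(backend, ["func1"])

print(mypackage.func1())
print(hasattr(mypackage, "func2"))
```

Unlike patching `__getattribute__`, this leaves the original namespace object untouched, at the cost of listing the exported names by hand.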

kbladin commented 10 months ago

Cool, that is in broad terms how I thought about the implementation as well. But even if cppyy is never directly exposed through the API, a user will still be able to import cppyy and access cppyy.gbl on their own, for good or for bad, I suppose. I think it may be useful to think about the possibility to enable multiple C++ interpreters that don't talk to each other to give more control over API exposure to library developers.

wlav commented 10 months ago

> I think it may be useful to think about the possibility to enable multiple C++ interpreters that don't talk to each other to give more control over API exposure to library developers.

Just to be sure: the multiple interpreters bit itself is possible (that's how CUDA support works: one interpreter in CUDA mode, one in C++ mode), but nothing beyond that (the whole lookup chain) is set up to handle multiple interpreters. And "don't talk to each other" holds only on the parsing side: the linking side is still global, as loaded shared libraries expose their symbols globally. Additionally, certain classes should be shared regardless, e.g. all of STL; it'd be annoying/confusing on the Python side to have two classes for std.string, for example.

The main use case for multiple interpreters that I would like to support is throw-away ones, and that, unfortunately, is hard because of the linking. Another possible use case (which I'm not 100% convinced of, because it only works if all code uses such separation) is to separate out structured template libraries (e.g. Eigen/Dense and Eigen/Sparse) for lookup performance.

However, I'm not sure whether any of that helps with hiding things on the Python side: if your module has access through Python to C++ objects, even if you rename things to something unrelated or use _ to indicate that names are private, everything is still global in Python and can be found by anyone determined to "abuse" such access.