wlav / cppyy

Other
400 stars 41 forks source link

Precompiled Headers for User Code #91

Open mkolin opened 2 years ago

mkolin commented 2 years ago

https://cppyy.readthedocs.io/en/latest/installation.html?highlight=precompiled#precompiled-header

Is it possible to enable precompiled headers for our own code? Load times can get up there in some cases and precompiling seems to be a good option for cutting them down.

wlav commented 1 year ago

To test whether there's a performance benefit, you can add the additional headers to cppyy_backend/etc/dictpch/allHeaders.h. Beyond that, yes, modules should support this (and NWChemEx is clamoring for them, so hopefully I get to write up an example soon, just that rn, I have a big conference coming up and am scrapped for time).

mkolin commented 1 year ago

Just an update on this. We have a large project and this work has sped up our import times very significantly. We can work with the hack for the time being.

I am hitting an issue, though and I suspect that it could possibly be solved with missing links via #pragma link. It seems like PCH can't find symbols for inline definitions of static members. I have a small example here:

#pragma once

#include <string>

namespace Test {

    struct D {
        static std::string empty;
        std::string s;

        D() noexcept : s(empty) { }
    };

    inline std::string D::empty{"empty"};

    inline const D X;
}

I can include this header into cppyy directly without any issue. When I try to build a pch with it, I get a missing symbol for _ZN4Test1XE (Test::X) and the pch creation fails. I can escalate this in the Root repo, but I wanted to check to see if I was missing something here first. Making Test::D::empty const fixes the issue, but I'd like to track down why PCH are failing with out changing anything.

KHIEM2812 commented 1 year ago

Fyi, i tried your header on my ROOT 6.26 environment but it did not show any errors. I believe it converted the script to C++ module instead of PCH. Here is my code in root

.L test.h+
std::cout << Test::X.empty << Test::D::empty;

From the 1st line, I compiled the script to ACLiC. The compilation output 3 files: test_h.d, test_h.so, test_h_ACLiC_dict_rdict.pcm. I deleted the 1st 2 files, kept *.pcm. Rerun the snippet above, ROOT can process your script just fine.

mkolin commented 1 year ago

Thanks for the quick reply.

A couple of things I should mention:

If cppyy supports pcm, I can give it a try. I've tried to change the rootcling pch parameters to build a pcm without luck, though. Any help with this would be appreciated.

Finally, looking through some of the root documentation, I found this: https://github.com/root-project/root/blob/master/README/README.CXXMODULES.md It lists improvements of pcm over pch inlcuding:

Improved correctness in number of cases -- in a few cases ROOT is more correct. In particular, when resolving global variables and function declarations which are not part of the ROOT PCH.

This may be my issue, so I feel like giving pcm files a try is a good next step.

wlav commented 1 year ago

ACLiC builds a shared library at run-time and loads it (which is also why it works in this case as the library will be available). I would not recommend it, though, as the generated library is linked with all currently loaded shared libraries, so the result depends on the ACLiC invocation. As for cppyy integeration, I've removed all . ROOT-isms.

Yes, it looks like it's an initialization issue by Cling. I can't think of any immediate workarounds.

I'm sitting on the patches to enable PCMs in the backend, and yes, those should resolve the matter as then the PCM and shared library that contains any symbols can be loaded in tandem. See attached file. Haven't really tested them yet: only ran some basic tests on Linux. pcm.txt

mkolin commented 1 year ago

Thanks for the help. I've applied your patch, but I'm getting stuck on a couple things.

I feel like the patch helped and I'm close. Any help would be appreciated. Thanks!

mkolin commented 1 year ago

Yes, it looks like it's an initialization issue by Cling. I can't think of any immediate workarounds.

If you have time, could you give me a rundown on what the issue is? I'd be interested in taking a look if possible. The PCH improvements are significant as we have a large codebase. It's a testament to how much of an impact cppyy has had.

Any help would be appreciated. Thanks!

wlav commented 1 year ago

I'm back to it, but trying to push out a release. This one will not be part of it: too invasive. :)

There are two ways of using PCMs, depending on how the software is deployed. ROOT likes to create their PCMs as part of the build process, but the Cling devs recommend to let the system create them automatically on use. I'm going for the latter as that's always how cppyy has dealt with the PCH.

The choice depends on whether the target hardware is homogeneous and/or whether you want take advantage of platform specific optimizations (this is not always a choice, e.g. if compiled libraries that are linked in have AVX enabled, it must also be enabled in Cling).

But in princple, it should be, once enabled, a simple matter of switching off the PCH and feeding -fmodules to Cling. I just don't know how it will deal with changing flags, changing source, etc. I need to do a lot more testing there.

mkolin commented 1 year ago

I'm back to it, but trying to push out a release. This one will not be part of it: too invasive. :)

I totally get it. No worries at all.

I think we're still liking the PCH approach right now anyway. Its simpler for us. I'd be happy to take a stab at the Cling PCH intialization issue, if only locally. I know you're busy, but if you have any insights into what might be happening, getting some info might be helpful. And if its too big to take on, I'm happy to pass on it and try PCMs again.

wlav commented 1 year ago

What I meant with "initialization issue" is the way that the PCH is treated. PCHs aren't indexable (PCMs are, which is usually faster/better, but not always e.g. with extensive template expressions that can be very slow), so Cling is loading it en-bloc. Apparently, it's doing more than just parsing, however, hence the missing symbol. I suspect that either a) dropping any symbol resolution or b) suppressing any linking errors from said resolution can be done without further consequences (as long as they're available on use).

I haven't looked into where this occurs, in particular since symbol lookup in Cling isn't pretty (bc Cling will call itself recursively to allow symbol generation of failed lookup). But if you want, I'd start in IncrementalJIT.cpp and in particular getSymbolAddress(). That should be called, and I'd work up from there to find the exact part of the initialization that can be skipped. It may be that Cling is doing the resolution to find some callbacks, but that logic can be inverted: explicitly look for the callbacks, rather than forcing resolution.

vgvassilev commented 1 year ago

@mkolin let's revive this issue: in your example, what version of C++ you are using? Is that C++17 or later? In that case inline variables make sense and we are probably hitting some big and not a design issue.

My recommendation would be to make the PCH open-ended. That is, building it at deployment time allowing users to add hot header files which are specific to their use-case.

PS: Modules will arrive with our major rework of cppyy to use only LLVM.