cling::evaluate leaks memory

This is related to #7187, but on review it is independent enough so on Vassil's suggestion, I'm opening a new bug report. Since use of cling::evaluate (underlying) seems to proliferate b/c of RDataFrame, it is also directly relevant to ROOT usage.

The issue is that although names such as TCling::Calc() and cling::evaluate suggest these can be used for dynamic evaluation of C++ code, they are in fact not insulated one-offs of evaluating the code in some context. They introduce code and leak (they even leak beyond the basic introduction of more code). There also does not appear any alternative that does provide the one-off behavior that the name Calc suggests.

Start on the ROOT side, with TCling::Calc (and ProcessLine etc.). If the Cling evaluation returns something non-void, a cling::Value is stored in fTemporaries, which is destroyed on application shutdown. This behavior means that a developer using these functions will see their code leak (and application crash if it runs out of memory), but leak checkers will report nothing. Pretty bad in itself, in particular since the developer can't do anything about it (there is no reset or access to pop_back on fTemporaries). A better implementation would transfer ownership responsibilities to the caller, if so desired. Also, the code only checks for valRef.isVoid(), whereas integer types with sizes smaller than that of long can be returned without a need to store the cling::Value.

(Aside, the final cast to long rather than to e.g. intptr_t means that this code breaks many applications on Win64.)

Next, cling::evaluate leaves a transaction behind. This is a good chunk of the leak, albeit not all. Using unload(1) one can get rid of the transaction, but there appears to be no way to indicate that the transaction should not be committed in the first place. The code also does not work with transaction RAIIs and there is no way to tell evaluate that the caller-side wants to take over the transaction.

Finally, beyond temporaries and left-behind leaks, there is still another leak (in fact, the major part) unaccounted for that I've been unable to determine. It's another one, like above, that gets cleaned out on application shutdown, so hard to track down. Heap checker suggests it is in the MachineFunctionPass, but I could not pinpoint it. Presumably more saved state.

The only thing I see fixable (and not breaking existing use cases) is the case for returning integer types. Details:

Unloading a transaction also unloads its generated code, AST, etc - that's violating a contract that existing uses rely on.
Calc etc return whatever cast to long, since CINT, return a reference (cast to long) for "objects". As you know that means the temporary must be stored until the end of the process. Bad interface, but it is what it is, and breaking the interface contract is out of question.
Memory allocated by MachineFunctionPass is likely the object code output of the JIT. It's a common use case to Declare() something and then Calc() say the address of a template instantiation - unloading the binary code after Calc() would again violate the usage contract.

I believe much of this issue comes from a misunderstanding, as you point out:

they are in fact not insulated one-offs of evaluating the code in some context

Would you like us to improve the documentation on those?

So what we're left with is adding new interfaces. We are considering employing nested interpreters for that (RDataFrame being a main customer), but due to person power constraints that won't happen in 2021.

I'll let you know once I implemented the optimization for integer returns - thanks for reporting that!

A new interface is perfectly fine. Right now, there is no way to run even an expression with zero side effects without leaking. But as to your first point, a failed transaction (such as the lookup of a type or function that does not exist) also leaks, and is even more common. Such a transaction should, by design, not have any side effects. These leaks affect any PyROOT user and it would not surprise me if ROOT I/O is hit by this, too, given the double lookups everywhere, with and without std::, with one of the two regularly failing.

As for your second point, that interface is broken on Windows 64b already and simply changing the return to intptr_t has a massive ripple effect, so if that platform is ever in the cards, you're going to have to deal with it anyway. Plus, a trivial augmentation would be to take a pointer to a value wrapper, which by default is nullptr. If given, return the value there, if not, keep the old behavior. That won't break any code, but will allow current clients to stop part of the leaking.

My preference would be to have such a new interface in Cling anyway, not on TInterpreter.

run even an expression with zero side effects without leaking

Indeed, see my comment on the child interpreter. clang is a compiler, cling adds to its translation unit - it's growing, and even 42 is source code that clang wants to remain in memory, corresponds to AST nodes, and object code that's emitted by the JIT. So whether something is "zero side effects" depends on who you talk to :-)

I'll let you know once I implemented the optimization for integer returns - thanks for reporting that!

That's actually already there, see https://github.com/root-project/root/blob/34fc1dcd8c12b49349c3e54c033effdb6f6354f0/core/metacling/src/TCling.cxx#L7636

a failed transaction (such as the lookup of a type or function that does not exist) also leaks

I bet that's the memory buffer and the bump allocator in the clang AST: for invalid code we don't codegen (so no leak there), the transaction gets recycled (no leak there either), and nothing gets run (no values leaking). I don't have plans to change the allocator for the clang AST (if that's at all "possible" i.e. "reasonably feasible"). @vgvassilev do you think that's something you might be interested in? With the AST nodes sticking around we also need the memory buffer that they point to (SourceLocation-wise).

a trivial augmentation would be to take a pointer to a value wrapper

We have TCling::Evaluate(const char* code, TInterpreterValue& value). Is that enough for you, or what is missing?

I actually remember trying Evaluate but it crashed. Didn't look into too many details as to why; it does have a couple of fewer "protection pieces" (such as the mutex lock) than does Calc(), so that may be the reason. But even then, I gave up on all that when I realized that Python readily outperforms. I mean, talk about optimization, even for your integer case and compare these two macro's:

int noleak() {
    int result = 0;
    for (int i = 0; i < 1000000; ++i) {
        result += (int)TPython::Eval("1+1");
    }
    return result;
}

and

int leak() {
    int result = 0;
    for (int i = 0; i < 10000; ++i) {
        result += (int)gInterpreter->Calc("1+1");
    }
    return result;
}

Please note the constant: the TPython::Eval() is called 100x more than TInterpreter::Calc() and it is still 4x faster on my box. So, my conclusion is that if you are in Cling and need to evaluate a run-time constructed expression represented in string-form, then the optimizily thing to do, is to call into Python ... (and if in a loop, then even if Python uses cppyy in turn, which uses Cling again, it will still outperform). And, of course, bonus points for not leaking ...

There is one case where cppyy still fails: if the operator+ in the expression above is a global friend (with the operands instances of some class), as so far, I've been unable to locate that method. So, I do still care, but my main problem is the leak on failed lookups, as those are very common. And I don't believe it is just the allocator, which as you tell it would re-use the memory, but I'll see first whether I can construct a pure Cling-only reproducer then.

Aside, if/when the lookup helper is inverted (i.e. being able to tell what some is rather than what it isn't), many of the failed lookups will simply go away.

root-project / root

cling::evaluate leaks memory #7201