wlav / cppyy

Other
391 stars 40 forks source link

python binding and declare shadowing issue in cling #112

Closed KHIEM2812 closed 1 year ago

KHIEM2812 commented 1 year ago

I am using Ubuntu20 and install cppyy with conda. Here is my environment setup script

export ENV_NAME=cppyy
conda create -y --name $ENV_NAME python=3.8 && \
conda activate $ENV_NAME && \
conda install -y -c conda-forge cppyy && \
conda install -y ipykernel

In jupyter python, i ran the following script

cpp_code = """
int myvar = 100;
"""
cppyy.gbl.gInterpreter.ProcessLine(cpp_code)
print("myvar version1 = ", cppyy.gbl.myvar)

cpp_code = """
int myvar = 200;
"""
cppyy.gbl.gInterpreter.ProcessLine(cpp_code)
print("myvar version2 = ", cppyy.gbl.myvar)

The output is

myvar version1 =  100
myvar version2 =  100

In common sense, I expect myvar version2 = 200 though it is not. I was very happy with the relaxing c++ one definition rule in cling when developing my library. However, it seems that cppyy python binding has some limitations and it cannot work well with the cling feature. Can you help to give me a more detailed explanation of the code execution above?

wlav commented 1 year ago

Cling has an easier job here than does cppyy: it receives the new declaration and just needs to ignore the error that the compiler would otherwise give and doesn't actually update anything, just ensure that any new JITed code will use the new declaration from there on, which is done by having it take precedence in the symbol lookup.

To see this, take a pointer to the first version of myvar: int* pmyvar = &myvar;, then redeclare it, and print it: std::cerr << *pmyvar << std::endl. You will find that it prints the old value.

On the Python side there is no such "redeclaration" happening or any other notification occurring. Iow., the only way that cppyy could figure out that the value has been redeclared, is to redo the full lookup. Doing that on every access, however, is prohibitively expensive, esp. since it's uncommon. (It also wouldn't work as implemented, because Cling emits these symbols lazily, i.e. it would have to be redeclared and used before it could be available.)

Furthermore, since cppyy holds not myvar, but &myvar (such that assignment to myvar does propagate) and Cling didn't actually change that variable upon redeclaration, it's not even clear that it should update. The same way that the original variable does not update in prior JITed code either.

The best workaround I can think of, and which might work, is a full re-lookup after something like del cppyy.gbl.myvar, but again, it would require the symbol to have been emitted, or it will just find the old one again.

KHIEM2812 commented 1 year ago

It is a very helpful explanation. Cling should delete the old definition, instead of just shadowing the old version. Although, I haven't fully grasp the idea of emitting symbols lazily, is it necessary for cppyy to do full symbol lookup? For example, in the following macro:

import cppyy
import cppyy.ll as ll
import random

def gen_random_var_name():
  return "_random_seq_" + str(random.randint(0,1e32))

def print_myvar_proxy():
  varName1 = gen_random_var_name()
  addr1 = hex(cppyy.gbl.gInterpreter.ProcessLine("auto* {temp1} = &myvar;{temp1};".format(temp1=varName1)))
  print("addr1=",addr1)
  addr2=hex(cppyy.addressof(cppyy.gbl.myvar))
  print("addr2=",addr2)

  if addr1 != addr2:
    print("Error: require update python symbol")
  else:
    print("myvar latest version = ", cppyy.gbl.myvar)

# I have to change from the type `int` to `std::vector` so that i can use `addressof` method.
cpp_code = """
std::vector<int> myvar(1);
std::cout << "myvar@" << &myvar << "=" << myvar.size() << std::endl;
"""
cppyy.gbl.gInterpreter.ProcessLine(cpp_code);
print_myvar_proxy()

cpp_code = """
std::vector<int> myvar(2);
std::cout << "myvar@" << &myvar << "=" << myvar.size() << std::endl;
"""
cppyy.gbl.gInterpreter.ProcessLine(cpp_code);

print_myvar_proxy()

And the output on my computer

addr1= 0x7f87fad61020
myvar@0x7f87fad61020=1
addr2= 0x7f87fad61020
myvar latest version =  <cppyy.gbl.std.vector<int> object at 0x7f87fad61020>
myvar@0x7f87fad61040=2
addr1= 0x7f87fad61040
addr2= 0x7f87fad61020
Error: require update python symbol

The idea is that cppyy can send a query for the object address to cling. If the object address is updated, it can confirm the shadowing occurring and update accordingly. Can the idea help to detect such a nuance reliably? so next time, I can tell if i make the same mistake again.

KHIEM2812 commented 1 year ago

I just found this bitbucket issue https://bitbucket.org/wlav/cppyy/issues/240/q-redefinitions, post here for reference

wlav commented 1 year ago

Using a transaction like in the code above is even slower than doing a full lookup (by definition, Cling does such a lookup to resolve myvar in the expression, so that's a given) and leaks memory. ProcessLine() is extremely expensive. E.g. doing a simple 1+1 on ProcessLine() is about four orders of magnitude slower than doing the same in Python. It leaks memory because all transactions are recorded: there are no "one-off" transactions that can be dismissed after completion. That's an old request of mine to upstream.

The cheapest "query" would be a lookup in the JIT symbols map, which is part of the full lookup, and IIRC, not separately exposed by Cling. Even if it were, though, it's still way more expensive than dereferencing an already known pointer.

The idea of a high water mark as in that bitbucket issue will help a little bit: it's unlikely that the mark would move in an inner loop, so it won't be as painful in the most time-sensitive cases, but it will still result in many unnecessary lookups as it will move everytime that Cling is invoked.

The only thing that would mostly work, would be if Cling provided a callback. "Mostly" b/c I'm not 100% convinced that the Python binding should move along with the change in all cases. As said, any previously JITed code will still refer to the old value, too. That's what I like about, if possible, making del work. That way, the redeclaration becomes explicit in Python, too.

KHIEM2812 commented 1 year ago

I loved all the details u provided. I roughly have some understandings why cppyy binding behaves that way now but I am still doing some further readings, typically about

But first, it is about the point whether ProcessLine() is expensive

image

Did i measure in the correct way?

wlav commented 1 year ago

I don't understand what you are doing there. From the above, it appears that you only loop and measure over True, not over ProcessLine() nor 1+1?

Try this:

import cppyy
import gc
import os
import psutil
import time

N = 10000

# run once for warmup
cppyy.gbl.gInterpreter.ProcessLine("1+1;")

process = psutil.Process(os.getpid())

gc.collect()
last = process.memory_info().rss
t0 = time.perf_counter()

for i in range(N):
    cppyy.gbl.gInterpreter.ProcessLine("1+1;")

print("PL: time: %.1fs" % (time.perf_counter() - t0))

gc.collect()

print("PL: mem increase:", (process.memory_info().rss - last))

gc.collect()
last = process.memory_info().rss
t0 = time.perf_counter()

for i in range(N):
    1+1

print("Py: time: %.1fs" % (time.perf_counter() - t0))

gc.collect()

print("Py: mem increase:", (process.memory_info().rss - last))

Now, to do that precise, you'd need to account for the fixed overhead of the Python loop and also for the cppyy overhead, as well as run it a couple of times, but the difference is so huge, none of that will change the conclusion. (And ProcessLine() is so slow, that I'm not going to run that loop multiple times ....) Note also that gInterpreter.Calc() is less involved than is gInterpreter.ProcessLine(), so that will give a bit of a speedup, but it's still slow and still "leaks" transactions (albeit here also a wee bit less).

But even calling these from C++, it's way faster to send the string 1+1 to an embedded Python interpreter to evaluate (PyRun_String) than it is to send that string to Cling (cling::evaluate). It only becomes a different matter if that string is part of a loop that can be JITed, or a function that can be compiled and re-used, then Cling can (and does) outperform Python.

wlav commented 1 year ago

With https://github.com/wlav/CPyCppyy/commit/f5d3d34ce44740e4e3a7ef4e42a33939fc90f967, this code now picks up the new value. Note the del cppyy.gbl.myvar statement that provides the hook. Since it's a complete lookup, it's also fine if the type of the variable changes.

import cppyy

cpp_code = """
int myvar = 100;
"""
cppyy.gbl.gInterpreter.ProcessLine(cpp_code)
print("myvar version1 = ", cppyy.gbl.myvar)

cpp_code = """
int myvar = 200;
"""

del cppyy.gbl.myvar
cppyy.gbl.gInterpreter.ProcessLine(cpp_code)

print("myvar version2 = ", cppyy.gbl.myvar)
KHIEM2812 commented 1 year ago

I did not use timeit correctly in my previous comment. After running again, it is confirmed that ProcessLine is very expensive. On my computer, cppyy.gbl.gInterpreter.ProcessLine("1+1;") takes ~ 4.2ms while python code 1+1 takes only 5ns.

wlav commented 1 year ago

Released with cppyy 2.4.2 and its dependencies.