wlav / cppyy

Other
407 stars 42 forks source link

Python int unexpected cast to C++ double #41

Open edcjones00 opened 2 years ago

edcjones00 commented 2 years ago

I have a Debian 11 system on a PC. I created a miniconda environment with Python 3.9.10, CGAL 5.0.1, cppyy 2.2.0, and gmp 6.2.1.

"CGAL" is The Computational Geometry Algorithms Library. It is written in C++. It contains wrappers for the Gnu multi-precision library (gmp). Class Gmpz does bigints and class Gmpq does big-rationals.

This program:

#! /usr/bin/env python
import cppyy
cppyy.load_library('libgmp')
cppyy.include('/home/xxxxxxxx/miniconda3/envs/CPPYY/include/CGAL/Gmpz.h')
Gmpz = cppyy.gbl.CGAL.Gmpz
big = 10**30
print(Gmpz(big))
print(Gmpz(float(big)))
print(str(big))
print(Gmpz(str(big)))

outputs:

1000000000000000019884624838656
1000000000000000019884624838656
1000000000000000000000000000000
1000000000000000000000000000000

A Python-oriented user might unconsciously assume that Gmpz(10**30) would convert a Python int into a gmp bigint. I didn't notice for a very long time that this was not happening. Fortunately, all the numbers I input into my program are small. The correct thing to do is Gmpz(str(10**30)). cppyy cannot possibly be expected to do this conversion. Among the various constructors for Gmpz, cppyy chooses double which is reasonable.

Good documentation might have prevented my confusion. cppyy needs to have a reference manual. It should contain details about how basic Python objects (int, list, etc) and C++ constructs (ints, vector, etc.) map into each other. At the other extreme of depth, perhaps fork some of the ROOT documentation.

I have been frustrated many times with cppyy's opaqueness. It seems as if there are several layers of magic between Python and CGAL.

wlav commented 2 years ago

Not sure I follow ... yes, automatic conversion works on the principle that the typed code is legal and should be fulfilled, hence the conversion to double (it would fail in C++ b/c expressing 10**30 as an integer literal won't work and declaring an __int128 would be ambiguous in this particular case, given the available overloads, but otherwise it would only be a warning, not an error).

I just tried SWIG and pybind11 and both behave the exact same way: silent conversion to double. As such, I do think this is the generally accepted (and desired) behavior, even as yes, it's ambiguous at best in C++ proper (for this set of overloads).

The only things I can add is that a) when calling into C++ assume it's C++ and do not expect it to be Python, unless b) the binder of said C++ code has provided appropriate pythonizations to resolve corner cases. Have you looked at flatsurf? I do not know whether they solve this particular case, but they do have several pythonizations to deal with corner cases.

Note that cppyy would be a bit more astute to the conversion problem if the constructor had been templated: it would up integer sizes to the largest available C++ one, then fail; never instantiating double. (But once double has been instantiate, it becomes fair game again.)

I'm not really sure what you're asking for in terms of documentation. E.g. considering ROOT, that project incorporates an old (positively ancient) fork of cppyy, so a priori any documentation from there would not be relevant, but also, what little documentation that project does have, refers to the actual cppyy documentation? Specific to conversions, to pick a particular example, this is legal cppyy (with the automatic type deduction following C++17 syntax and the Python -> C++ type conversion being the obvious one):

>>> import cppyy
>>> m = cppyy.gbl.std.map({"one": 1, "two": 2})
>>> type(m)
<class cppyy.gbl.std.map<std::string,int> at 0x282bfe0>
>>> print(m)
{ "one" => 1, "two" => 2 }
>>> 

but neither type deduction nor Python type conversion works in ROOT's fork of cppyy (since, as said, it's ancient). They also have their own additions (some of which I think are terrible such as e.g. implicit using namespace std), among many other differences.

But also, the other types mentioned (int and list to int and std::vector) are pretty straightforward? The only one that is a massive hairball is str and std::string, b/c of the former being unicode. But that, too, will take the code as written as legal and convert as necessary to make that as-written code execute. And yes strings need better documentation, but I'm still working on a disambiguation feature for C strings, so this is in flux.

If you're asking for a detailed debugging output trace of type coercion during overload resolution, that has been asked before, but is not straightforward as resolution happens at different levels: pre-sort in CPyCppyy, implicit conversions in (C/R) Python (some of which are actually blocked by CPyCppyy, such as silent narrowing of float), and implicit conversions in Cling (like any compiler). I could only ever provide tracing for the first part (in CPyCppyy).