tonybaloney / CSnakes

Embed Python in .NET
https://tonybaloney.github.io/CSnakes/
MIT License
343 stars 24 forks source link

Stop "inventing" CPython C-API calls. What is PyObject ? URGENT !! ... #310

Closed minesworld closed 3 weeks ago

minesworld commented 3 weeks ago

Stop "inventing" CPython C-API calls

the CPython CAPI convention is that calls begin with Py or Py[Type]

To make it clear if a call is made into CPythons CAPI those prefixes should be only used for those calls.

CSnakes.Runtime instead defines calls itself, like

CSnakes.Runtime.CPython.CPythonAPI.PyBytes_FromByteSpan as a wrapper for PyBytes_FromStringAndSize

even worse, as CSnakes defines an C# object wrapper PyObject . And there are calls like

CSnakes.Runtime.CPython.CPythonAPI.PyBytes_AsByteArray which take that C# PyObject instead a CPython CAPI pointer to the PyObject C struct.

Of course there is no PyBytes_AsByteArray within CPythons C-CAPI.

Lets look at a convinience function from CSnakes.Runtime.CPython.Bytes.cs implements in CPythonAPI

    public static bool IsBytes(PyObject p)
    {
        return PyObject_IsInstance(p, PyBytesType);
    }

and CSnakes.Runtime.CPython.Object.cs implements in CPythonAPI

    internal static bool PyObject_IsInstance(PyObject ob, IntPtr type)
    {
        int result = PyObject_IsInstance_(ob, type);
        if (result == -1)
        {
            throw PyObject.ThrowPythonExceptionAsClrException();
        }
        return result == 1;
    }

Two problems with that:

Another "ugly" thing is the usage of C# namespaces. As

there should be a clear line between both, Runtime.Python and Runtime.CPythonAPI - Runtime.Python.PyObject should NOT be used from within Runtime.CPythonAPI .

Instead Runtime.CPythonAPI should ONLY either use IntPtr or nint as references to CPythons PyObject C struct.

What is PyObject ?

There is a saying:

A PyObjectis a PyObjectis a PyObject

Is this really true for CSnakes ? Lets make the test:

A

So in CSnakes

... ?!?

To my (limited) knowlegde there is only one dotnet Python project which could name its Python base class object PyObject: IronPython as it is a native dotnet implementation of Python.

All dotnet projects "wrapping" an underlying non CLR python runtime should NOT name its C# Python object base as the runtimes base name. Period.

Otherwise it would be confusing. And CSnakes is...

URGENT !!

CSnakes needs

to solve the design and resulting usage problems described above.

naming convention

Technically current Runtime.Python.PyObject is a C# object wrapping an IntPtr. So the technically correct naming would be PyObjectWrapper. Which isn't that "beautifull"...

Naming that PyObjectWrapper CPythonObject would refer technically to the CPython runtime implementation but is ambiguous from a language use view as a C# object isn't a C object.

So calling the technically correct PyObjectWrapper as PythonObject is correct from both the technically and language view points.

Refactor the CSnakes.Runtime.Python.PyObject base class to CSnakes.Runtime.Python.PythonObject and all other Python.PyXXX classes accordingly.

The use of Py_ and PyXXX name prefixes is EXCLUSIVE to CPython C-API and such the direct DLL calls.

But from that restriction some problems arise. Lets take a look at CSnakes.Runtime.CPython.Bytes.cs 👍

    internal static nint PyBytes_FromByteSpan(ReadOnlySpan<byte> bytes)
    {
        fixed (byte* b = bytes)
        {
            return PyBytes_FromStringAndSize(b, bytes.Length);
        }
    }

PyBytes_FromByteSpan describes what it does, but break the above rule. Even more the CPython maintainers are free to define a function with the same name... .

Leaving the underscore away, such renaming PyBytes_FromByteSpan to PyBytesFromByteSpan would minimize the risk of CPython maintainers of making that an public C-API call.

But the only "real" solution is to rename in a technically correct way by using human language semantics. Here the forbidden PyBytes_FromByteSpan would become an also technically correct ByteSpanToPyBytes .

Refactor CSnakes that naming respects the protected CPython C-CAPI names without being ambiguous about the nature of the used objects.

The current definition of PyObject_IsInstance(PyObject ob, IntPtr type) within the CSnakes.Runtime.CPython.CPythonAPI class breaks the PyXXX prefix rule, might throw exceptions and has a parameter of type PyObjectwhich is a C# wrapper. Also CSnakes.Runtime.CPython doesn't include the PyObject class, it is defined in the CSnakes.Runtime.Python namespace.

functions and function parameters should only used types defined within that namespace or sub-namespaces

The callees of CSnakes.Runtime.CPython.CPythonAPI PyObject_IsInstance(PyObject ob, IntPtr type) are all within the CSnakes.Runtime.CPython namespace, all in functions defined like IsBytes(PyObject p) . This usage pattern shows that PyObject_IsInstance(PyObject ob, IntPtr type) should be named IsInstance(PyObject ob, IntPtr type). But still - wrong namespace, it should be in CSnakes.Runtime.Python as the calling functions like IsBytes(PyObject p) .

Refactor CSnakes that functions reside wihtin the correct namespace, the one which defines the types of parameters

inter API call convention

CSnakes design implies a top-down functional usage hierachy as:

As such CSnakes.Runtime.Python might use CSnakes.Runtime.CPython, but should not the other way around. This way each namespace should be able to run without dependencies on upper ones. Also bottom-top calls make it tempting to define functions in the wrong namespace.

only top-down functional type definitions: like CSnakes.Runtime.Python can use CSnakes.Runtime.CPython, not the other way only top-down functional calling:: like CSnakes.Runtime.Python can use CSnakes.Runtime.CPython, not the other way

Refactor CSnakes to only use top-down functional type definitions and calls

C# namespace usage convention

CSnakes usage hierachy correspondends to a "worry" usage hierachy from its users (developer) point of view. From hard to easy:

As such

functions defined CSnakes.Runtime.CPython should not manage the GIL for usage within such a function CSnakes.Runtime.Python code manages the GIL and lifetime of objects

Lets take a look at CSnakes.Runtime.Python.PyObject.cs which defines for PyObject

    internal static Exception ThrowPythonExceptionAsClrException(string? message = null)
    {
        using (GIL.Acquire())
        {
            if (!CPythonAPI.PyErr_Occurred())
            {
                return new InvalidDataException("An error occurred in Python, but no exception was set.");
            }
            CPythonAPI.PyErr_Fetch(out nint excType, out nint excValue, out nint excTraceback);

            if (excType == 0)
            {
                return new InvalidDataException("An error occurred in Python, but no exception was set.");
            }

            using var pyExceptionType = Create(excType);
            PyObject? pyExceptionTraceback = excTraceback == IntPtr.Zero ? null : new PyObject(excTraceback);
            PyObject? pyException = excValue == IntPtr.Zero ? null : Create(excValue);

            // TODO: Consider adding __qualname__ as well for module exceptions that aren't builtins
            var pyExceptionTypeStr = pyExceptionType.GetAttr("__name__").ToString();
            CPythonAPI.PyErr_Clear();

            if (string.IsNullOrEmpty(message))
            {
                return new PythonInvocationException(pyExceptionTypeStr, pyException, pyExceptionTraceback);
            }

            return new PythonInvocationException(pyExceptionTypeStr, pyException, pyExceptionTraceback, message);
        }
    }

Where and where is this function used? From

Correct is that ThrowPythonExceptionAsClrException manages the GIL as its defined within the CSnakes.Runtime.Python namespace.

Wrong is the name of the function as C# naming and usage pattern is throw new Exception(); . So the correct name would be PythonExceptionAsClrException .

But even then - from the CSnakes.Runtime.CPython is a bottom-top call.

One might argue that PyObject.ThrowPythonExceptionAsClrException is an exception and such might break rules. But that doesn't need to be the case.

ThrowPythonExceptionAsClrException does three things:

This can be spitted those two parts, where

Given that it is PythonEnvironment of CSnakes.Runtime has top-down access to both CSnakes.Runtime.CPython and `CSnakes.Runtime.Python namespace usage and creates the CSnakes.Runtime.CPython.CPythonAPI object, the correct implementation would be something like:

namespace CSnakes.Runtime.CPython;

class CPythonAPI 
{
  protected Func<Exception> CreateExceptionWrappingPyErr { get; init; } = CreatePlainExceptionWrappingPyErr;

  static public bool FetchAndClearPyErr(out nint excType, out nint excValue, out nint excTraceback) 
  {
    excType = excValue = excTraceback = 0;

    if (!CPythonAPI.PyErr_Occurred())
    {
        return false;
    }
    CPythonAPI.PyErr_Fetch(out excType, out excValue, out excTraceback); // Deprecated since version 3.12 ... might be replaced in the future

    if (excType == 0)
    {
        // is it sure to that excValue and excTraceback are null?
        return false;
    }

    CPythonAPI.PyErr_Clear();

    return true;
  }

  static public Exception CreatePlainExceptionWrappingPyErr()
  {
     if (FetchAndClearPyErr(out nint excType, out nint excValue, out nint excTraceback) == false)
     {
       return new InvalidDataException("An error occurred in Python, but no exception was set.");
     }

    var exception = new Exception( /* create a meaning full text based on excType */  );

    Py_DecRef(excType);
    if (excValue != 0)  Py_DecRef(excValue);;
    if (excValue != 0)  Py_DecRef(excTraceback);

    return exception;
  }
}
namespace CSnakes.Runtime.Python;

class PyObject
{
   static public Exception CreatePythonExceptionWrappingPyErr()
   {
     using (GIL.Acquire())
     {
       if (FetchAndClearPyErr(out nint excType, out nint excValue, out nint excTraceback) == false)
       {
         return new InvalidDataException("An error occurred in Python, but no exception was set.");
       }

       using var pyExceptionType = Create(excType);
       PyObject? pyExceptionTraceback = excTraceback == IntPtr.Zero ? null : new PyObject(excTraceback);
       PyObject? pyException = excValue == IntPtr.Zero ? null : Create(excValue);

       // TODO: Consider adding __qualname__ as well for module exceptions that aren't builtins
       var pyExceptionTypeStr = pyExceptionType.GetAttr("__name__").ToString();

       return string.IsNullOrEmpty(message) 
         ? new PythonInvocationException(pyExceptionTypeStr, pyException, pyExceptionTraceback)
         :  new PythonInvocationException(pyExceptionTypeStr, pyException, pyExceptionTraceback, message);
    } 
  }
}

so PythonEngine of CSnakes.Runtime can create CSnakes.Runtime.CPython.CPythonAPI object like

new CPython.CPythonAPI(...) {  CreateExceptionWrappingPyErr = Python.CreatePythonExceptionWrappingPyErr };

and throwing an error within the CSnakes.Runtime.CPython is done via

throw CPythonAPI.CreateExceptionWrappingPyErr();

where as from CSnakes.Runtime.Python its

throw CreateExceptionWrappingPyErr();

(Note: code above doesn't work with passing the optional message parameter yet....)

Refactor CSnakes.Runtime.Python.PyObject.ThrowPythonExceptionAsClrException using the correct name and namespaces**

The PythonInvocationException is a subclass of Exceptionand is located in the CSnakes.Runtime namespace. As of now its only used from CSnakes.Runtime.Python and indirectly by CSnakes.Runtime.CPython. With the fixes applied its only called from CSnakes.Runtime.Python and should moved into that namespace.

Refactor PythonInvocationException moving it into the CSnakes.Runtime.Python namespace

There exists a CPythonAPI class within the CSnakes.Runtime.CPython namespace. CSnakes.Runtime.CPython.CPythonAPI is somehow redundant. CAPI is sufficient.

Refactor CPythonAPI of CSnakes.Runtime.CPython to CAPI

Why the urgency?

Simple: each day passing and such possible commits done might add sources to be refactored. At some point that work would be such huge that it

Even if CSnakes states

_Warning

This project is in prototype stage and the API is subject to change._

it looks like from the comments and issus the people are using it in real production projects. And thanks to CSnakes having a NuGet package and Tonys promotion the user base might grow rapid.

For the NuGet package uses which are restricted to CSnakes.Runtime.Python those changes should be non-breaking. But there are users of the github sources to have access to CSnakes.Runtime.CPython (like me). And for those changes are breaking...

Additional thoughts

While pythonnet project has a different scope, it sources show some interesting concepts.

Like instead of having the CPythons C-API calls using nintor IntPtr special types are defined:

Those are defined as C# structs containing a Pointer of type IntPtr to the CPython C-API C PyObject struct.

The beauty of this is

My guess is when CPython source code maintainers see this, they are envious as the would like to use some C typedefs but: to much work and will break stuff...

pythonnets C# object based wrapper class which enables automatic lifetime is PyObject. Which holds the pointer to the CPython C-API PyObject struct as protected IntPtr rawPtr. As pytonnet

PyObject defines internal BorrowedReference Reference => new (rawPtr);

BorrowedReference defines public static implicit operator BorrowedReference(PyObject pyObject) => pyObject.Reference;

the C-API calls defined like

nint PyObject_Size(BorrowedReference pointer)

can be done using the PyObject as parameters as the conversion is done implicitly. That is really nice as this makes usingliterally CPythons C-API easier in C# then in C which is pythonnets domain. At the cost of creating a new struct for each C-API call...

BTW: has pythonnet a PyList object? Of course. Does is define an Append method? Of course...

Another big difference between pythonnet and CSnakes is where the C-API external functions are stored and how the are called:

The best of both worlds: what to learn from pythonnet for the future of CSnakes

As CSnakes IS going to be THE one-stop solution for Python NuGet packages depending on CSnakes ( people are already asking for specific packages here... https://github.com/tonybaloney/CSnakes/issues/296 ) there will be many NuGet packages. The promise of CSnakes is "just include it", even multiple packages at once.

So the unavoidable will happen

There will be NuGet packages requiring specific versions of Python which can not be used togehter in a single project

As CSnakes exposes only CSnakes.Runtime.Python this is like a "broken promise", a "show stopper" which should be avoided at all costs.

The solution would be that the DLL calls are kept like in pytonnet in a Delegate, but an instance instead of a static class. CSnakes.Runtime.Python classes would not only hold a reference to the C-API Python object but also to that Delegate, e.g. interpreter which created the wrapped object.

If CSnakes.Runtime.Python classes want to call the C-API and use another Runtime.Python instance as an argument, the Delegate have to be the same. Otherwise a "fallback" might be used (like using the dotnet frameworks functions to compare by converting the values from Python to it) or a PythonInvocationExceptionis thrown.

Refactor to make the use of multiple CPython versions in parallel possible

If the existing Python NuGet packages aren't suitable for loading different DLLs at the same time due to different compiler flags or whatever - we should ask the maintainer to fix that. Realiy will look something like: if there is only one fancy "must have" AI package, the downloads of the CSnakes "preferred" Python package will outnumber every other provider by factors...

As a last resort the user of CSnakes could compile all needed Python Version themselves with the same flags, libraries etc. ... For many it will be worth the effort if given a step-by-step documentation or project to do so.

Python being the #1 data scientiest and AI developers language choice there are already many specialized CPython data structures provided as modules. numpy being the best known. But CPython subinterpreters going to get "real" there is a prototypes of Copy-On-Write module which enables some sort of sharing between subinterpreters implemented as a module written in C .

To use some of those special datastructures from C# the "right way" will require using CSnakes.Runtime.CPython down to the level where the layout of the C struct must be known to the C# code.

No problem for the end user - a NuGet package would provide the CPythons C-module and the correspondening C# class. A huge plus point for using CSnakes ...

Refactor to make CSnakes.Runtime.CPython public

As such new C# CSnakes.Runtime.Python based classes will be there wrapping CPython object of specific types, there must be a mechanism to create the C# objects. Also the SourceGeneratormight be more flexible to use those then.

Make converting CPython to C# objects configurable

Make the SourceGeneratorto be extensible to use new C# CSnakes.Runtime.Python subclasses

As of now developers using CSnakes.Runtime.CPython calls would have to manage the lifetime of the CPython objects themselves (Py_IncRef, Py_DecRef). In pythonnet they would use the PyObject base class with takes that burden away, in CSnakes its in the CSnakes.Runtime.Python` namespace.

It is possible to refactor CSnakes in a way that the a SafeHandle based class is usable from CSnakes.Runtime.CPython and its objects given to the CPython C-API calls... . Developers could use those with "using" to write less and memory-leak free low level code.

A clean design would look something like

This way functions using the CSnakes.Runtime.CPython.APIorCSnakes.Runtime.CPython.CAPImust be given theCAPIDelegateinstance and calling the CPython C-API would look likecapiDelegate.PyObject_GetSize(x);. Its up to the caller of such functions to ensure that all used C-API python objects originate from the sameCAPIDelegate` / CPython instance.

Refactor CSnakes.Runtime.CPython to have a CAPI and an API class based interface of which the API enables using the using keyword to let the CLR manage objects lifetime.

Another "sideeffect" benefit would be that existing pyhtonnet code would be adopted more easily. In best case scenarios like replacing pythonnet (wrongly named) PyObject with (correctly named) ReferenceObject ... which would be true for both the developer using CSnakes as developing CSnakes itself.

pythonnet license is MIT, so "borrowing" sources is a "non issue" by referering from where its originating...

The ReferenceObject class should include the C# convinience functions like Equals as CSnakes.Runtime.Python.PyObject does to make code more readable. But methods like GetAttr etc. should be left out as those should be only in the CSnakes.Runtime.Python namespae. Such must be used as PyObject_GetAttr ...

minesworld commented 3 weeks ago

Much work to do - I know. As I've done some parts of it already in my "minesworld" branch here: https://github.com/minesworld/CSnakes/tree/minesworld

This branch already has refactored the CAPI - but my bad that I didn't looked at the pythonnet internals before doing so. So instead having those genious NewReference, BorrowedReference... I did only alias the nintto pyoPtrand created a MPyOPtrsubclass of SafeHandle which also acts as the base class for the Runtime.Python classes...

Refactoring ThrowPythonExceptionAsClrException comes next - came up with that idea while writing the issue above.

My "minesworld" branch is forked from 6bfe4ff5 and I was going to implmenet CSnakes main changes up to current 513fa03, but then thought that the chances having those changes pulled would be greater by quickly filing an "Issue" ...

At the end I spent 12+ hours doing so. Just mentioning to show that I'm serious about the above issues.

@tonybaloney please make a decission in which direction CSnakes should go as incorporating from your main branch will get more difficult with each "major" refactoring done in a branch...

minesworld commented 3 weeks ago

@tonybaloney a personal "note": your github page says:

0.10X developer. Python Advocate at Microsoft. Python Software Foundation Fellow. Fellow of Macquarie University, Sydney

Great for you. And CSnakes too as its get "some attention" by you having those connections and promoting it...

But with stating those connections comes some responsibility. Every repository you put online "falls back" to the mentioned organizations. Don't blame me - it is like it is...

Its beyond my understanding why CSnakes isn't an official Microsoft proejct, hopefully you can work during your paid time on it (which would mean that it is an inofficial MS project until is mature enough to be an official one...) .

Truth is - I have a hard time understanding whats going on at Microsoft. Having switched to Windows after being a postchild Mac fanboy for 25 years. Earned my living adminstrating Macs and developing software...

There were many reasons to switch over the time, but the final straw was I wanted to use the Microsoft Studio (best computer EVER) and developing software wih pen input. But at the Windows AppSDK / WinUI3 they are "thinking" about putting PenInput on the roadmap. Very "funny" ...

At least Apple knows how to sell their own hardware while Microsoft has a Surface line Apple users can only dream of, but there seems no easy way to use those fantastic computers to their full potential. What a shame.

PS: as you might already reach to conclude - I am something like a "programming language fetishist". And briding two different languages with different idioms is hard to be done the right way...

PPS: accessing via brackets - as x = o[a] works on CSnakes.Runtime.Python C# classes the same as on python when an object implements both dict and list interfaces? Too bad Guido didn't think about that when designing Python as there is an easy solution to that problem...

tonybaloney commented 3 weeks ago

As mentioned before, I'm not making major changes to the goals of the project.