Remove PyObject usage from the CPythonAPI, refactor PyObject to WrappedPyObject, Contract based optimizations

Lets set three goals:

CPythons C-API calls should behave from C# as they would from C. Same no exceptions thrown and reference counts in C# code as it would from C. So near 1:1 translation of sources are possible.
when a wrapped CPython object is used, it should be clear to the developer simply by its name. As there is a PyObject in the C-API this name implies that the real object is used and some rules apply... . This is not the case in the current C# code as PyObject is used as the C# wrapper class

Different developers have different needs. As of now CSnakes simply works "out of the nuget package" in all cases but at the cost of overhead. Will be fine this way for those whose code spends most if its time Python code compared to how fast accessing the results is on the C# side. Different story for those who will present the results in a WinUI: its a bad user experience if the CLR GC kicks in while scrolling thru a list - chances that the user will blame the correct one are minimal...

make CSnakes the "go to" solution for every developer

Remove PyObject (as a C# object) usage from the CPythonAPI

This code in the CPythonAPI is simply "wrong" (already refactored to use a Raw interface for the DLL calls):

    internal static nint PyList_GetItem(PyObject obj, nint pos)
    {
        nint item = Raw.PyList_GetItem(obj.ptr, pos);
        if (item == IntPtr.Zero)
        {
            throw PyObject.ThrowPythonExceptionAsClrException();
        }
        Raw.Py_IncRef(item);
        return item;
    }

CPythons C-API PyList_GetItem does not throw an exception and does not call PyInrecRef on its own.... .

While the code does the right thing from a C# developers perspective, to C developers knowing CPythons C-API its "confusing" in two ways:

why is that call there anyway?
isn't PyObject supposed to be the same thing as in C, a Pointer to the PyObject struct?

We will make it clear by having CPythonAPI using only IntPtr, no C# wrappers. Maybe additionaly defining PyObject as a IntPtr so translations of C code need even less typing.

As a C# wrapper is used, this code doesn't belong in the CPythonAPI at all.

Refactor PyObject to WrappedPyObject

There are some Python bridges which have the ability to modify implementations from both sides transparently. In effect a Python object can be an "interesting" mixture of C# and Python calls when used in Python code from the PythomInterpreter... This isn't the goal of CSnakes.

Also a problem arises when there is an overidable C# interface for CPython objects - when provided defintions are changed the outcome might have some unwanted side-effects which might be hard to debug.

The current usage of PyObject is three fold:

its a C# SafeHandle object which makes the CPython object CLR GC usable
when used a such making CPython API calls its ensured that the correct Py_IncRef/Py_DecRef calls are made onto the wrapped CPython object
Exceptions are thrown as a C# developer would expect

Renaming the class from PyObject to WrappedPyObject makes it clear what this class does.

Renaming all functions which take a WrappedPyObject as a parameters to WrappedPyXXX() makes it clear that C# GC and Exception rules apply for all parameters as its result.

So the above PyList_GetItem would be:

    internal static nint WrappedPyList_GetItem(WrappedPyObject obj, nint pos)
    {
        nint item = Raw.PyList_GetItem(obj.ptr, pos);
        if (item == IntPtr.Zero)
        {
            throw PyObject.ThrowPythonExceptionAsClrException();
        }
        Raw.Py_IncRef(item);
        return item;
    }

Contract based optimizations

CSnake can only optimize correctly "automagically" if the developer using its APIs assures that his usage of Python does something NOT.

Such an usage assurance is a Contract from the developer made with CSnakes. If the developer breaks a given contract - neither CSnakes fault nor problem.

If there is no Contract made, CSnakes will always work at the cost of the needed overhead. In case that is to much overhead, the developer can go thru the list of available contracts and made those ones applyable for his code/use case.

The Contract API should be made in a way that a developer willing to go the extra mile could add his own contracts and optimizers so the need to implement is minimized in regard to CSnakes source code changes.

A Contract should fail at creation time if a wrong condition is detectable. Like creating a Contract assuming a Global GIL when the CPython DLL is compiled in different.

Once a Contract is made it should not be checked again if its valid as this would imply overhead. Its not the wrong code path applied by CSnakes - its the wrong Contact made be the developer...

Optimizations enabled by Contracts are never implemented wthinin the CPython-API calls, those given IntPtr/PyObject, but either within WrappedPyObject calls or as utility functions to them. Lets see what makes the most sense.

If a developer uses calls below the CSnakes API where it is assured by CSnakes that Contracts are respected - not CSnakes problem. We will make it clear in the documentation that such a mixed usage might have side-effects and is not advised by CSnakes.

Examples why Contracts are the way to go:

https://github.com/tonybaloney/CSnakes/issues/179 . As in the discussion shown a simple implementation might fail, but if a developer makes a Contract to not modify the list, it will work for him. Such a developer just needs to write a single line of code and might gain "insanly great" speed ups...
https://github.com/tonybaloney/CSnakes/issues/77. As in the dicussion shown there is a way to reduce overhead if during the CPython API call the PyString name is only used for lookups.
https://github.com/tonybaloney/CSnakes/issues/271 . Hopefully just choosing a different nuget CPython package and one line of code to create the contract... trading memory for speed at its best.

These optimization are a piece of cake to implement compared to designing a clean Contract/Optimization API. But it should be worth that effort as CSnakes should become the "go to" option for developers...

Here is a preview of my work in progress: https://github.com/minesworld/CSnakes/commit/73d2caf2380f7299a552de9accc096d4c6a447a8

Refactoring

The new big picture is three layered base classes in the Runtime:

CPython.Unmanaged.CAPI : a 1:1 C-API, each CPython call is left as is (even giving back borrowed refs). A nint is aliased to pyoPtr to show that a CPython Object Ptr is expected or returned. If an error condition could be returned, its nint
CPython.CAPI : a 1:1 C-API whose main class MPyOPtr is a SafeHandle wrapper for pyoPtr . Can be constructed by stealing ... .
Python.PythonObject : renamed from PyObject, based on MPyOPtr . The other classes in CSnakes.Runtime.Python not changed yet...

still "some" refactoring to do - but looks very promising. CSnakes.Tests and CSnakes.Runtime.Tests pass...

These classes must be made public - a developer should be able to write code using:

CPython.Unmanaged.CAPI just a the C sources tell him, fastest code but its up to the developer to match each PyIncRef with a PyDecRef
CPython.CAPI somehow as the plain C, but no need to call PyDecRef as the objects are wrapped in the MPyOPtr SafeHandle. Must be done by "hand", still more readable and often more realiable code than unmanaged. Still Convinience is paid with performance...
Python.PythonObject the most C# like way

When refactoring is done, it should be possible to mix these layers as wanted or needed. Thus someone can begin with the PythonObject layer, benchmark it and go way down where needed...

Context object

Whats missing and "coming soon" - a "genius" but still obvious Context object !!!

We all know CPython context already:

var gilState = CAPI.PyGILState_Ensure();

# some CSnakes.Runtime.Unmanaged.CAPI and/or CSnakes.Runtime.CAPI calls

CAPI.PyGILState_Release(gilState);

As doing something on the GIL or ensuring ThreadState might be expensive, most code tries to avoid that as shown above... But WHAT happens in THIS code?

var gilState = CAPI.PyGILState_Ensure();

# some CSnakes.Runtime.Unmanaged.CAPI and/or CSnakes.Runtime.CAPI calls
# some CSnakes.Runtime.Python usage...

CAPI.PyGILState_Release(gilState);

NOT what the developer intended: NOT calling the GIL. Chances are some

using (GIL.Acquire()) {}

are being called within CSnakes.Runtime.Python ... not good. And the developer can not change that without forking CSnakes .

Context object to the rescue !!!

It manages the GIL. So on instead in CAPI code creating a GIL a Context is created and its GIL aquired. When a PythonObject is created, that Context object is stored there too. So it can check in using (Context.AquireGIL()) {} only if that is really needed...

Can't wait to implement and benchmark it - maybe some LINQ on 10.000 PythonObjects using 5 attributes? Should make a difference NOT calling GIL.Aquire at least 50.000 times...

But - thats just the bare mimimum. What about Multiple Interpreters ?

Someone writing CAPI code will be aware that using CPython objects from multiple interpreters together is a "no go". His fault if doing that...

But: passing wrapped in PythonObject to C# code "somewhere" it might get overseen. So a Context object could also store from WHICH interpreter the CPython object originated. And the C# code could fail gracefully instead CPython producing really nasty bugs...

Optimization Contracts

"Genius"? No - but THIS is: putting Optimization Contracts into the Context !!

First of all - pure C-API calls can't be optimized as those a just calls into the DLL. But there a plenty of non C-API calls in all three layers. Those located in Unmanaged.CAPI and CAPI have a different nameing scheme, so they are distinguishable as Non C-API calls.

These calls will get the Context as an extra parameter, so different code paths could be executed depending on which Contracts are available and applyable.

I expect huge gains from functions which give strings to the C-API. Caching or even better a UTF16-first CPython implementation - near zero overhead will be possible.

In fact, the DLL calls on Unmanaged.CAPI and CAPI are done WITHOUT string marshalling. THAT is done within those special named functions where needed...

BTW: if a developer makes a no multiple interpreter Contract, that checks don't need be done...

Interested or should I rename my fork?

@tonybaloney please take a look at it. Suggestions are welcome... some name changes might be needed. And I'm not an expert in C# class access rights. The goal is that the developer can write his code under whatever namespace he likes using CSnakes.Runtime.CPython ...

PS: if we have a working optimizing prototype - I bet this will attract a lot of developers.

What about a Contract that some class definitions wont change? With a custom CPython DLL it might be possible to cache the lookup for a class and shorten that path. Its like what in Objective-C was possible: looking up the implementation of the call and then calling that directly in a loop... . Beside that: Objective-C had a per class lookup cache anyway, but why destroy that?

We have full control over the sources - on the CSnakes and CPython side. The sky is the limit, we "only" need "crazy developers" wanting to exploit that... And who knows - in a bright future those principles might be used by the CPython core team too. At the moment the type annotations are hints for the IDE editor. But think about modules specifying what contracts they fullfill..

PPS: before someone comes up with "reasons" why that GIL code example might fail and thus can't be used - the solution is called 'Contracts" ...

There's a few reasons why the CPythonAPI wrapper doesn't just map 1:1 to the CPython API and why we don't expose it directly.

The main reason is that the CPython C-API is a mess. It is enormously inconsistent whether it returns a borrowed reference, a stolen reference, a strong reference or a new reference. Sometimes, the reference guarantee changes between CPython versions! For example, fetching frame objects changed from a borrowed to a new reference in Python 3.12.

And sometimes the documentation is wrong.

I don't really want to maintain a low-level CPython API wrapper. That's not the goal of this project either. We're trying to build a safe, friendly API to simply call a Python function and library from a .NET project, without requiring deep knowledge of Python's API.

Working out how to maintain and balance references between a GC/RefCnt and the CLR is hugely complicated and going to result in either memory leaks or segmentation faults unless you really, really know what you're doing.

Should make a difference NOT calling GIL.Aquire at least 50.000 times.

The current GIL context wrapper is already recursive, so if you wrap the GIL once, you can call 50,000 times inside that context and it won't acquire and release the GIL each time anyway.

Holding the GIL too long is a problem and Py_Decref should be called with the GC held so we have special code to queue object disposable to avoid deadlocking.

Python.NET has a lower-level API, have you looked at that implementation as well?

OK, usage of unstable CPython API is a good argument...

What about making the stable API public, the rest protected?

This way a developer can "consume" the stable API in his namespace, if someone really wants to use the unstable, he "extends" the runtime. And by extending the runtime the one who does is responsible. Might be no problem for those who use their fixed version of the CPython DLL ...

The problem is: there should be an easy way to write some "fancy" stuff NOT supplied by your PyObject (PythonObject called by me).

Lets say I want to avoid "mass creation" of strings during dictionary lookups, know the needed keys beforehand. So the "logical" thing would be:

get all keys and hold the references to those needed
besides avoiding GC havoc on the CLR side, the dictionary lookup itself must be faster later this way

using CSnakes.Python API like this:

var key = aPyDictionary.Keys[0]; var fastValue = aPyDictionary[key];

will be key a string or PyUnicode object? Guess it would be PyDictionary<string,somewhat> generated. And PyDictionary<PyUnicode,somewhat> might work for caching, but might have issues on the WinUI side...

So the best place for a "generic" solution is the wrapping function for PyObject_GetItem ...

BTW: PyDictionary is caching via _dictionary? That would create headaches "value wise" for me... besides creating unneeded objects... . Even in pure C# code I can watch the GC heap grow while scrolling virtualized views...

But: it might be a valid performance strategie to apply for others.

C# has AsyncLocal which could be a solution for a ContextManager, implementing an Contract/Optimization-stack on a list itself, thus hopefully without capturing unneeded variables etc. Should be fine using like this:

using(new ContextScope(Use_PyDictionary_dictionary)) { [...] }

var keys = new List { "A", "B", "C" } ; using(new ContextScope(new CacheKeysAsPyObjectsEqual(keys))) { [...] }

So the questions would be:

does this work as intendet if CPython making callbacks into C# from a CPython thread? Thus there a different scope would be used?
what would be the performance hit getting Current of AsyncLocal? Is there a faster way using CPython for that ?

tonybaloney / CSnakes