lcompilers / lpython

Python compiler
https://lpython.org/

Simplify lpython.py #2274

Open certik opened 1 year ago

certik commented 1 year ago

Right now the file is already quite small (less than 800 lines): https://github.com/lcompilers/lpython/blob/8df052e174adbd789fc7b03f5526e706195b3543/src/runtime/lpython/lpython.py, but whenever it contains a custom emulation layer, that layer almost always breaks.

In the past we have already removed two custom layers:

Now there is another breakage regarding pointers and packed structs using numpy arrays (https://github.com/lcompilers/lpython/issues/2267).

We should try to remove as many custom emulation classes as possible and only use native CPython classes (just as we now use "int" to represent unsigned integers). We might need to drop some support for C pointers, or pointers in general, because CPython does not have them and our emulation is too fragile.

Ideally the file should only contain some types, and various conversion functions that can be implemented very robustly and that do not return any "emulated/wrapped" types.
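To make this concrete, here is a minimal sketch of the kind of conversion function the file could be limited to, assuming fixed-width integers are represented by plain CPython `int` (as is already done for unsigned integers). The names `to_u32` and `to_i32` are hypothetical and not part of the current lpython.py; the point is that they only do arithmetic on `int` and return an `int`, never a wrapper object.

```python
def to_u32(x: int) -> int:
    """Hypothetical: wrap x into the unsigned 32-bit range [0, 2**32) as a plain int."""
    return x & 0xFFFFFFFF


def to_i32(x: int) -> int:
    """Hypothetical: wrap x into the signed 32-bit range [-2**31, 2**31) (two's complement)."""
    x &= 0xFFFFFFFF
    # If the sign bit is set, map the value to its negative counterpart.
    return x - (1 << 32) if x & 0x80000000 else x


# Both functions return ordinary ints, so the results work with any CPython code.
print(to_u32(-1))          # 4294967295
print(to_i32(0xFFFFFFFF))  # -1
print(to_i32(2**31))       # -2147483648
```

Because nothing here depends on CPython internals or ctypes, such functions should keep working unchanged across CPython versions.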

This will make the file much more robust and resilient against future CPython versions.

rebcabin commented 1 year ago

I think the only way to robustly support pointers is a "heavy" API, and it's probably not worth the effort. Design-wise, I think we should make beautiful array types and then discourage people from writing C-flavored Python.

certik commented 1 year ago

I think that lpython.py should only contain:

If done right, the decorators would only be used in low-level code where you need to interface with CPython or C, or need overloads (not very common). Custom functions like bitnot are rarely needed; most code would not use them. That leaves the type annotations. We can later support more of the default CPython annotations as well, and perhaps users could define their own. Then people would not need to import lpython at all: we would just compile regular CPython code that runs in CPython without any non-standard lpython.py library. Perhaps we can also support NumPy types like int32. For interfacing with C, we could just support ctypes directly (the syntax is not as nice, but it would work). And pythoncall is just a small decorator; perhaps there is an even simpler way to do it, for example annotating a function with an empty decorator and interpreting its body as CPython. This would be optional: the lpython module provides a convenient way, but for users who don't want to introduce this dependency into a larger project, we should support a "non-lpython module mode" as well.
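A minimal sketch of the "non-lpython module mode" idea, assuming a hypothetical identity decorator named `cpython_body` (not an existing LPython API) that a compiler could recognize as "interpret this function's body with CPython". Under CPython the decorator does nothing, so the file runs unmodified without importing lpython, and the rest of the code uses only standard annotations.

```python
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable)


def cpython_body(func: F) -> F:
    """Hypothetical marker decorator: a no-op in CPython.

    A compiler could treat functions marked with it as "run this body
    through CPython" while compiling everything else natively.
    """
    return func


@cpython_body
def describe(values: list[int]) -> str:
    # Arbitrary CPython features (f-strings, str.join, generators) are fine here.
    return ", ".join(f"{v:#x}" for v in values)


def checksum(values: list[int]) -> int:
    # Plain annotated code with no lpython import; a compiler could lower it.
    total: int = 0
    for v in values:
        total = (total + v) & 0xFFFFFFFF
    return total


data = [1, 255, 4096]
print(describe(data))  # 0x1, 0xff, 0x1000
print(checksum(data))  # 4352
```

The design choice is that the marker carries no runtime behavior at all, so it cannot break with a future CPython release.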

What we do not want to do:

Instead, we should design the supported subset of CPython so that we can just use native CPython constructs. In particular, for arrays we should just provide a function that converts a CPtr to a NumPy array, but we should NOT emulate pointer access or pointer arithmetic directly.
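A sketch of what such a conversion could look like in plain CPython, assuming the C side hands back an opaque address plus an element count, modeled here with a ctypes buffer. The helper name `cptr_to_array` is hypothetical; it relies on `numpy.ctypeslib.as_array`, which wraps a ctypes `POINTER` as a NumPy view sharing the same memory when a `shape` is given. After conversion, all access goes through normal array indexing, with no pointer arithmetic exposed to the user.

```python
import ctypes

import numpy as np


def cptr_to_array(ptr: ctypes.c_void_p, n: int, ctype=ctypes.c_double) -> np.ndarray:
    """Hypothetical helper: view n elements at address ptr as a NumPy array.

    The returned array shares memory with the C buffer (no copy), so the
    caller must keep the underlying buffer alive while the array is in use.
    """
    typed = ctypes.cast(ptr, ctypes.POINTER(ctype))
    return np.ctypeslib.as_array(typed, shape=(n,))


# Stand-in for memory owned by C code: a ctypes array of 4 doubles.
buf = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)
raw = ctypes.cast(buf, ctypes.c_void_p)  # an opaque "CPtr"

arr = cptr_to_array(raw, 4)
arr *= 10.0       # ordinary NumPy operations, no pointer arithmetic
print(list(buf))  # [10.0, 20.0, 30.0, 40.0]
```

The function returns a genuine `numpy.ndarray`, not a wrapper class, which is the property the issue asks for.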

We probably do not even want to support pointers directly, as is done here: https://github.com/lcompilers/lpython/blob/8df052e174adbd789fc7b03f5526e706195b3543/src/runtime/lpython/lpython.py#L509, because that returns a ctypes "emulation", which is fragile. Internally in ASR we can keep pointers; I think that's fine.

All we need is some way to interact with C, but our current API seems "heavy". For example, take this: https://github.com/lcompilers/lpython/blob/8df052e174adbd789fc7b03f5526e706195b3543/integration_tests/structs_02.py#L9-L19. Currently we emulate it via ctypes. Instead, the implementation of pointer should simply be:

```python
def pointer(x):
    # No wrapper: return the very same object, so the result aliases x.
    return x
```

In CPython, b = pointer(a) then simply increments the refcount of a, and both a and b refer to the same structure, so it behaves identically to a pointer. LPython will ensure that the semantic subset we support is 100% identical. Consequently, you can only take "pointers" of structs, lists, and arrays, but not of ints or floats, which are immutable in CPython and so cannot be modified through an alias.
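The aliasing argument can be checked directly in CPython. This sketch uses a dataclass as a stand-in for an LPython struct (an assumption for illustration). With `pointer` defined as the identity function, mutating through `b` is visible through `a`, while "taking a pointer" to an int gives no such link, because assignment to an int only rebinds the name.

```python
from dataclasses import dataclass


def pointer(x):
    # Identity: in CPython this only creates another reference to the same object.
    return x


@dataclass
class Point:
    x: int
    y: int


a = Point(1, 2)
b = pointer(a)
b.x = 10       # mutate through the "pointer"
print(a.x)     # 10: a and b are the same object
print(b is a)  # True

n = 5
m = pointer(n)
m += 1         # ints are immutable: this rebinds m to a new object
print(n, m)    # 5 6: n is unchanged, so int "pointers" carry no meaning
```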

However, perhaps we don't need a notion of pointers at all; we just need a lightweight API to interact with C. When C returns a pointer to an int, we just need some way to access it, perhaps via a NumPy array of ints of size 1, or something similar, and LPython would ensure that this adds no overhead.
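As a rough illustration of the size-1 array idea, the sketch below models a C function returning an `int32_t *` with a ctypes `c_int32` kept alive in Python (an assumption; a real binding would obtain the pointer from the C call). Wrapping it with `numpy.ctypeslib.as_array` and `shape=(1,)` yields an ordinary array, so reads and writes use `view[0]` rather than any pointer emulation.

```python
import ctypes

import numpy as np

# Stand-in for an int32 owned by C; a real C function would return the pointer.
c_value = ctypes.c_int32(7)
int_ptr = ctypes.pointer(c_value)  # behaves like `int32_t *` from C

# View the pointed-to int as a 1-element NumPy array sharing the same memory.
view = np.ctypeslib.as_array(int_ptr, shape=(1,))

print(view[0])        # 7: read through the array
view[0] = 42          # write through the array
print(c_value.value)  # 42: the C-side value changed, no copy was made
```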