windelbouwman / ppci

A compiler for ARM, X86, MSP430, xtensa and more implemented in pure Python
https://ppci.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License

Python and C3 #91

Open darleybarreto opened 4 years ago

darleybarreto commented 4 years ago

First, I would like to thank everyone who has contributed to this work; it is really impressive. I wonder if there's a way of linking a Python function into a C3 program. I was trying to do this at the IR level, but it doesn't seem to work. For example:

def concat(s1: str, s2:str) -> str:
    return s1 + s2

import io;

public function void main(){
    var string res = concat("I","am");
    io.println(res);
}

I want to play around with compiling a subset of Python code so that it can interact with the system, which C3 makes possible.

windelbouwman commented 4 years ago

Thanks for your interest in this project!

There are several options here. First off, you could compile Python to IR code and C3 to IR code, and then link the two using ir_link. Next, you could take the linked IR code and translate it to machine code.

Another option would be to compile the C3 code to machine code and dynamically link it into the running Python process. This also allows you to specify Python callbacks.

One thing to note: str will not work here. It is not implemented yet; only float and int work when calling Python functions. str will require some extra work to convert a Python str into a pointer to chars, or something similar such as a Pascal-style string.
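The two string layouts mentioned above can be sketched with the standard-library ctypes and struct modules alone. This is only an illustration of the conversion step: the helper names are hypothetical, the 4-byte little-endian length prefix is an assumed layout, and none of it is part of ppci.

```python
import ctypes
import struct


def to_c_string(text):
    """Encode as UTF-8 and return a NUL-terminated ctypes char buffer."""
    data = text.encode("utf-8")
    # create_string_buffer copies the bytes and appends the trailing NUL.
    return ctypes.create_string_buffer(data)


def to_pascal_string(text):
    """Encode as UTF-8 with an assumed 4-byte little-endian length prefix."""
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data


def from_pascal_string(blob):
    """Decode a length-prefixed blob produced by to_pascal_string."""
    (length,) = struct.unpack_from("<I", blob, 0)
    return blob[4:4 + length].decode("utf-8")


buf = to_c_string("I am")
# A char* view of the buffer, as native code expecting a C string would see it.
char_ptr = ctypes.cast(buf, ctypes.c_char_p)
pascal = to_pascal_string("I am")
```

The C layout needs a scan for the terminator to find the length, while the length-prefixed layout can carry embedded NUL bytes and gives the length in constant time.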

I hope this helps a bit?

darleybarreto commented 4 years ago

I see, I didn't know about the ir_link function. How difficult (in terms of changes and new code) would it be to work with ctypes (or cffi) to pass strings to and from Python?

windelbouwman commented 4 years ago

There are several issues with str:

A good place to start might be this part of the code --> https://github.com/windelbouwman/ppci/blob/master/ppci/utils/codepage.py#L108

pfalcon commented 4 years ago

I'd suggest skipping C3 altogether and using C instead. As @windelbouwman suggests, sticking to int for starters may be a good idea (I'm glad to hear that float works too, but I wouldn't run to test it right away ;-) ).

I myself am interested in doing more hacking on a Python subset compiler, and what I would do is implement a print_int(x: int) function, as without output, hacking on this stuff is indeed not rewarding. My next idea was to make the for loop compliant with Python semantics (assignment to the loop control variable doesn't affect looping).

Sadly, I'm too full of ideas and too short of time. So, @darleybarreto, please see if the ideas above resonate with you, and if so, feel free to beat me to it.
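The for-loop semantics in question can be illustrated in plain Python. The second function is a hypothetical model of what a naive lowering to a C-style counter loop would do; it is not taken from any existing compiler.

```python
def python_for_loop():
    """Python semantics: rebinding the loop variable does not affect iteration."""
    seen = []
    for i in range(3):
        seen.append(i)
        i = 10  # rebinds the name only; the range iterator is untouched
    return seen, i


def naive_counter_loop():
    """A C-style lowering where the loop variable doubles as the counter."""
    seen = []
    i = 0
    while i < 3:
        seen.append(i)
        i = 10  # now the assignment does change the counter
        i += 1
    return seen, i


python_result = python_for_loop()
naive_result = naive_counter_loop()
```

A compliant compiler would keep a separate hidden counter (or iterator) and copy its value into the user-visible variable at the top of each iteration.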

use strings from and to python?

Manipulating strings generally means GC. Surely, trivial operations on static strings (like passing them to print()) can be implemented ahead of that. I'd personally still start with printing ints.

darleybarreto commented 4 years ago

I was playing with @pfalcon's picompile for a couple of hours and I managed to get some basic str compilation down to LLVM using llvmlite's ir.Constant(ir.ArrayType). I imagine that the fast way would be getting something to work with ctypes/cffi (no GC here) for simple operations like passing to functions (e.g. to print), casting, and concatenating.

pfalcon commented 4 years ago

Hey, cool! Except it's not mine, but a humble fork of https://github.com/sdiehl/numpile, to which I haven't yet applied any interesting changes. The first would be getting rid of the intermediate AST and instead doing type inference and other processing on the real Python AST, because the intermediate AST makes toy processing, like that done by numpile, easier, but only complicates further extension.

But if you think about type inference, you immediately think of the hilarious case of Shedskin, which can't grok the following:

a = 1
a = "str"

Obviously, there's nothing wrong with the above, and the type of a isn't int | str either. It's just that the first a has type int, while the second a has type str. That leads us to https://github.com/pfalcon/python-ast-hacking-challenges/issues/2 . And well, as soon as one touches SSA, there are enough rabbit holes to follow; for example, in my mind, I'm doing register allocation on SSA (instead of converting Python source to it) :-D.
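The idea that each assignment introduces a new, separately typed version of a can be sketched with the standard-library ast module. This is a toy under loud assumptions: it handles only straight-line `name = constant` statements, the function name and the `a_1`/`a_2` naming scheme are hypothetical, and real SSA construction also needs phi nodes at control-flow joins.

```python
import ast


def ssa_versions(source):
    """Return [(versioned_name, type_name)] for simple `name = constant` statements."""
    counters = {}
    result = []
    for stmt in ast.parse(source).body:
        is_simple_assign = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
            and isinstance(stmt.value, ast.Constant)
        )
        if not is_simple_assign:
            continue  # anything else is out of scope for this toy
        name = stmt.targets[0].id
        # Each assignment creates a fresh version with its own type.
        counters[name] = counters.get(name, 0) + 1
        type_name = type(stmt.value.value).__name__
        result.append((f"{name}_{counters[name]}", type_name))
    return result


versions = ssa_versions('a = 1\na = "str"\n')
```

With the names split this way, neither version needs a union type: one is an int and the other is a str.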

for a couple of hours and I managed to get some basic str compilation down to LLVM using llvmlite's ir.Constant(ir.ArrayType)

Well, cool, you've got some experience with the LLVM API. But you can't get around the need for garbage collection when dealing with strings; a string is not a value type. So, as soon as you get to:

a = "foo"
b = "bar"
c = a + b

you'll need to deal with it. But otherwise yes, a nice easy start.

I imagine that the fast way would be making something working with ctypes/cffi (no GC here) for simple operations like passing to functions (e.g. to print), casting, and concatenating.

"Fast way" in which sense? It won't lead to fast or unbloated code, and would be a chore to code up, given that it's largely a throw-away (YMMV) exercise (at least for this usecase). Unless it's already coded up, and it seems that PPCI already supports it, even without ctypes/cffi being exposed (I guess ppci/utils/codepage.py#L108 quoted by @windelbouwman is the underlying impl): https://ppci.readthedocs.io/en/latest/howto/jitting.html#calling-python-functions-from-native-code ;-)

darleybarreto commented 4 years ago

I'm doing register allocation on SSA (instead of converting Python source to it) :-D.

Interesting, although I have to say I don't know much about SSA.

But you can't get around need for garbage collection when dealing with strings, it's not a value type.

I thought of loading and linking the .so's that provide malloc and realloc, and using them for loading strings and for concatenation.
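A minimal sketch of that malloc/realloc approach, using ctypes against the C library of a POSIX system (this assumption does not hold on Windows). The helper names c_string and c_concat are hypothetical. There is no GC here, so every buffer has to be freed by hand.

```python
import ctypes
import ctypes.util

# Assumes a POSIX system; if find_library returns None, CDLL(None) falls back
# to the symbols already loaded into the process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.realloc.restype = ctypes.c_void_p
libc.realloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def c_string(text):
    """Copy a Python str into a malloc'ed, NUL-terminated UTF-8 buffer."""
    data = text.encode("utf-8")
    ptr = libc.malloc(len(data) + 1)
    if not ptr:
        raise MemoryError("malloc failed")
    ctypes.memmove(ptr, data, len(data))
    ctypes.memset(ptr + len(data), 0, 1)  # write the terminator
    return ptr


def c_concat(dst, src):
    """Append src to dst, growing dst with realloc; returns the new pointer."""
    dst_len = len(ctypes.string_at(dst))
    src_bytes = ctypes.string_at(src)
    new_ptr = libc.realloc(dst, dst_len + len(src_bytes) + 1)
    if not new_ptr:
        raise MemoryError("realloc failed")
    ctypes.memmove(new_ptr + dst_len, src_bytes, len(src_bytes))
    ctypes.memset(new_ptr + dst_len + len(src_bytes), 0, 1)
    return new_ptr


a = c_string("foo")
b = c_string("bar")
a = c_concat(a, b)  # the old value of a may be invalid after realloc
result = ctypes.string_at(a).decode("utf-8")
libc.free(a)
libc.free(b)
```

The manual free calls at the end are exactly the bookkeeping that a garbage collector would otherwise take over once strings are created dynamically.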

"Fast way" in which sense?

In the sense of getting something working.

pfalcon commented 4 years ago

In the sense of getting something working.

Makes sense, please keep us posted on your progress!

darleybarreto commented 4 years ago

So, instead of dealing with ctypes, what if we use an independent Rust shared lib with #![no_std]? I made a simple working example where one can pass a valid UTF-8 string to Rust and receive it back:

from cffi import FFI
ffi = FFI()
ffi.cdef("""
    char * load_str(char *);
    void free_str(char *);
""")
p_str = "I will go down to the rabbit hole!".encode("utf-8")
size = len(p_str)
lib = ffi.dlopen("path/to/rustlib.so")
pointer = lib.load_str(p_str)  # pass the encoded bytes, not an undefined name
p_str_2 = ffi.buffer(pointer, size)[:].decode("utf-8")  # copy back and decode
lib.free_str(pointer)  # the buffer is owned by Rust, so Rust must free it

In this example, the load_str is

use core::{ptr,str};
use ustr::Ustr;
use cstr_core::{CString, CStr, c_char};

#[no_mangle]
pub extern "C" fn load_str(d_ptr: *mut c_char) -> *mut c_char {
    // Based on https://bheisler.github.io/post/calling-rust-in-python/

    if d_ptr.is_null() {
        return ptr::null_mut();
    }
    let data = unsafe { CStr::from_ptr(d_ptr).to_bytes() };

    match str::from_utf8(data){
        Ok(data_str) => {
            let gc_string = Ustr::from(data_str);
            CString::new(gc_string.as_str()).unwrap().into_raw()
        },
        // println! is not available in a #![no_std] crate, so the failure
        // is reported to the caller through a null pointer only.
        Err(_) => ptr::null_mut(),
    }
}

The lib in question would do all the work as a simple wrapper around Rust's String capabilities. We could also use things like ustr, which enables caching and fun stuff such as concurrency safety.