Open darleybarreto opened 4 years ago
Thanks for your interest in this project!
There are several options here. First off, you could compile Python to IR code and C3 to IR code, and then link the two with ir_link. Next, you could take the linked IR code and translate it to machine code.
Another option would be to compile the C3 code to machine code and dynamically link it into the running Python process. This also allows you to specify Python callbacks.
One thing to note here: using str will not work, because it is not implemented yet. Only float and int will work when calling Python functions. Supporting str will require some extra attention to convert the string from a Python str to a pointer to chars, or something similar such as a Pascal-style string.
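As a rough illustration of native code calling back into Python with int arguments only, here is a minimal sketch. It uses ctypes and the C library's qsort as a stand-in, not PPCI itself, and it assumes a POSIX system where ctypes.CDLL(None) exposes libc. The names py_cmp and sorted_values are made up for the example.

```python
import ctypes

# Assumption: POSIX system, where CDLL(None) exposes the C library symbols.
libc = ctypes.CDLL(None)

# Signature of the comparison callback qsort expects: int (*)(int *, int *).
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)

libc.qsort.restype = None
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]


def py_cmp(a, b):
    """Python callback invoked from native code; it only sees ints."""
    return a[0] - b[0]


values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), CMPFUNC(py_cmp))
sorted_values = list(values)
print(sorted_values)
```

Only integers cross the boundary here, which is why this kind of callback is the easy case compared to str.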
I hope this helps a bit?
I see, I didn't know about the ir_link function. How difficult (in terms of changes and new code) would it be to work with ctypes (or cffi) to use strings from and to python?
There are several issues with str:
- The str type is not implemented. This might be the tricky part. The first thing to decide is how to represent strings in memory: probably either the C way (a char* to a 0-terminated buffer) or the C3 way (a pointer to a struct with a length integer and a char buffer).
- str as a char *. For this, it would be wise to first try to link and load C code instead of C3, since C is more widely used and answers can be found online.
- str is Unicode, whereas C3 and C use ASCII only. The solution here is to go for ASCII for now?
A good place to start might be this part of the code: https://github.com/windelbouwman/ppci/blob/master/ppci/utils/codepage.py#L108
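To make the two candidate memory layouts concrete, here is a small ctypes sketch. It is only an illustration under assumptions: the LenString struct is a hypothetical stand-in for a C3-style string (the real C3 layout in PPCI may differ), and both helpers restrict themselves to ASCII as suggested above.

```python
import ctypes


def as_c_string(text):
    """C layout: a char buffer ending in a 0 byte (ASCII only for now)."""
    return ctypes.create_string_buffer(text.encode("ascii"))


def as_length_prefixed(text):
    """Hypothetical C3-like layout: a length integer followed by the chars."""
    data = text.encode("ascii")

    class LenString(ctypes.Structure):
        _fields_ = [("length", ctypes.c_int),
                    ("data", ctypes.c_char * len(data))]

    return LenString(len(data), data)


c_buf = as_c_string("foo")
c3_like = as_length_prefixed("foo")
print(c_buf.raw, c3_like.length, c3_like.data)
```

Non-ASCII input raises UnicodeEncodeError in both helpers, which matches the "ASCII for now" simplification.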
I'd suggest skipping C3 altogether and using C instead. As @windelbouwman suggests, sticking to int for starters may be a good idea (I'm glad to hear that float works too, but I wouldn't rush to test it right away ;-) ).
I myself am interested in doing more hacking on a Python subset compiler, and what I would do is implement a print_int(x: int) function, since without output, hacking on this stuff is indeed not rewarding. My next idea was to make the for loop compliant with Python semantics (assignment to the loop control variable doesn't affect looping).
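For reference, the for loop semantics in question can be shown in plain Python. The sketch below compares the real behaviour with a naive C-style lowering that a subset compiler might produce, and with a lowering that keeps a hidden counter. The function names are made up for this illustration and are not taken from any existing compiler.

```python
def python_semantics(n):
    """Real Python: reassigning i inside the body does not affect looping."""
    seen = []
    for i in range(n):
        seen.append(i)
        i = 100  # has no effect on the next iteration
    return seen


def naive_lowering(n):
    """C-style lowering where the loop variable is also the counter."""
    seen = []
    i = 0
    while i < n:
        seen.append(i)
        i = 100  # clobbers the counter, so the loop ends early
        i += 1
    return seen


def correct_lowering(n):
    """Lowering with a hidden counter that the body cannot touch."""
    seen = []
    counter = 0
    while counter < n:
        i = counter
        seen.append(i)
        i = 100  # only the visible variable changes
        counter += 1
    return seen


print(python_semantics(3), naive_lowering(3), correct_lowering(3))
```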
Sadly, I'm too full of ideas and too short of time. So, @darleybarreto, please see if the ideas above resonate with you, and if so, feel free to beat me to it.
> use strings from and to python?
Manipulating strings generally means GC. Surely, trivial operations on static strings (like passing them to print()) can be implemented ahead of that. I'd personally still start with printing ints.
I was playing with @pfalcon's picompile for a couple of hours and I managed to get some basic str compilation down to LLVM using llvmlite's ir.Constant(ir.ArrayType). I imagine that the fast way would be making something working with ctypes/cffi (no GC here) for simple operations like passing to functions (e.g. to print), casting, and concatenating.
Hey, cool! Except it's not mine, but a humble fork of https://github.com/sdiehl/numpile, to which I didn't yet even apply any interesting changes. First would be getting rid of intermediate AST, and instead do type inferencing and other processing on real Python AST, because this intermediate AST makes toy processing, like done by numpile, easier, but only complicates further extension.
But if you think about type inference, you immediately think about hilarious case of Shedskin, which can't grok the following:
a = 1
a = "str"
Obviously, there's nothing wrong with the above, and the type of a isn't int | str either. It's just that the first a has type int, while the second a has type str. That leads us to https://github.com/pfalcon/python-ast-hacking-challenges/issues/2 . And well, as soon as one touches SSA, there are enough rabbit holes to follow; for example, in my mind, I'm doing register allocation on SSA (instead of converting Python source to it) :-D.
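To show why SSA-style renaming sidesteps the problem, here is a toy sketch on top of the standard ast module. It only handles straight-line assignments of constants to a single name, and ssa_rename is a hypothetical helper written for this example, not part of numpile or PPCI.

```python
import ast


def ssa_rename(source):
    """Give every constant assignment its own versioned name and type."""
    versions = {}
    result = []
    for stmt in ast.parse(source).body:
        if (isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
                and isinstance(stmt.value, ast.Constant)):
            name = stmt.targets[0].id
            versions[name] = versions.get(name, 0) + 1
            type_name = type(stmt.value.value).__name__
            result.append((f"{name}_{versions[name]}", type_name))
    return result


renamed = ssa_rename('a = 1\na = "str"')
print(renamed)
```

Each version of a gets exactly one type, so no union type is needed for this case. Control flow (and the phi nodes it requires) is deliberately left out.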
> for a couple of hours and I managed to get some basic str compilation down to LLVM using llvmlite's ir.Constant(ir.ArrayType)
Well, cool, you've got some experience with LLVM API. But you can't get around need for garbage collection when dealing with strings, it's not a value type. So, as soon as you get to:
a = "foo"
b = "bar"
c = a + b
you'll need to deal with it. But otherwise yes, a nice easy start.
> I imagine that the fast way would be making something working with ctypes/cffi (no GC here) for simple operations like passing to functions (e.g. to print), casting, and concatenating.
"Fast way" in which sense? It won't lead to fast or unbloated code, and would be a chore to code up, given that it's largely a throw-away (YMMV) exercise (at least for this usecase). Unless it's already coded up, and it seems that PPCI already supports it, even without ctypes/cffi being exposed (I guess ppci/utils/codepage.py#L108 quoted by @windelbouwman is the underlying impl): https://ppci.readthedocs.io/en/latest/howto/jitting.html#calling-python-functions-from-native-code ;-)
> I'm doing register allocation on SSA (instead of converting Python source to it) :-D.
Interesting, although I have to say I don't know much of SSA stuff.
> But you can't get around need for garbage collection when dealing with strings, it's not a value type.
I thought about loading and linking the .so's that provide malloc and realloc, and using them for loading strings and for concatenation.
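A minimal sketch of that idea, using ctypes to call malloc and free directly, might look like the following. It assumes a POSIX system where ctypes.CDLL(None) exposes libc, and concat is a hypothetical helper name. Note that without a GC the caller owns the returned pointer and has to free it.

```python
import ctypes

# Assumption: POSIX system, where CDLL(None) exposes malloc and free.
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def concat(a, b):
    """Copy two byte strings into one freshly malloc'ed, 0-terminated buffer."""
    ptr = libc.malloc(len(a) + len(b) + 1)
    if not ptr:
        raise MemoryError("malloc failed")
    ctypes.memmove(ptr, a, len(a))
    ctypes.memmove(ptr + len(a), b, len(b))
    ctypes.memset(ptr + len(a) + len(b), 0, 1)  # terminating 0 byte
    return ptr


ptr = concat(b"foo", b"bar")
result = ctypes.string_at(ptr)  # copy the bytes back into Python
libc.free(ptr)  # manual lifetime management instead of a GC
print(result)
```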
> "Fast way" in which sense?
In the sense of getting something working.
> In the sense of getting something working.
Makes sense, please keep us posted of your progress!
So, instead of dealing with ctypes, what if we use an independent Rust shared lib with #![no_std]? I made a simple working example where one can pass a valid UTF-8 string to Rust and receive it back:
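from cffi import FFI

ffi = FFI()
# Declare the C signatures exported by the Rust library.
ffi.cdef("""
char * load_str(char *);
void free_str(char *);
""")

p_str = "I will go down to the rabbit hole!".encode("utf-8")
size = len(p_str)  # byte length, needed to read the result back
lib = ffi.dlopen("path/to/rustlib.so")
pointer = lib.load_str(p_str)  # pass the encoded bytes as a char *
p_str_2 = ffi.buffer(pointer, size)[:].decode("utf-8")  # a perfect string
lib.free_str(pointer)  # hand the buffer back to Rust so it can be freed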
In this example, load_str is:
use core::{ptr, str};
use ustr::Ustr;
use cstr_core::{CString, CStr, c_char};

#[no_mangle]
pub extern "C" fn load_str(d_ptr: *mut c_char) -> *mut c_char {
    // Based on https://bheisler.github.io/post/calling-rust-in-python/
    if d_ptr.is_null() {
        return ptr::null_mut();
    }
    // Borrow the caller's 0-terminated bytes (without the terminator).
    let data = unsafe { CStr::from_ptr(d_ptr).to_bytes() };
    match str::from_utf8(data) {
        Ok(data_str) => {
            // Intern the string, then hand a fresh C string to the caller,
            // who must release it with free_str.
            let gc_string = Ustr::from(data_str);
            CString::new(gc_string.as_str()).unwrap().into_raw()
        },
        Err(e) => {
            // Note: println! comes from std, so a #![no_std] build needs
            // another way to report this error.
            println!("Error while converting raw pointer back to str: {}", e);
            ptr::null_mut()
        },
    }
}
The lib in question would do the whole job as a thin wrapper around Rust's String capabilities. We could also use things like ustr, which enables caching and fun stuff such as concurrency safety.
First I would like to thank everyone who has contributed to this work, it is really impressive. I wonder if there's a way of linking a python function in a c3 program. I was trying to do this on IR level, but it seems it doesn't work. For example:
I want to play around with taking a subset of Python code and compiling it to interact with the system, which C3 makes possible.