mthom / scryer-prolog

A modern Prolog implementation written mostly in Rust.
BSD 3-Clause "New" or "Revised" License
2k stars 117 forks source link

ISSUE-2464: Make scryer prolog usable as a shared library for other programs #2464

Open jjtolton opened 1 month ago

jjtolton commented 1 month ago

I'd like to be able to use scryer as a shared library. Taking inspiration from the WASM code, I imagined something like

eval_code_c and free_c_string.

Note: I know absolutely nothing about rust, but from some hacking I have something that does... something.

#[cfg(not(target_arch = "wasm32"))]
use std::ffi::{c_char, CStr, CString};

#[cfg(not(target_arch = "wasm32"))]
#[no_mangle]
pub extern "C" fn eval_code_c(input: *const c_char) -> *mut c_char {
    use machine::mock_wam::*;

    let c_str = unsafe {
        assert!(!input.is_null());
        CStr::from_ptr(input)
    };
    let r_str = c_str.to_str().expect("Not a valid UTF-8 string");

    let mut wam = Machine::with_test_streams();
    let bytes = wam.test_load_string(r_str);
    let result_str = String::from_utf8_lossy(&bytes).to_string();

    let c_string = CString::new(result_str).expect("Failed to convert to CString");
    c_string.into_raw() // convert to raw pointer and prevent Rust from cleaning it up
}
#[no_mangle]
pub extern "C" fn free_c_string(ptr: *mut c_char) {
    unsafe {
        if ptr.is_null() {
            return;
        }
        let _ = CString::from_raw(ptr);
    };
}

Here is some test Python I'm using to work it:

import ctypes

lib = ctypes.cdll.LoadLibrary("/home/jay/programs/scryer-prolog/target/debug/libscryer_prolog.so")

lib.eval_code_c.argtypes = (ctypes.c_char_p,)
lib.eval_code_c.restype = ctypes.POINTER(ctypes.c_char)

lib.free_c_string.argtypes = (ctypes.c_char_p,)
lib.free_c_string.restype = None

def eval_and_free(s):
    res_ptr = lib.eval_code_c(s.encode('utf-8'))
    res = ctypes.cast(res_ptr, ctypes.c_char_p).value.decode()
    lib.free_c_string(res_ptr)
    return res

if __name__ == '__main__':
    print(eval_and_free("?- char_type(A, ascii_punctuation).")) #  console display: % Warning: singleton variables A at line 0 of user
    print(eval_and_free("?- char_type('\"', ascii_punctuation).")) # console display is empty, `res` above is an empty string

I apologize for insulting anyone with my barbarian prolog and rust, pointers (no pun intended) would be appreciated.

jjtolton commented 1 month ago

One other bit of criticism --

I copied this loop in run_query_generator() nearly verbatim from run_query() -- I just force it to break at the end of the loop, and I left it in the loop even though it's silly because it was working and I was concerned I would make a nuanced mistake in the logic if I changed the code.

However, this would be rather brittle, since any change to the loop in run_query() would cause the logic to diverge in run_query_generator(), and vice-versa, so perhaps it would make more sense to factor out the commonality in an auxiliary function than then reuse it between run_query_generator() and run_query().

jjtolton commented 1 month ago

One more question for anyone for more experienced rust specialists. I'm using a web development pattern here for a lot of these binding functions to return "ok/error/panic" via a string, i.e., https://github.com/mthom/scryer-prolog/pull/2465/files#diff-b1a35a68f14e696205874893c07fd24fdb88882b47c23cc0e0c80a30c7d53759R260-R280.

Is this typical for rust / c interop code or is that considered overkill?

bakaq commented 1 month ago

[...] so perhaps it would make more sense to factor out the commonality in an auxiliary function than then reuse it between run_query_generator() and run_query().

I think run_query() should be implemented in terms of run_query_generator(). In Rust terms, run_query_generator() is the next() of an Iterator, and the result of run_query() should just be that iterator .collect()ed.

Or, even better, make run_query() return the iterator and leave to the user if they want to collect or iterate it. But that would be a breaking change, so it's probably best to make a run_query_iter() method that returns an iterator, and then the implementation of run_query() could literally just be run_query_iter().collect(). I talked about this in https://github.com/mthom/scryer-prolog/pull/1880#issuecomment-1765078753 and https://github.com/mthom/scryer-prolog/pull/1880#issuecomment-1791799843.

Is this typical for rust / c interop code or is that considered overkill?

This is very atypical for both Rust and C. In Rust you would usually use an enum like Result<T,E> to deal with errors, or some custom one like this:

enum QueryResult {
    Ok(QueryAnswer),
    Err(QueryError),
    Panic,
}

In C you would usually indicate the error with a return value (like a NULL pointer or a non 0 int), and then based on that the caller should react in different ways. Examples of ways to do that:


// On success returns the result and sets error_msg to NULL.
// On error returns NULL and makes error_msg point to the error message.
char *function1(int arg1, int arg2, char **error_msg);

// On success returns 0, makes result point to the result and sets error_msg to NULL.
// On error returns a non-zero value that indicates what kind of error occurred,
// sets result to NULL and makes error_msg point to the error message.
int function2(int arg1, int arg2, char **result, char **error_msg);

// Because in function2 only one of result and error_msg are ever set, we can
// use just one out pointer to represent both, but this can be a bit confusing so
// needs to be properly documented.
// On success returns 0 and makes result point to the result.
// On error returns a non-zero value that indicates what kind of error occurred
// and makes result point to the error message.
int function3(int arg1, int arg2, char **result);

// Usage
int main() {
    char *error_msg1 = NULL;
    char *result1 = function1(1, 2, &error_msg1);
    if (ret1 == NULL) {
       // Now result1 is NULL and error_msg1 is the error message.
    } else {
       // Now result1 is the result and error_msg1 is still NULL.
    }

    char *error_msg2 = NULL;
    char *result2 = NULL;
    int err2 = function2(1, 2, &result2, &error_msg2);
    if (err2 != 0) {
       // Now result2 is NULL and error_msg2 is the error message.
       // Can do a switch in the value of err2 to check what kind of 
       // error happened.
    } else {
       // Now result2 is the result and error_msg2 is still NULL.
    }

    char *result3 = NULL;
    int err3 = functio3(1, 2, &result3);
    if (err3 != 0) {
       // Now result3 is the error message.
       // Can do a switch in the value of err3 to check what kind of 
       // error happened.
    } else {
       // Now result3 is the result.
    }
}

In low level land where C lives it's in general very inconvenient to have to parse something like JSON just to get the results or even know if an error happened, because it involves third party libraries and probably a lot of allocation and memory management. It's a bit more acceptable in Rust because of the great ecosystem, but using better type-level representations is much preferred.