Open Andersama opened 2 years ago
I'm replying late because I think it is an interesting topic. Perhaps it's still useful? :smile_cat:
The following requires that you hard-code a particular token type. It builds, but there may be safety errors, so please audit before you use!
//! The following assumes you have a token type called `token_t` that derives `logos::Logos` and is `repr(C)`.
use std::{
    ffi::{c_int, c_void},
    slice, str,
};
// Mock up the logos stuff we need.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct token_t;

struct Lexer<T>(std::marker::PhantomData<T>);

impl<T> Lexer<T> {
    fn new(_input: &str) -> Self {
        Lexer(std::marker::PhantomData)
    }
}

impl<T> Iterator for Lexer<T> {
    type Item = T;

    fn next(&mut self) -> Option<Self::Item> {
        todo!()
    }
}
/// Create a lexer from an input string.
///
/// # Safety
/// - The caller is responsible for keeping `input` alive and unchanged until `my_lexer_free(*mut my_lexer_t)` is called.
/// - `input` must be a valid UTF-8 byte sequence of length `input_sz`.
/// - The returned type is not thread-safe.
#[no_mangle]
pub unsafe extern "C" fn my_lexer_create(input: *const u8, input_sz: usize) -> *mut c_void {
    // `'static` here only means "as long as the caller promised"; see the safety notes above.
    let input: &'static [u8] = slice::from_raw_parts(input, input_sz);
    let input = str::from_utf8_unchecked(input);
    let lexer: Lexer<token_t> = Lexer::new(input);
    Box::into_raw(Box::new(lexer)) as *mut _
}
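/// Free a lexer.
///
/// # Safety
/// - `lexer` must have been returned by `my_lexer_create` and must not be used again afterwards.
#[no_mangle]
pub unsafe extern "C" fn my_lexer_free(lexer: *mut c_void) {
    let lexer: Box<Lexer<token_t>> = Box::from_raw(lexer as *mut _);
    // This would happen anyway at the end of the scope.
    drop(lexer)
}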
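/// Returns 1 if there was another element and 0 otherwise.
///
/// There are other ways you could represent `Option<token_t>` in C if you prefer.
///
/// # Safety
/// - `lexer` must have been returned by `my_lexer_create` and not yet freed.
/// - `token` must be valid for writing a `token_t`.
#[no_mangle]
pub unsafe extern "C" fn my_lexer_next(lexer: *mut c_void, token: *mut token_t) -> c_int {
    let lexer: &mut Lexer<token_t> = &mut *(lexer as *mut _);
    match lexer.next() {
        Some(t) => {
            // `write` does not drop the old (possibly uninitialized) value behind `token`.
            token.write(t);
            1
        }
        None => 0,
    }
}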
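The mock above panics as soon as `my_lexer_next` is called, because `next` is `todo!()`. If you want to try the pattern end to end, here is a minimal sketch that swaps in a whitespace splitter instead of a logos lexer. `WordLexer`, `word_token_t`, the `word_lexer_*` functions and `collect_words` are all made-up names for illustration, not part of logos. Unlike the version above, it validates UTF-8 and returns null on failure, which is one possible choice rather than the only one.

```rust
use std::ffi::{c_int, c_void};
use std::{ptr, slice, str};

/// Hypothetical token: byte offset and length of one word in the input.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct word_token_t {
    pub start: usize,
    pub len: usize,
}

/// Stand-in for a generated lexer: yields whitespace-separated words.
struct WordLexer {
    input: &'static str,
    pos: usize,
}

impl Iterator for WordLexer {
    type Item = word_token_t;

    fn next(&mut self) -> Option<word_token_t> {
        let bytes = self.input.as_bytes();
        // Skip leading ASCII whitespace.
        while self.pos < bytes.len() && bytes[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        if self.pos == bytes.len() {
            return None;
        }
        let start = self.pos;
        // Consume up to the next whitespace byte.
        while self.pos < bytes.len() && !bytes[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        Some(word_token_t { start, len: self.pos - start })
    }
}

/// Safety: `input` must point to `input_sz` readable bytes that stay alive
/// and unchanged until `word_lexer_free` is called.
#[no_mangle]
pub unsafe extern "C" fn word_lexer_create(input: *const u8, input_sz: usize) -> *mut c_void {
    let bytes: &'static [u8] = unsafe { slice::from_raw_parts(input, input_sz) };
    // Validate instead of trusting the caller; null signals invalid UTF-8.
    match str::from_utf8(bytes) {
        Ok(input) => Box::into_raw(Box::new(WordLexer { input, pos: 0 })) as *mut c_void,
        Err(_) => ptr::null_mut(),
    }
}

/// Safety: `lexer` must come from `word_lexer_create` (non-null, not freed)
/// and `token` must be valid for writes.
#[no_mangle]
pub unsafe extern "C" fn word_lexer_next(lexer: *mut c_void, token: *mut word_token_t) -> c_int {
    let lexer = unsafe { &mut *(lexer as *mut WordLexer) };
    match lexer.next() {
        Some(t) => {
            // `write` never reads or drops what was behind `token` before.
            unsafe { token.write(t) };
            1
        }
        None => 0,
    }
}

/// Safety: `lexer` must be null or come from `word_lexer_create`.
#[no_mangle]
pub unsafe extern "C" fn word_lexer_free(lexer: *mut c_void) {
    if !lexer.is_null() {
        drop(unsafe { Box::from_raw(lexer as *mut WordLexer) });
    }
}

/// Drives the exported functions the way a C caller would.
pub fn collect_words(input: &str) -> Vec<word_token_t> {
    let mut out = Vec::new();
    unsafe {
        // `input` is a `&str`, so it is valid UTF-8 and the handle is non-null.
        let lexer = word_lexer_create(input.as_ptr(), input.len());
        let mut tok = word_token_t { start: 0, len: 0 };
        while word_lexer_next(lexer, &mut tok) == 1 {
            out.push(tok);
        }
        word_lexer_free(lexer);
    }
    out
}
```

A C caller would follow the same create, loop on next, free sequence as `collect_words`, holding the handle as a `void *`.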
It's been a while. I might revisit this library if I can make sense of the Rust, just because it's so nice. I think I tried a rough version of the language I was going to parse and found I was doing better with a handwritten lexer, though I can't remember for sure. In any case, for simple things I'd definitely love to use this over writing something by hand.
I take it that in
#[repr(C)]
pub struct token_t;
the `#[repr(C)]` forces a struct layout like we'd expect in C, so that we can use it later in the wrappers?
So `repr(C)` means "lay this out how you would lay the equivalent struct out in C". For example, the fields in
#[repr(C)]
pub struct MyType {
    id: u32,
    rest: *mut u8,
}
and
struct my_type_t {
    uint32_t id;
    char *rest;
};
will have the same alignment and offsets, with the same padding between them, so you can reinterpret one as the other. If you have
#[no_mangle]
pub extern "C" fn foo() -> MyType { .. }
in Rust, you can declare and call it from C:
struct my_type_t foo(void);
/* .. */
struct my_type_t bar = foo();
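If you want to convince yourself of the layout claim, a small check like the following can help. It is only a sketch: `rest_offset` and `expected_rest_offset` are hypothetical helpers, and the "expected" value is just the usual C rule (each field goes at the next offset that satisfies its alignment), computed by hand rather than taken from a C compiler.

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

#[allow(dead_code)]
#[repr(C)]
pub struct MyType {
    id: u32,
    rest: *mut u8,
}

/// Actual byte offset of `rest` from the start of a `MyType` value.
pub fn rest_offset() -> usize {
    let v = MyType { id: 0, rest: std::ptr::null_mut() };
    addr_of!(v.rest) as usize - addr_of!(v) as usize
}

/// Offset the C layout rule gives for `rest`: the end of `id`,
/// rounded up to the alignment of a pointer.
/// On a typical 64-bit target that is 8 (4 bytes of `id` plus 4 of padding).
pub fn expected_rest_offset() -> usize {
    let align = align_of::<*mut u8>();
    (size_of::<u32>() + align - 1) / align * align
}
```

Comparing `rest_offset()` with `expected_rest_offset()` shows that the padding after `id` is where a C compiler would put it. Without `#[repr(C)]`, Rust is free to reorder the fields, so the same check would not be guaranteed to pass.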
You can use `cbindgen` to generate the header files for you.
I would've just written everything by hand, thanks for the tip.
My projects are normally in C++ or C and I'm not familiar enough with Rust, but from what I've read it's apparently possible to export functions that can be called from C. Looking at the codegen CLI's output:
I'm not familiar enough with Rust's syntax to know exactly what is happening, but I suspect it can be converted to C, or a wrapper around `fn lex` could be exported.