metacall / core

MetaCall: The ultimate polyglot programming experience.
https://metacall.io
Apache License 2.0
1.56k stars 160 forks source link

Improve rs_port with all of the feedback from rustaceans #56

Open viferga opened 3 years ago

viferga commented 3 years ago

Add docs.rs and documentation. Documentation examples: https://doc.rust-lang.org/stable/rust-by-example/meta/doc.html

Feedback of @In-line and @snejugal from t.me/rustdevs:

call eax: Few suggestions.

  1. Don't use str error types. Use structured errors.
  2. Any should have From/Into implementations for common types, also possibly serde serialize/deserialize
  3. Any is confusing with standard library Any, rename it, pls.
  4. This seems plain wrong, rust strings are not null terminated https://github.com/metacall/core/blob/7d56d8c09d58bb7b769a794ef84c48ab2b156576/source/ports/rs_port/src/lib.rs#L116

SnejUgal: you can write a proc macro that inspects loaded files at compile time and generates all the boilerplate

https://github.com/metacall/core/blob/7d56d8c09d58bb7b769a794ef84c48ab2b156576/source/ports/rs_port/src/lib.rs#L82-L102

all this seems to depend that these C's int, long etc are always equivalent to Rust's i32, i64 etc, while this is not true

call eax: Few more suggestions.

Library code can be improved further.

  1. Use RAII pattern
  2. Don't constrain use to specific owned types like String, use references or conversion traits AsRef, etc..
  3. Use cargo fmt, cargo clippy
  4. Add separate metacall-c crate for raw bindings.
  5. Remove unnecessary type annotations, rust can pretty much infer majority of them

For good example of idiomatic ffi you can look at neon, it for example has serde serialization for serializing type to JS Values

You use a lot of imperative code with unnecessary allocations (for loops, vectors, etc..), in Rust you can use iterators with operations like map, filter, for_each, some parts can be reduced to one liners with iterators.

lights0123 commented 3 years ago

I'd add:

should Metacall's char really be a Rust char? I think it would be better suited as a Rust u8, as a Rust char is any valid Unicode codepoint, not just a value 0-255.

In-line commented 3 years ago

@light0123 C char can be two bytes too

lights0123 commented 3 years ago

@In-line whatever it is, the Rust port casts it to a u8 anyways before converting it to a Rust char. And a char is required to be the "Smallest addressable unit of the machine that can contain basic character set", so unless you're talking about some weird ISA that isn't byte-addressable, no. You can be like Windows and try to push the two-byte wchar_t, but that's a completely different type.

viferga commented 3 years ago

@lights0123 you are right, it should be u8.

swarnimarun commented 3 years ago

Alright, I went and did some digging, so will put the details of my findings here,

Short Term

  1. Char in Rust is not c_char compatible, this should be altered, in some form. (We can cast c_char to Rust char but casting Rust char to c_char is lossy)... I am open to ideas, might have to write a custom wrapper type to interface between safe Rust and c_char.

  2. The code at, https://github.com/metacall/core/blob/7d56d8c09d58bb7b769a794ef84c48ab2b156576/source/ports/rs_port/src/lib.rs#L116 should be fine because we are passing string length unless there is a necessity for null termination.

  3. Long in C/C++ and Rust(c_long) in Windows is 32 bit while 64 on Linux which means that it can be potentially breaking in cross-platform over the network. Solution can be moving to long long in C/C++ and (c_longlong) Rust, to guarantee a 64bit type for the most part.

Long Term

  1. Moving the current FFI implementation layer behind a more safe and idiomatic Rust interface. (Likely a simple safe wrapper with macros for handling boilerplate can be a good enough start).

As for the syntactic mishaps and formatting or naming related changes, I am not good at them...

Edit: If the above looks good I can make all the required changes on Rust side.

lights0123 commented 3 years ago

Solution can be moving to long long in C/C++ and (c_longlong) Rust

If you're OK with a full change, why not just copy the way Rust does types: u8 (char, although some ISAs use signed), i16 (int16_t), i32 (int32_t), i64 (int64_t), and usize (size_t)? A language that isn't C or C++ doesn't care what the ISA and OS designers chose to be the C standard types, and many newly designed C libraries are only using exact-sized types anyways. If I'm writing something in e.g. Python, why should long mean 32-bit on Windows and 64-bit on everything else? If I wanted a type that represents a pointer, I would specifically request that.

swarnimarun commented 3 years ago

Not a bad idea, personally in most of my own projects I just use fixed size types as well because how easier they make handling types, cross-platform. But that's up to @viferga, here.

viferga commented 3 years ago

My first idea was to implement fixed types in the whole interface, but once I started to implement different loaders I realized that they do not use fixed types. For example, Ruby ( https://silverhammermba.github.io/emberb/c ):

NUM2CHR() for char (works for unsigned char too) NUM2SHORT() for short NUM2USHORT() for unsigned short NUM2INT() for int NUM2UINT() for unsigned int NUM2LONG() for long NUM2ULONG() for unsigned long NUM2LL() for long long NUM2ULL() for unsigned long long NUM2DBL() for double

Or Python ( https://docs.python.org/3/c-api/long.html ):

long PyLong_AsLong(PyObject obj) long long PyLong_AsLongLong(PyObject obj) size_t PyLong_AsSize_t(PyObject pylong) unsigned long long PyLong_AsUnsignedLongLong(PyObject pylong)

In the other hand, later on I implemented V8 and NodeJS loaders, and they use fixed types:

I am not sure how to handle this, because at some point, it will be a conflict between the types, or overflows. Either if we use fixed types or not. And if we use both at the same time, it can be problematic too because you should see what is the internal representation of a integer in some loader in order to use the same outside when you are using the API, so this is going to be harder to the end user because the idea of MetaCall is to abstract from those details.

I have also implemented a casting module, in order to solve this kind of things. For example, some runtimes represent a integer with a long, and you may call with an int, so it does the casting when calling if possible ( https://github.com/metacall/core/blob/develop/source/reflect/include/reflect/reflect_value_type_cast.h , https://github.com/metacall/core/blob/develop/source/reflect/include/reflect/reflect_value_type_promotion.h and https://github.com/metacall/core/blob/develop/source/reflect/include/reflect/reflect_value_type_demotion.h ).

I am not sure how to handle this but I am opened to suggestions.

swarnimarun commented 3 years ago

I would say using fixed types is better as in the worst case we only have to write shims for matching to the specified types from other languages. While using standard/default types can be problematic as it can have multiple layers of conversion requirements.

Imagine sending a long from a Windows metacall instance to Linux instance... Having the base as fixed sizes at least makes sure we won't have unforeseen issues.

viferga commented 3 years ago

If you send this via network there are much more problems than fixed size of the types. For example endianess. And because of that, there is a serialization plugin based system that allows to implement your own serialization protocol for MetaCall values, including all the introspection capabilitites.

You have the plugins here: https://github.com/metacall/core/tree/develop/source/serials

MetaCall serial is a printf like serialization format, mainly implemented in order to test the system and in order to use it where there is not any other format available or things like that. And RapidJSON serial is a serializer which uses this library: https://rapidjson.org/

Another point against fixed types is that without them, compiler can figure out better what sizes are better for some types when compiling.

One way to solve this, at least in V8 and NodeJS loaders (apart from Rust port) is to check the size of int, long, etc and return a fixed type which fits in them.

Another problem may be the sign, which is not defined in MetaCall values. This should be handled and also provide proper error handling in case overflow (not by means of exceptions but classic ennumeration style error (getlasterror() or similar)).

viferga commented 1 year ago

https://github.com/metacall/core/issues/57