Closed vsbogd closed 5 months ago
Apologies I haven't yet given this PR a thorough line-by-line review. However based on just a brief glance, I have several concerns.
I guess there are two separate problems in my mind, 1. getting atoms across the "FFI barrier", ie. into and out of the MeTTa engine's internal format from an extension language (C and/or Python), and 2. Representing atoms in a stable way that could be written out to a file or sent over the network.
I am working on the belief that you are mainly trying to solve 1. and if that gets us closer to solving 2. then that's a bonus, but not the main effort. Am I right about that?
If I'm right about that point, my feeling is that the serializer interface would be better to track the structure of atoms more closely than the structure of Rust types. e.g. expr(sym(something), gnd(true))
instead of serde.rs's interface which was built to conform primarily to Rust's data types and more generally the data structures in imperative languages.
We don't necessarily need interoperability with serde.rs
. The v8 project opted to extend serde.rs (https://docs.rs/serde_v8/latest/serde_v8/ ) but JavaScript objects are a closer fit for Rust's structs than MeTTa atoms. That decision could go either way.
The main thing is that the use cases I see are either going to be exchanging small primitives or exchanging grounded atoms that don't have a straightforward representation in terms of primitives. (NN weights, images, etc.)
For the small primitives (bool, some small integers & floats, etc.), it seems like a fast-path that avoids the overhead of serde might be better.
For the big stuff with unknown structure, the main interchange probably needs to be a data-blobs. For reasons of support under a future multi-threaded runner / and a further-future multi-machine deployment we might need to rethink the way these blobs are managed to avoid a bottleneck, but we can push that concern into the future.
I apologize if I have misunderstood some aspects of the design.
I am working on the belief that you are mainly trying to solve 1. and if that gets us closer to solving 2. then that's a bonus, but not the main effort. Am I right about that?
You are right that API in this PR is not intended to be an API to serialize any atom. Its goal to allow serialization of the grounded atoms. Yes, I am trying to solve 1. and allow reusing this API to implement 2. with respect to grounded atoms.
serializer interface would be better to track the structure of atoms more closely than the structure of Rust types. e.g. expr(sym(something), gnd(true)) instead of serde.rs's interface
I agree that for serializing Atom
it would be better API, but serde.rs API suits well serializing grounded values. I tried to explain it in the comment to the https://github.com/trueagi-io/hyperon-experimental/pull/597/files#diff-99de33c2a02e714dd8b5b896c6894afcd531f47842396a32123a4b5a0d6e4cfb module.
The main thing is that the use cases I see are either going to be exchanging small primitives or exchanging grounded atoms that don't have a straightforward representation in terms of primitives. (NN weights, images, etc.)
This is correct.
For the small primitives (bool, some small integers & floats, etc.), it seems like a fast-path that avoids the overhead of serde might be better.
My current understanding is that API to convert small primitives is not very different from serde.rs API. It is difficult to explain without code but I would like to implement a "converting serializer" in Python and in Rust which gets primitive value (through serde.rs API) and just writes it to the reference passed by client. Thus it is not different from getting value from Python converting it into Rust and write it by pre-allocated address.
In pseudo-code it looks like
struct I64Serializer<'a> {
target: &'a i64;
}
impl I64Serializer<'_> {
fn serializer_i64(&mut self, v: i64) -> Result {
self.target = v;
Ok(())
}
}
Passing such serializer to a grounded object which has corresponding type will effectively copy its value into target
. The API itself is too verbose to be used as is, and thus I would re-implement as_gnd
method or add additional method to convert Number
or other primitive type using this API. For end user it should look like atom.as_gnd::<Number>()
Adding separate interface to convert primitive values is possible but I don't see whether it is different from serde.rs and thus I thought about using serialize for both serialization and conversion.
For the big stuff with unknown structure, the main interchange probably needs to be a data-blobs.
Yes, it is a possible way of doing this but as we anyway need to convert primitive values why not use this interface in both areas. It is probably possible to compose two serialization procedures: serialization of the data into a blob and serialization of the blob by the Atom serializer but I believe API is not be straightforward if one doesn't use buffering.
@luketpeterson , I implemented bool
conversion in https://github.com/trueagi-io/hyperon-experimental/pull/597/commits/0aaa30967addc15a49652921762e8350c3dee0c4
You can use this example written by Alexey to check how it works.
I added Number
serializer in https://github.com/trueagi-io/hyperon-experimental/pull/597/commits/75e5be0a83695a4b7a3ec40cff076cfef57f5e3b and here I see the first issue with using serialize
for the conversion. If one need to convert int
in Python into BigInt
in Rust then BigInt
should be a part of Serializer
interface. Which is not that bad but demonstrates that API of Serializer
and conversion could be different.
I have spent the time to understand your design and it is very elegant. I see what you mean that the conversion of primitives is no more expensive than just a simple conversion routine. My earlier worries about the complexity of serde.rs were misplaced, and I am in support of this as the basis for a conversion/ value bridging strategy.
I just have two questions, but neither is a big deal.
Do we want the serde / conversion path to own the logic for type-lattice / best-common-interchange? In other words, the logic that's now here in the types implementing Serialize? For example:
(Number::Integer(a), Number::Integer(b)) => $cast(a $op b),
(Number::Integer(a), Number::Float(b)) => $cast((a as f64) $op b),
(Number::Float(a), Number::Integer(b)) => $cast(a $op (b as f64)),
(Number::Float(a), Number::Float(b)) => $cast(a $op b),
This change makes sense since it would centralize the logic in one place and potentially cut down on duplication, bug surface, but it could be implemented in the future.
Do you think it makes sense to change the Rust names slightly, e.g. serde
-> atom_serde
, Serializer
-> ValueSerializer
. Just to avoid namespace issues with serde.rs
? I know we can use explicit symbol paths but it's just as much about making it clear to the programmer as to the compiler. (When I first looked, I had assumed you'd extended serde.rs
and others may assume that too.)
Maybe I'll take the weekend and finally implement a String grounded value. It's been heavily requested, and extending your API would help me to understand it more thoroughly.
- Do you think it makes sense to change the Rust names slightly, e.g. serde -> atom_serde, Serializer -> ValueSerializer. Just to avoid namespace issues with serde.rs?
Absolutely yes, I started from having unique names but the I thought that namespaces should have the same effect. But I agree that using namespaces is confusing.
I added
Number
serializer in 75e5be0 and here I see the first issue with usingserialize
for the conversion. If one need to convertint
in Python intoBigInt
in Rust thenBigInt
should be a part ofSerializer
interface. Which is not that bad but demonstrates that API ofSerializer
and conversion could be different.
Actually we could do it without BigInt
in the interface if we would define right protocol. So far I used very simple protocol: one value - one method call. But for Number
which can be i64
or BigInt
we could use more complex protocol which first "writes" the type (i64/f64/BigInt) and then "writes" the value itself. Thus NumberSerializer
will be able to distinguish i64
from BigInt
. On the other hand such conversion requires at least two calls on each value, and current implementation applies conversion even if native Rust type is used. Thus it may be less effective.
So far I used very simple protocol: one value - one method call. But for Number which can be i64 or BigInt we could use more complex protocol which first "writes" the type (i64/f64/BigInt) and then "writes" the value itself.
My thinking was that the type serializer might fail, and that would be a signal to move up the lattice (or down depending on your perspective). So a number could be serialized into multiple formats. And some formats would be preferred over others, but then some serializers might fail... Probably easier to illustrate with code instead of text.
As to whether a native BigInt primitive makes sense, it could go either way. I'm slightly leaning towards "yes" because scientific applications will want it, and that's our target. But we could just as easily implement it as an optional module.
Which brings me to the last point. We probably need to extend the protocol to support arbitrary types identified by unique key or short string. (As discussed in the https://github.com/trueagi-io/hyperon-experimental/files/14006084/grounded_atom_value_bridging.md document) where the type-specific routine renders the type into a buffer for corresponding routine to reconstruct on the other side.
All of these can be extensions to what you have already implemented.
JIC, I wouldn't overcomplicate it before we have use cases for arbitrary types. My request for automatic conversion was related to stdlib atm.
One thing which is still absent is unit test pack. I can add unit tests for serializer interface for each runtime but adding unit test for conversion requires ability to construct Rust grounded atoms from Python and vice versa. Thus I am not sure how to implement it without introducing special test API into stdlib.
Tried to make less confusing naming in https://github.com/trueagi-io/hyperon-experimental/pull/597/commits/8b60ebded0a3aad2b9d7024f18a08668088976d3
Hmm... is it automerge on approval? :)
Hmm... is it automerge on approval?
Hmm, should not work by this way. At least I don't remember any case of this. As we discussed it with Luke few times, I don't think it is a big deal. @luketpeterson, I can fix the comments if you have ones in the separate PR.
You were close! Classic off-by-one error. 😆
@luketpeterson, I suggest fixing the comments if you have ones in the separate PR.
I think the next test for this feature is to implement Grounded String. Which should be easy. And then BigNum / BigInt which will probably be hard and/or require changes / extensions.
If you would like, I can take that as a project for this weekend which will get me a chance to become very familiar with your design after trying to apply it myself.
@luketpeterson , I am working on MeTTa doc strings implementation so I am not going to work on this in nearest time. If you have time to implement it I would appreciate this.
One motivation for this serialization/conversion was the possibility to pass strings to stdlib functions like import!
. Before, Python strings would cause a error when passed to Rust grounded functions. So, yeah, adding grounded strings would be nice.
This draft is to discuss preliminary API. The main goal is to use this API to implement conversion of grounded values from Python to Rust and back.
Add serialize() method into Grounded API and add default implementation to not make users implement it when it is not needed. Add C and Python APIs to allow writing custom serialization code and custom Serializer implementations.