trueagi-io / hyperon-experimental

MeTTa programming language implementation
https://metta-lang.dev
MIT License
133 stars 44 forks source link

Add serialization method into Grounded API #597

Closed vsbogd closed 5 months ago

vsbogd commented 6 months ago

This draft is to discuss preliminary API. The main goal is to use this API to implement conversion of grounded values from Python to Rust and back.

Add serialize() method into Grounded API and add default implementation to not make users implement it when it is not needed. Add C and Python APIs to allow writing custom serialization code and custom Serializer implementations.

luketpeterson commented 6 months ago

Apologies I haven't yet given this PR a thorough line-by-line review. However based on just a brief glance, I have several concerns.

I guess there are two separate problems in my mind, 1. getting atoms across the "FFI barrier", ie. into and out of the MeTTa engine's internal format from an extension language (C and/or Python), and 2. Representing atoms in a stable way that could be written out to a file or sent over the network.

I am working on the belief that you are mainly trying to solve 1. and if that gets us closer to solving 2. then that's a bonus, but not the main effort. Am I right about that?

If I'm right about that point, my feeling is that the serializer interface would be better to track the structure of atoms more closely than the structure of Rust types. e.g. expr(sym(something), gnd(true)) instead of serde.rs's interface which was built to conform primarily to Rust's data types and more generally the data structures in imperative languages.

We don't necessarily need interoperability with serde.rs. The v8 project opted to extend serde.rs (https://docs.rs/serde_v8/latest/serde_v8/ ) but JavaScript objects are a closer fit for Rust's structs than MeTTa atoms. That decision could go either way.

The main thing is that the use cases I see are either going to be exchanging small primitives or exchanging grounded atoms that don't have a straightforward representation in terms of primitives. (NN weights, images, etc.)

For the small primitives (bool, some small integers & floats, etc.), it seems like a fast-path that avoids the overhead of serde might be better.

For the big stuff with unknown structure, the main interchange probably needs to be a data-blobs. For reasons of support under a future multi-threaded runner / and a further-future multi-machine deployment we might need to rethink the way these blobs are managed to avoid a bottleneck, but we can push that concern into the future.

I apologize if I have misunderstood some aspects of the design.

vsbogd commented 6 months ago

I am working on the belief that you are mainly trying to solve 1. and if that gets us closer to solving 2. then that's a bonus, but not the main effort. Am I right about that?

You are right that API in this PR is not intended to be an API to serialize any atom. Its goal to allow serialization of the grounded atoms. Yes, I am trying to solve 1. and allow reusing this API to implement 2. with respect to grounded atoms.

serializer interface would be better to track the structure of atoms more closely than the structure of Rust types. e.g. expr(sym(something), gnd(true)) instead of serde.rs's interface

I agree that for serializing Atom it would be better API, but serde.rs API suits well serializing grounded values. I tried to explain it in the comment to the https://github.com/trueagi-io/hyperon-experimental/pull/597/files#diff-99de33c2a02e714dd8b5b896c6894afcd531f47842396a32123a4b5a0d6e4cfb module.

The main thing is that the use cases I see are either going to be exchanging small primitives or exchanging grounded atoms that don't have a straightforward representation in terms of primitives. (NN weights, images, etc.)

This is correct.

For the small primitives (bool, some small integers & floats, etc.), it seems like a fast-path that avoids the overhead of serde might be better.

My current understanding is that API to convert small primitives is not very different from serde.rs API. It is difficult to explain without code but I would like to implement a "converting serializer" in Python and in Rust which gets primitive value (through serde.rs API) and just writes it to the reference passed by client. Thus it is not different from getting value from Python converting it into Rust and write it by pre-allocated address.

In pseudo-code it looks like

struct I64Serializer<'a> {
  target: &'a i64;
}

impl I64Serializer<'_> {
    fn serializer_i64(&mut self, v: i64) -> Result {
        self.target = v;
        Ok(())
    }
}

Passing such serializer to a grounded object which has corresponding type will effectively copy its value into target. The API itself is too verbose to be used as is, and thus I would re-implement as_gnd method or add additional method to convert Number or other primitive type using this API. For end user it should look like atom.as_gnd::<Number>()

Adding separate interface to convert primitive values is possible but I don't see whether it is different from serde.rs and thus I thought about using serialize for both serialization and conversion.

For the big stuff with unknown structure, the main interchange probably needs to be a data-blobs.

Yes, it is a possible way of doing this but as we anyway need to convert primitive values why not use this interface in both areas. It is probably possible to compose two serialization procedures: serialization of the data into a blob and serialization of the blob by the Atom serializer but I believe API is not be straightforward if one doesn't use buffering.

vsbogd commented 6 months ago

@luketpeterson , I implemented bool conversion in https://github.com/trueagi-io/hyperon-experimental/pull/597/commits/0aaa30967addc15a49652921762e8350c3dee0c4

vsbogd commented 6 months ago

You can use this example written by Alexey to check how it works.

vsbogd commented 6 months ago

I added Number serializer in https://github.com/trueagi-io/hyperon-experimental/pull/597/commits/75e5be0a83695a4b7a3ec40cff076cfef57f5e3b and here I see the first issue with using serialize for the conversion. If one need to convert int in Python into BigInt in Rust then BigInt should be a part of Serializer interface. Which is not that bad but demonstrates that API of Serializer and conversion could be different.

luketpeterson commented 6 months ago

I have spent the time to understand your design and it is very elegant. I see what you mean that the conversion of primitives is no more expensive than just a simple conversion routine. My earlier worries about the complexity of serde.rs were misplaced, and I am in support of this as the basis for a conversion/ value bridging strategy.

I just have two questions, but neither is a big deal.

  1. Do we want the serde / conversion path to own the logic for type-lattice / best-common-interchange? In other words, the logic that's now here in the types implementing Serialize? For example:

            (Number::Integer(a), Number::Integer(b)) => $cast(a $op b),
            (Number::Integer(a), Number::Float(b)) => $cast((a as f64) $op b),
            (Number::Float(a), Number::Integer(b)) => $cast(a $op (b as f64)),
            (Number::Float(a), Number::Float(b)) => $cast(a $op b),

    This change makes sense since it would centralize the logic in one place and potentially cut down on duplication, bug surface, but it could be implemented in the future.

  2. Do you think it makes sense to change the Rust names slightly, e.g. serde -> atom_serde, Serializer -> ValueSerializer. Just to avoid namespace issues with serde.rs? I know we can use explicit symbol paths but it's just as much about making it clear to the programmer as to the compiler. (When I first looked, I had assumed you'd extended serde.rs and others may assume that too.)

luketpeterson commented 6 months ago

Maybe I'll take the weekend and finally implement a String grounded value. It's been heavily requested, and extending your API would help me to understand it more thoroughly.

vsbogd commented 6 months ago
  1. Do you think it makes sense to change the Rust names slightly, e.g. serde -> atom_serde, Serializer -> ValueSerializer. Just to avoid namespace issues with serde.rs?

Absolutely yes, I started from having unique names but the I thought that namespaces should have the same effect. But I agree that using namespaces is confusing.

vsbogd commented 6 months ago

I added Number serializer in 75e5be0 and here I see the first issue with using serialize for the conversion. If one need to convert int in Python into BigInt in Rust then BigInt should be a part of Serializer interface. Which is not that bad but demonstrates that API of Serializer and conversion could be different.

Actually we could do it without BigInt in the interface if we would define right protocol. So far I used very simple protocol: one value - one method call. But for Number which can be i64 or BigInt we could use more complex protocol which first "writes" the type (i64/f64/BigInt) and then "writes" the value itself. Thus NumberSerializer will be able to distinguish i64 from BigInt. On the other hand such conversion requires at least two calls on each value, and current implementation applies conversion even if native Rust type is used. Thus it may be less effective.

luketpeterson commented 6 months ago

So far I used very simple protocol: one value - one method call. But for Number which can be i64 or BigInt we could use more complex protocol which first "writes" the type (i64/f64/BigInt) and then "writes" the value itself.

My thinking was that the type serializer might fail, and that would be a signal to move up the lattice (or down depending on your perspective). So a number could be serialized into multiple formats. And some formats would be preferred over others, but then some serializers might fail... Probably easier to illustrate with code instead of text.

As to whether a native BigInt primitive makes sense, it could go either way. I'm slightly leaning towards "yes" because scientific applications will want it, and that's our target. But we could just as easily implement it as an optional module.

Which brings me to the last point. We probably need to extend the protocol to support arbitrary types identified by unique key or short string. (As discussed in the https://github.com/trueagi-io/hyperon-experimental/files/14006084/grounded_atom_value_bridging.md document) where the type-specific routine renders the type into a buffer for corresponding routine to reconstruct on the other side.

All of these can be extensions to what you have already implemented.

Necr0x0Der commented 6 months ago

JIC, I wouldn't overcomplicate it before we have use cases for arbitrary types. My request for automatic conversion was related to stdlib atm.

vsbogd commented 5 months ago

One thing which is still absent is unit test pack. I can add unit tests for serializer interface for each runtime but adding unit test for conversion requires ability to construct Rust grounded atoms from Python and vice versa. Thus I am not sure how to implement it without introducing special test API into stdlib.

vsbogd commented 5 months ago

Tried to make less confusing naming in https://github.com/trueagi-io/hyperon-experimental/pull/597/commits/8b60ebded0a3aad2b9d7024f18a08668088976d3

Necr0x0Der commented 5 months ago

Hmm... is it automerge on approval? :)

vsbogd commented 5 months ago

Hmm... is it automerge on approval?

Hmm, should not work by this way. At least I don't remember any case of this. As we discussed it with Luke few times, I don't think it is a big deal. @luketpeterson, I can fix the comments if you have ones in the separate PR.

lukepeterson commented 5 months ago

You were close! Classic off-by-one error. 😆

vsbogd commented 5 months ago

@luketpeterson, I suggest fixing the comments if you have ones in the separate PR.

luketpeterson commented 5 months ago

I think the next test for this feature is to implement Grounded String. Which should be easy. And then BigNum / BigInt which will probably be hard and/or require changes / extensions.

If you would like, I can take that as a project for this weekend which will get me a chance to become very familiar with your design after trying to apply it myself.

vsbogd commented 5 months ago

@luketpeterson , I am working on MeTTa doc strings implementation so I am not going to work on this in nearest time. If you have time to implement it I would appreciate this.

Necr0x0Der commented 5 months ago

One motivation for this serialization/conversion was the possibility to pass strings to stdlib functions like import!. Before, Python strings would cause a error when passed to Rust grounded functions. So, yeah, adding grounded strings would be nice.