spy16 / sabre

Sabre is a highly customisable, embeddable LISP engine for Go. :computer:
GNU General Public License v3.0

Re-organize single sabre package into core, reader and sabre #25

Closed spy16 closed 4 years ago

spy16 commented 4 years ago

Still a work in progress. Highlights of the refactor:

  1. The bulk of the core LISP-related functionality now lives in the core package.
  2. The reader is in a reader package.
  3. The sabre root package provides the same interface for using Sabre as before (i.e., New(), ReadEvalStr, ReadEval, ValueOf, etc.).

I was experimenting with the idea discussed in #16, but most likely I will go back to the old Value interface itself.

Feel free to provide feedback and design inputs if any.

Upcoming changes:

  1. Move Set, HashMap and Module to the sabre package instead of core? (Just a thought, since these are not very well implemented at the moment and are not really necessary for pure-LISP-like usage.)
  2. Re-define the special-forms system. Expose Analyse as a public function to allow adding custom special forms? (I don't want to expose it, since either the special form is very generic, in which case users can raise a PR and we can merge it into Sabre, or it can probably be implemented as a macro/function.)
lthibault commented 4 years ago

On a high level, this all looks 👍

I have a few general design notes for the future, which I'll share below. Any specific problems or blocking issues will be mentioned directly in source comments.

Design notes

1. Value interface

I know we've discussed this before, but part of me is tempted to go all-in on the Value interface, with the main goal of reducing type assertions.

The general idea is to define a rich Value interface that contains both required and optional methods. Values that do not implement a given optional method would simply return ErrNotImplemented, allowing the runtime to take appropriate action.

To illustrate, it might look something like this:

var ErrNotImplemented = errors.New("not implemented")

type Value interface {
    Source() string
    Eval(env Env) (Value, error)
    Invoke(env Env, args ...Value) (Value, error)
    Compare(other Value) (bool, error) // note the added error value
    // ...
}

Obviously this adds a bit of boilerplate when implementing new values, but the advantages are:

  1. Flexibility
  2. Performance (because no reflection)
  3. Readability (because "reflection is never clear" and Value functionality is no longer split between multiple interfaces)

Regarding flexibility, a motivating example might be useful. One thing I would like to do is to automatically cast/convert types during operations on heterogeneous types, or during numerical overflows. For example:

(+ 0.00125 22)
;; prints `22.00125`, a float

I don't believe there's a simple way of doing this right now. An approach I like a lot is to define an Add(other Value) (Value, error) method on Value. The + function would then be implemented as follows:

func Add(v0, v1 Value) (Value, error) {
    return v0.Add(v1)
}

And Float64.Add might be implemented as follows:

func (f Float64) Add(other Value) (Value, error) {
    switch v := other.(type) {
    case Int64:
        return f + Float64(v), nil
    case Float64:
        return f + v, nil
    default:
        // N.B.:  this allows the builtin `Float64` to gracefully handle
        // user-defined types.  Addition is commutative, so we can defer
        // to the other value's implementation of Add.
        return other.Add(f)
    }
}

If I someday want to add a BigInt type, I now only need to ensure its Add (and eventually Multiply, Divide, ShiftLeft and ShiftRight) methods correctly handle builtins.

With regards to performance, such an object model has the potential to greatly improve member access expressions (see: https://github.com/spy16/sabre/issues/12). Instead of relying on reflection, Value can define the following methods:

type Value interface {
        // ...

        // Bind a value to an attribute name
        Bind(name string, v Value) error

        // Resolve an attribute by name
        Resolve(name string, defaultValue Value) (Value, error)

        // ...
}

Now, implementations are free to provide member access with e.g. a map[string]Value, although the Any type can still use reflection under the hood.
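
For illustration, a map-backed implementation might look something like the following rough sketch (it assumes the Value interface above; record and its field names are made up, and errors.New comes from the standard errors package):

// record is a hypothetical Value whose members live in a map, so member
// access needs no reflection.
type record struct {
    attrs map[string]Value
}

// Bind stores the attribute under the given name.
func (r *record) Bind(name string, v Value) error {
    if r.attrs == nil {
        r.attrs = make(map[string]Value)
    }
    r.attrs[name] = v
    return nil
}

// Resolve looks the attribute up, falling back to defaultValue when provided.
func (r *record) Resolve(name string, defaultValue Value) (Value, error) {
    if v, ok := r.attrs[name]; ok {
        return v, nil
    }
    if defaultValue != nil {
        return defaultValue, nil
    }
    return nil, errors.New("no such attribute: " + name)
}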

BTW: I also feel quite strongly that things like IsHashable are an anti-pattern. How can Sabre users create their own hashable types and use them in maps? I'm planning on building immutable data structures, so there's no reason a list or map can't be used as a key.

The proposed approach would allow us to specify a Hash() (Value, error) method, such that an implementation can return a hash value that is different from the concrete value.

One last thing: if the Value interface gets too big, we can always fall back on Value.Resolve. For example, a map implementation could call someKey.Resolve("Hash", nil) to check if the key's type implements a custom hashing method. If it does not, it can attempt to hash the concrete value. This would allow us to have many optional methods in our object model, without cluttering the Value interface, and without the drawbacks of reflection.

We don't have to do any of this now, but I'm interested in keeping this discussion going. I'm more and more convinced some form of object model is a must-have.

2. Core utilities

I would really like to see functions like containerString in some sort of package. Sabre's mission is to provide a batteries-included toolchain for building a lisp dialect in Go. Language internals are part of that contract, so I believe this is in-scope.

Same goes for evalValueList, newEvalErr, getPosition, etc. My project requires a lot of custom types, so these are all very useful.

3. Value.Source() should be replaced with fmt.Stringer

Like you, I'm increasingly convinced that Value should provide its string representation. That said, I think the String() string method is preferable to Source() string for two reasons:

First, fmt.Stringer means we can directly print values using the fmt or log packages:

// good
log.Printf("error encountered in %s at %s:%s\n%s", filePath, lineNo, colNo, myValue)

// bad
log.Printf("error encountered in %s at %s:%s\n%s", filePath, lineNo, colNo, myValue.Source())

Second, Source() is arguably incorrect. Lisp is often referred to as a "syntax-free" language, since its primary syntax is the set of data structures itself, not their homoiconic string representations. This is arguably a detail, but it's nonetheless "more correct" to say "I'm returning a string representation of lisp source" than "I'm returning the source code of this memory object". In other words, strings of symbols are serializations of lisp source.

4. Set should be implemented with a map.

I suspect there might be an issue with this suggestion, but I still thought I'd bring it up. I think the Set type should be backed by a hash map. Any reason why you used []Value instead of map[Value]struct{}?

5. Container types should be interfaces.

This is how Clojure does it. Using interfaces makes it much easier to implement custom types, which is in keeping with Sabre's positioning as a "build your own lisp toolkit".

I would like to see an abstract interface for List, Vector, Map, Set and Module, along with an unexported, default implementation for each. Notwithstanding the above object-model suggestions, the interface for the current Map implementation would be:

type Map interface {
    Value
    Get(Value) Value
    Set(Value, Value)
    Keys() Values
    Values() Values
}

We could then have default constructors for each interface (perhaps in a builtin package, or something). For Map, it would look something like this:

package builtin

func NewMap() core.Map {
    return hashMap{Data: make(map[core.Value]core.Value)}
}

Note also: I don't think it's appropriate for the exported type to be called HashMap. The Hash prefix is an implementation detail. Users of Sabre may wish to create maps based on trees, or other non-hash-based approaches. Users should care about behavior, not internals. Can we call it Map, instead?

spy16 commented 4 years ago

This is great. Interesting points. Honestly, this PR is in draft mode exactly because I wanted to get the discussion going before finalising all the changes. I might not even go ahead with this PR.

1. Value Interface

go all-in on the Value interface, with the main goal of reducing type assertions.

This is definitely an interesting idea and perhaps we can combine some of the value type variations. But I am not entirely convinced that going all-in is a good idea.

  1. Flexibility:

    • The example (+ 0.00125 22) is already supported. As long as the function parameters have types to which the arguments are assignable (tested using the AssignableTo() and ConvertibleTo() reflection functions; see the sketch after this list), function invocation takes care of this automatically. For example, this works when you pass an int value to a float parameter, or when you have a function that takes a Go-native bool value but you pass a core.Bool value, etc.

    • It's not possible to have all the possible operations pre-defined on the Value interface, and I feel it wouldn't make sense to force all values to implement all the methods. I count 7 methods from the examples and interface samples you provided alone. This does not include Multiply, Divide, ShiftLeft, ShiftRight, etc.

    • I feel this kind of "generalised" function (like Add, Multiply, etc.) can be added into a standard-library kind of package with well-defined interfaces (Adder, etc.). For example, we could have the Add function as:

      func Add(v0, v1 Value) (Value, error) {
          // Two `if` checks, assuming the commutative property.
          // Languages that let you overload operators for custom types do not
          // assume commutativity, because it can be domain- and order-dependent
          // (for example, [1,2]+[3,4] is not commutative in Python).
          // Also, assume type T1 defines an __add__(self, other) method
          // and type T2 doesn't. Then `t1 + t2` is valid but `t2 + t1` is not.
          if adder, ok := v0.(Adder); ok {
              return adder.Add(v1)
          } else if adder, ok := v1.(Adder); ok {
              return adder.Add(v0)
          }

          if primitiveTypes(v0, v1) {
              return addPrimitive(v0, v1)
          }

          return nil, errors.New("don't know how to add")
      }

      type Adder interface {
          Add(other Value) (Value, error)
      }

      This might look like there are many type assertions here, but if you think of the suggested approach, each Adder implementation will have a type switch like the one you showed for Float64.

    • I have introduced a memberAccessor optional interface in this code. When a qualified symbol is resolved, for example foo.bar.baz, foo is resolved from the current scope, and then the request to resolve bar.baz is dispatched to the value found for foo if it implements memberAccessor, which effectively provides the same flexibility.

    • HashMap, Set, etc. are very naive implementations, and I completely agree that IsHashable is a bad idea. It only exists to prevent panics if you try to write {[1 2 3] "hello"}. I have no objections to having a Hash() method as part of the Value interface itself.

    • I do not see how someKey.Resolve("Hash", nil) would be better than a type assertion. (You used "reflection" to refer to the current approach, but technically the current approach is optional interfaces via type assertions.)

  2. Performance: Mostly agree on this. Although, since we still have to access values through an indirection (the Value interface) everywhere, nothing gets inlined, so I suspect the difference will not be much.

  3. Readability:

    • Type assertion is not as bad as reflection itself (which is what the idiom is about) from a readability standpoint. But I agree that no type assertion would still be better in many cases. On the other hand, using optional interfaces makes it obvious what each Value type is capable of. For example, it is immediately obvious to the reader that Keyword is invokable just by looking at the methods it has. Imagine you are looking at the godoc of sabre. If you go down the ErrNotImplemented approach, the user needs to actually go into the source to figure out whether it really implements it.
    • Also, I do believe that when the size of the Value interface grows, there will be multiple redundant methods (the ones that return ErrNotImplemented) on different value types, which will offset any positive impact of removing type assertions.
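
Referring back to the first point under Flexibility: here is a standalone sketch (not Sabre's actual invocation code; all names are made up) of how AssignableTo/ConvertibleTo can adapt an argument to a parameter type, which is roughly what makes (+ 0.00125 22) work today:

package main

import (
    "fmt"
    "reflect"
)

// convertArg adapts arg to the parameter type `to`, converting when allowed.
func convertArg(arg interface{}, to reflect.Type) (reflect.Value, error) {
    rv := reflect.ValueOf(arg)
    switch {
    case rv.Type().AssignableTo(to):
        return rv, nil
    case rv.Type().ConvertibleTo(to):
        return rv.Convert(to), nil
    default:
        return reflect.Value{}, fmt.Errorf("cannot use %s as %s", rv.Type(), to)
    }
}

func main() {
    add := func(a, b float64) float64 { return a + b }
    fnType := reflect.TypeOf(add)

    // The int argument 22 is converted to the float64 parameter type.
    arg0, _ := convertArg(0.00125, fnType.In(0))
    arg1, _ := convertArg(22, fnType.In(1))

    out := reflect.ValueOf(add).Call([]reflect.Value{arg0, arg1})
    fmt.Println(out[0].Float()) // 22.00125
}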

2. Core Utilities

3. Value.Source() should be replaced

4. Set should be implemented with a map

5. Container types should be interfaces

Ultimately, I see the following ideas as something we should definitely do:

  1. Value interface should have some more methods, but not all of them. Probably something like:

    type Value interface {
        // definitely
        Eval(env Env) (Value, error)
        Hash() uint64 // just an example.
        String() string

        // maybe. But I still feel that since the usage of `Invoke` is limited,
        // it is cleaner to keep it optional.
        Invoke(env Env, args ...Value) (Value, error)
    }
  2. Built-in logic around container types like Set, Map etc. should use interfaces rather than concrete types.

Also, a parallel idea I have been thinking about is the concept of a runtime, which would allow more flexibility and also enable supporting persistent data structures, but it would require significantly more work and will be complex.

type Runtime interface {
  Env

  // Map, List, Vector etc. need to be interfaces here.
  NewMap(pairs ...KVPair) (Map, error)
  NewList(items ...Value) (List, error)
  NewVector(items ...Value) (Vector, error)
  NewSet(items ...Value) (Set, error)
}
lthibault commented 4 years ago

@spy16 Thanks for the thoughtful reply! I'm glad we're on the same page, and I'm truly grateful that my input is appreciated!

Your points are well-taken, and there were indeed a few misunderstandings on my part, so this has been very helpful and interesting.

I think we're quickly converging on concrete solutions, so hopefully this will be the last of my super-long comments!

1. Value Interface

[...] perhaps we can combine some of the value type variations. But I am not entirely convinced that going all-in is a good idea.

100% agreed. I made the case for "all-in" in my previous post, but it is indeed clear that we should take a more nuanced position. In that spirit, here's the part of my proposal that I still find relevant: a documented, concrete (in code) model of the Sabre runtime and its interaction with values. Note how there are two distinct parts to this:

  1. Documented. Where can I get a "big-picture" view of how the Sabre runtime is organized? This can be as simple as docstrings.
  2. Concrete. The public codebase should consist of concrete artifacts -- structs, interfaces, functions, etc. -- that reflect the important parts of the model.

Here's my reality with Sabre right now:

  1. Documentation: I know what each piece does, but not how all the pieces fit together. That requires careful reading of the source. It's not obvious from the parts. I feel like I'm looking at clay rather than Legos.
  2. Concreteness: Runtime contracts are not explicitly specified (except in rare instances, as with Env). When designing custom values, I'm paralyzed by choice. How do I know if I'm "doing it wrong"? What does "doing it right" look like? Is method (*List).Foo part of the list contract with the runtime? Or is it part of some optional interface? Or is it an implementation detail that my own type doesn't need? How can I plan ahead and build a non-trivial architecture if my assumptions may turn out to be invalidated after a few days of coding?

Am I making sense? I think I mistakenly pitched my initial comment as a technical discussion, when it actually relates to cognition, pedagogy, and community.

The solution, I think, lies in providing some sort of scaffolding to restrict developer freedom just enough to guide their thinking. Assuming we agree on this, then Value is a key part of this runtime model. Getting Value right becomes important, and here are the constraints I see:

To respond more specifically to your comment:

Overall, I would like to suggest the following compromise. What do you think?

type Value interface {
    // String returns an s-expression representing the value
    String() string

    // @spy16 - I can't think of a case where a Value isn't an
    //          expression.  Am I overlooking something?
    Eval(env Env) (Value, error)

    // Hash returns a hashable representation of the value.
    // Most Values will simply return themselves, but certain
    // types may require custom hashing, or may not support
    // hashing altogether.  In the latter case, they should return
    // a non-nil error.
    Hash() (Value, error)

    /*
        Member access methods.
        @spy16:  I prefer this nomenclature & these function signatures.
                 Thoughts?
    */

    // GetAttr accesses a member attribute.  If the attribute does
    // not exist AND `fallback` is not nil, `fallback` is returned.
    GetAttr(name string, fallback Value) (Value, error)

    // SetAttr assigns a member attribute.
    SetAttr(name string, v Value) error
}

N.B.: I agree with keeping Invoke optional, at least for now.

2. Core Utilities

containerString can be exposed I guess.

👍

getPosition I haven't exposed yet, because I was considering removing the Position type in favor of a more generic metadata property.

👍 That's fair. When a mature solution is released, I think it should be exported.

evalValueList and newEvalErr are already public from the core package.

👍

3. Value.Source() should be replaced

👍

4. Set should be implemented with a map

👍

5. Container types should be interfaces

Sorry, but I don't see how it can make it easier to implement custom container types. Could you show me a use case maybe?

I was admittedly unclear in my previous comment, but I think we're actually saying (almost) the same thing.

My proposal is basically:

  1. Let's create a Map interface
  2. Let's not export the concrete implementation of Map (i.e.: HashMap).

As before, my reasoning is cognitive, rather than technical. When implementing a custom map, I currently find myself asking questions like "is func (*HashMap) Get(...) part of the map contract?". I need to scour the source code to find the answer.

Also, a parallel idea I have been thinking about is the concept of a runtime, which would allow more flexibility and also enable supporting persistent data structures, but it would require significantly more work and will be complex.

You've got my attention! 😃 How do you see this working?

Apart from enabling persistent datastructures, I like how this groups most (all?) of the Sabre-isms in one place. It provides much of what I'm asking for, namely:

Honestly, I really like this. Runtime provides a global runtime model, and Value provides a local "object" model. The runtime transforms objects. Everything is clear.

spy16 commented 4 years ago

I've been super busy. I have a few things I would like to discuss here. I will get back to you this weekend. :)

lthibault commented 4 years ago

@spy16 No worries, take your time! Happy Friday!

spy16 commented 4 years ago

I see our comments converging, which is definitely a good sign. I think we are on the same page for points 2-5. With respect to containers as interface types, I don't have any objections, and I completely agree with your points on contracts not being clear.

Like you said, I think Documented and Concrete are 2 very important things we can use to guide the discussion here, and they are precisely the reasons why I am not in favor of GetAttr/SetAttr. Before explaining the reasons, here are some assumptions which I think will be true for most use cases.

  1. From a LISP session, reflection is not entirely avoidable regardless of the approach we choose (since eval still needs to make sure the args are compatible with the parameters of the value being invoked).
  2. With 1 being true, most use cases of Value-native methods are in Go code (not from a LISP session).

Why not Value-native GetAttr/SetAttr for most things?

  1. Documented

    • With the proposed Value type, all accesses to things like Hash, for example, are done as GetAttr("Hash", nil), which makes it a string-based contract; it's not possible to document this expectation using any of the usual Go documentation mechanisms, which worsens the problem of good documentation.
    • You can't get a big picture at all, since even if some methods are logically part of the same contract (for example the methods of the Seq interface), they would be distributed all over the place because there is NO language-backed concrete contract definition.
  2. Concreteness

    • v.GetAttr("Hash", nil)

      • This can return an error or a value. So an if check to ensure it didn't error out is necessary.

      • It is not guaranteed that the returned value here is Invokable. So an extra type assertion/conversion is needed to assert that it can be invoked.

      • Only after the above 2 checks will you be able to do Hash().

      attr, err := v.GetAttr("Hash", nil)
      if err != nil {
          return err
      }
      // attr is guaranteed to be a value, but is it callable?
      if invokable, canInvoke := attr.(Invokable); canInvoke {
          hash, err := invokable.Invoke( /* what if it expects some arguments? */ )
          // check this error also
      }
      • We could also make Invoke() part of the Value type, which would eliminate the type assertion above, but the error check and the ambiguity about parameters still remain. It increases ambiguity in other places as well: for example, all primitive types will have to implement Invoke() now. It will not be clear to the user by looking at, say, GoDoc whether Symbol is really invokable or whether the method is just there to satisfy Value. (Careful examination of the entire codebase is required to understand what is really invokable and what is not.) Not only that, what about Next() Seq? We can't move all of them to Value.

      • If you consider the optional-interfaces approach, a single type assertion like the one below guarantees everything works as expected, and it is very explicit to both the user and the compiler (which has more benefits, explained below):

      if h, hashable := v.(Hashable); hashable {
          h.Hash()
      }
      • Also notice how a type assertion cannot be avoided in the proposed approach either, and the total overhead for the proposed approach would be at least a function call + an if for the err check + a type assertion, whereas the optional-interface approach has 1 type assertion.
    • Optional interfaces provide better concreteness in multiple ways:

      1. You can document the interface properly, which shows up in GoDoc and adds to the Documentation aspect as well.
      2. While it looks like we can avoid type assertions by doing GetAttr, we really cannot, as explained in the above examples. An optional interface can provide a compiler-verified contract definition and ensure it stays working.
      3. We can also leave var _ Seq = (*List)(nil) assertions at the top of files, which serves 2 purposes: the compiler can ensure the types are guaranteed to implement the contracts, and it also clearly explains what the value type is capable of (in this example, it says a list is seqable).
    • Performance:

      • I did a trivial benchmark, and the difference between Value-native GetAttr and access through a type assertion turned out to be 7 ns. This value can be significant when a lot of such accesses are done, but as Rob Pike says, n is usually small. :)
      • Even if n is large, the benchmark didn't account for any implementation detail of GetAttr itself. I can see 2 non-reflection approaches here:
        1. a switch case: probably not that costly up to some point, but it can become costly. Besides, it can't provide the dynamic nature you are expecting since it is a compile-time construct. Also, this approach can be bug-prone (string cases etc.)
        2. map[string]Value: not costly since it is amortized constant time, but the cost added by the checks after GetAttr returns, as explained above, still remains.

What I am proposing:

With respect to Hash() (Value, error):

Phew! That's a super long comment. Let me know your thoughts. :)

I am experimenting with a slightly different organization as well (a better contract in a core package, actual implementations separated into a collection package, etc., and interfaces designed with immutable/persistent collections in mind) and will push it or describe it here soon.

I have not really thought through the Runtime approach. If you have any concrete ideas here, do let me know.

lthibault commented 4 years ago

Hello! Let me dive right in, as it feels like we're making serious headway!

Member access semantics

Ok, you've convinced me that Get/SetAttr are neither concrete nor well-documented. In particular, I hadn't considered the need for type assertions after the call to GetAttr.

Another argument in favor of your point: item 5 (containers as interfaces) provides us with much of the concreteness and self-documentation we need.

Regarding performance:

  1. You make a good point that some reflection is inevitable at the Lisp/Go boundary.
  2. I agree that it seems unlikely that we can significantly reduce type assertions through clever interface design (which was my initial hope/belief)
  3. 7ns is indeed much smaller than I expected.

As it sounds like we both agree that performance is important (albeit secondary to other considerations), perhaps we should write a benchmarking suite fairly soon? This way, we can at least guard against significant performance regressions (e.g. an order of magnitude, or more).
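
To make that concrete, here is a rough sketch (hypothetical types, not Sabre's API) of the kind of micro-benchmark such a suite could start with, comparing an optional-interface type assertion against a string-keyed GetAttr-style lookup:

package sabre_test

import "testing"

type Value interface{ Eval() (Value, error) }

type Hashable interface{ Hash() Value }

// attrValue emulates the GetAttr-style contract discussed above.
type attrValue interface {
    GetAttr(name string, fallback Value) (Value, error)
}

type str string

func (s str) Eval() (Value, error) { return s, nil }
func (s str) Hash() Value          { return s }

func (s str) GetAttr(name string, fallback Value) (Value, error) {
    if name == "Hash" {
        return s, nil
    }
    return fallback, nil
}

var sink Value // keeps the compiler from optimising the calls away

func BenchmarkTypeAssertion(b *testing.B) {
    var v Value = str("hello")
    for i := 0; i < b.N; i++ {
        if h, ok := v.(Hashable); ok {
            sink = h.Hash()
        }
    }
}

func BenchmarkGetAttr(b *testing.B) {
    var v attrValue = str("hello")
    for i := 0; i < b.N; i++ {
        attr, _ := v.GetAttr("Hash", nil)
        sink = attr
    }
}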

Regarding your proposal:

I'm on board. My observations:

  1. The optional interface amortizes well in cases of repeated attribute lookups on the same object.
  2. The type-assertion penalty will of course amortize poorly in cases of chained lookups (e.g. foo.bar.baz).
  3. Nevertheless
    • Chained lookups will be O(n), and on the order of nanoseconds. Acceptable.
    • n is likely to remain fairly small (i.e. short chains).

Anyway, 👍

Hash()

There is some ambiguity here. If Hash again returns a Value, how would things depending on a hashable value work?

The idea is that Hash doesn't compute a hash, but instead provides a hashable representation of the value. It is this representation which is then fed to a hash function.

But before going any further, I agree that returning an error is counterproductive. Let's change the signature to Hash() Value.

Now, every type must provide a hashable representation of itself. There are only two possibilities:

  1. Natively-hashable types (e.g. Int64 or String)
    • v.Hash() returns v
  2. Non-hashable types (e.g. Map)
    • v.Hash() returns some guaranteed-hashable value that is, in effect, an encoding of v.

I like this because it means Hash no longer depends on any particular hashing algorithm. It's up to the caller to decide which hashing algorithm is most appropriate depending on the situation (crypto? use sha512. hashmap? use murmur. etc...).
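
To sketch this with made-up types (nothing here is the final API): a natively-hashable value returns itself, a collection returns a hashable encoding of itself (here, its s-expression string), and the caller picks the actual hash algorithm.

package main

import (
    "fmt"
    "hash/fnv"
)

type Value interface {
    String() string
    Hash() Value // returns a hashable representation, not a hash
}

type String string

func (s String) String() string { return string(s) }
func (s String) Hash() Value    { return s } // natively hashable: return self

type Vector []Value

func (v Vector) String() string { return fmt.Sprint([]Value(v)) }
func (v Vector) Hash() Value    { return String(v.String()) } // hashable encoding of the vector

// hashActual is chosen by the caller; a hash-map might use FNV, murmur, etc.
func hashActual(hv Value) uint64 {
    h := fnv.New64a()
    h.Write([]byte(hv.String()))
    return h.Sum64()
}

func main() {
    key := Vector{String("a"), String("b")}
    fmt.Println(hashActual(key.Hash())) // a collection can now be used as a map key
}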

Misc.

I am experimenting with a slightly different organization as well (a better contract in a core package, actual implementations separated into a collection package, etc., and interfaces designed with immutable/persistent collections in mind) and will push it or describe it here soon.

Looking forward to it 😃

I have not really thought through the Runtime approach. If you have any concrete ideas here, do let me know.

If I'm not mistaken, the changes involved should be minimal. The only difference between Runtime and Env is that Runtime also provides factory methods for various types.

What if we instead kept Runtime and Env separate? We would then:

  1. Also pass Runtime as an argument to any function that currently takes Env
  2. Pass Runtime to Reader.

In this way, Reader now knows how to create new values. Same with e.g. Eval. User-defined constructors can be accessed via optional interfaces and type assertion.

Am I overlooking something?

Phew! We're getting there! 😄

spy16 commented 4 years ago

should write a benchmarking suite fairly soon

Agreed.

It's up to the caller to decide which hashing algorithm is most appropriate

Hmm I like this idea. Yea, we can do that. I see the process of hashing working as below:

  1. Something (maybe a specific hashmap implementation) calls v.Hash() and gets back hv.
  2. Now it passes hv to a specific hashing function hashActual(hv), gets back the hash value, and uses it to build the hashmap.

One question on hashActual: what do you think should be the behaviour when hv is non-hashable? For example, let's say a map value m ends up returning itself when m.Hash() is called (this would work since Hash() is expected to return a Value and m is one). But what does hashActual do now?

Looking forward to it 😃

Just to give some overview:

I am currently working on cleaning up the Fn, MultiFn and SpecialForm types and the macro system. Do you think it would be better to have these in a package called invoke or something, or to have them in the sabre root package? (There are cyclic-dependency issues I need to think through to have a separate package, but let me know your preference here.)

Runtime and env separate

Yep, you are right. Also, with the above design, I think the Runtime can remain external as well (for example, from the reader side, VectorReader is specific to an implementation of Vector and is free to figure out how to create a vector instance; but I need to see how to handle initialization of vectors from other functions).

lthibault commented 4 years ago

what do you think should be the behaviour when hv is non-hashable?

I think hashActual should panic.

The reason is that removing the error from Hash() and hashActual assumes:

[...] every type must provide a hashable representation of itself [...]

If a developer violates this constraint, then all bets are off. Let it crash.

This is very much a design choice. We are requiring all types to be hashable, but I would cautiously argue that this is a good thing because:

  1. Clojure does it.
  2. It's simple. No value is special.
  3. It's powerful. You can do useful things that are impossible in pure Go.
  4. The inability to hash certain values is (IMHO) a failing of most languages. It's a wart, not a feature.

Re: package organization

core package defines just the core interfaces in a single file (core.go)

👍 Yup. This gives me the "one obvious point of entry" feature.

Also, core does not use complex reflection logic [...]

This feels right. Reflection is fine, but it's scary for users. I agree it should be tucked away as deeply as possible, and be as generic as possible. Ideally, users would never have to look at it.

collection package which defines Vector, HashMap , HashSet [...]

I would prefer to see factory functions like NewVector, NewMap and NewSet, which return the corresponding interface from core. The concrete implementations shouldn't even be exported, IMHO.

(Let me know if you have better package name)

What if we renamed core to runtime and called this package core instead?

I find this more intuitive. The runtime package contains high-level runtime interfaces. core contains core implementations of runtime interfaces. Bonus: runtime.Runtime should live in ... well ... runtime.

To avoid any confusion, the rest of this comment does not assume this suggested package structure.

sabre package which brings collection and core together to provide a more feature rich setup.

Agreed. This doubles as a canonical example of how to wire everything together. +1 for such self-documentation.

I have defined mutating operations in interfaces similar to this: Map.Assoc(key, val Value) (Map, error) (notice it returns a Map).

Perfect! I have a bit of experience, and as far as I'm aware, the above return-value is the only requirement.

[...] Fn MultiFn and SpecialForm types and the macro system [...] Do you think it would be better to have these in a package called invoke or something or have it in sabre root package?

Intuitively, I'd look for these in the core package. In my mind they are very much core features.

spy16 commented 4 years ago

What if we renamed core to runtime and called this package core instead

Yea, that makes sense. (runtime.Reader is a bit odd now, though. Not sure if we should move this to sabre altogether, but on the other hand, maybe it should be okay considering most usage of the reader might be through sabre.NewReader.)

runtime.Runtime should live in ... well ... runtime.

With the new interfaces designed suitably for persistent structures as well, I am thinking maybe we don't need the fat Runtime interface. But I like the name 🙂, so we could rename Env to Runtime (it is kind of a runtime since Eval also goes through it). Also, runtime.New() Runtime reads well.

Intuitively, I'd look for these in the core package. In my mind they are very much core features.

Yes, agreed. ✔️

I would prefer to see factory functions like NewVector, NewMap and NewSet, which return the corresponding interface from core. The concrete implementations shouldn't even be exported, IMHO.

I usually prefer making zero values of types usable (usage would be var hm core.HashMap), just because it feels more natural for data structures (similar to var a int, var b []byte, etc.). But in this case I am not entirely sure yet, since if we want to keep support for persistent structures, directly doing var hm core.HashMap wouldn't work. So, agreed ✔️

A few doubts:

Do you have some ideas in mind about the Map, Set & Vector interfaces? Also, do you think we need an interface for List as well? How would it be different from Vector? If you do have some ideas, do send the interface definitions.

Another one on collection factories. There are 2 ways a collection type might be constructed:

  1. From the Reader.
  2. From any other Go function call (might be called from the LISP session but ultimately comes down to NewMap somewhere)

Approach 1: Have the factory signatures as NewMap(env), and the factory checks whether the given env supports env.NewMap or something of the sort and dispatches there (the Runtime interface). If not, fall back to initialising the simple hash-map implementation. The problem with this approach is that the reader will have to know about the runtime.

Approach 2: We could change the Env interface to Runtime and make it mandatory for runtimes to implement NewMap, NewVector and NewSet.

Approach 3: For now we just have core.NewMap(), core.NewSet(), etc. which always initialise mutable implementations. When we want to support persistent structures, we add a persistent package, define package-global primordial values (which can be private) for map, set, etc., and persistent.NewMap etc. use those to build from (this is what Clojure does, I believe). Once built, we always do map.Assoc, which returns a new map, and so on.

I like Approach 3 the most since it doesn't complicate the reader, doesn't introduce more interfaces or more type assertions to check what factory methods the env supports, and also doesn't make it mandatory to implement all the factories. A rough sketch is below.
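
A rough sketch of what Approach 3 could look like (all names are hypothetical, and the copy-on-write map is just a stand-in for a real persistent structure):

package persistent

// Value, KVPair and Map stand in for the interfaces that would live in core.
type Value interface{ String() string }

type KVPair struct{ Key, Val Value }

type Map interface {
    Assoc(key, val Value) (Map, error)
}

// persistentMap is a naive copy-on-write stand-in for a real persistent map.
type persistentMap struct{ data map[Value]Value }

func (m *persistentMap) Assoc(key, val Value) (Map, error) {
    next := &persistentMap{data: make(map[Value]Value, len(m.data)+1)}
    for k, v := range m.data {
        next.data[k] = v
    }
    next.data[key] = val
    return next, nil
}

// emptyMap is the private primordial value every map is built from.
var emptyMap = &persistentMap{data: map[Value]Value{}}

// NewMap builds a Map from the primordial value, one Assoc at a time.
func NewMap(pairs ...KVPair) (Map, error) {
    var (
        m   Map = emptyMap
        err error
    )
    for _, p := range pairs {
        if m, err = m.Assoc(p.Key, p.Val); err != nil {
            return nil, err
        }
    }
    return m, nil
}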

lthibault commented 4 years ago

Yea, that makes sense. (runtime.Reader is bit odd now though. not sure if we should move this to sabre altogether.

I'm in favor of putting Reader in the sabre package.

Intuitively, I would expect Reader not to live in the same package as Runtime. The former deals with building an AST, while the latter deals with how an AST is evaluated. The former deals with syntax, the latter with semantics.

This dissociation isn't 100% perfect, but I expect a Reader to turn character strings into runtime values. I don't expect a runtime to have any knowledge about a reader/de-serializer.

But I like the name 🙂, so we could rename Env to Runtime (it is kind of a runtime since Eval also goes through it). Also, runtime.New() Runtime reads well.

You're not alone - "runtime" has a nice ring to it! ✔️

Do you have some ideas in mind about the Map, Set & Vector interfaces ?

In general? Or are you concerned about a specific issue?

Reading between the lines, it seems like you're noticing that Vector and List are very similar. The major difference between a list and a vector is in their intended uses:

In both cases, the core implementation will probably use a slice, but a persistent implementation will likely prefer to implement List with a persistent linked-list. For this reason, I think we should:

  1. define List
  2. differentiate the List and Vector interfaces to reflect their intended usage

As such, I would cautiously suggest restricting what can be done with a list. In particular, disallowing indexed lookups.

What do you think of this, as a starting-point for discussion?

type List interface {
    Value
    Seq

    Head() Value
    Tail() Seq
    Pop() (Value, Seq)
    Append(...Value) Seq
}

type Vector interface {
    List

    Get(int) Value
    Slice(start, stop, step int) Vector
    Insert(int, Value) Vector
}

Another one on collection factories.

I also like Approach 3. +1 for simple factory functions. ✔️

I think hashActual should panic.

You didn't mention anything about this -- does it seem reasonable to you?

spy16 commented 4 years ago

Oh yea, I agree about the panic. I had the same thing in mind. As long as the contract is explicit, an implementation violating the constraint is definitely a cause for panic.

spy16 commented 4 years ago

Reading between the lines, it seems like you're noticing that Vector and List are very similar

Well, I knew the difference :smirk:. Until halfway through this discussion, I was not considering persistent structures in the design, so List and Vector themselves were the implementation details. In the mutable world, List usually means a linked list (fast pop/append at the head) and Vector means a dynamic array (mutable, generally simply a wrapper around a Go slice, with fast iteration and fast index lookup). So I was not really sure if we needed an interface there. But considering we now have 2 possible variations for them (mutable + immutable), I just mentioned it.

List

My confusion was simply because the Seq interface is exactly what defines the List.

  1. Head is simply First()
  2. Tail is simply Next()
  3. Append is Cons (or Conj for multiple)
  4. Pop: In Clojure, pop is the same as Next for Lists (warning: only w.r.t. the return value). A general pop method (remove the first item and return it) doesn't make a lot of sense for persistent structures. I see that you have handled this case by returning both the popped value and the new seq, but the same effect can be obtained using First() (the idea is to peek, same as in Clojure) and then Next() (similar to Clojure's pop). So I'm not sure if we should extend the contract to include this.

So maybe we are better off using Seq as the expected type wherever we are expecting a List (like Clojure does with ISeq)? And Seq is defined as:

type Seq interface {
  Value

  Cons(v Value) Seq
  Conj(vals ...Value) Seq
  First() Value
  Next() Seq
  Count() int
}

// See Map, Set & Vector for usage.
type Seqable interface {
  Value
  Seq() Seq // can even be a lazy sequence for persistent implementations
}

Vector

Vector is probably straightforward since we can consider the index-lookup requirement. (Clojure uses IAssociative for this, and for maps also, but they extend it with Keys() and Values() if I remember correctly.) The idea is that Vector is a keyed collection where the key is the index. For maps, it's obvious.

Maybe we can borrow things from IAssociative as well?

type Vector interface {
  Value

  Seq() Seq
  Get(key int) (Value, error)         // or EntryAt(), error to handle index out of range
  Conj(items ...Value) Vector           // or Append()
  Assoc(key int, val Value) (Vector, error) // error to handle index-out-of-range
}

Why not reuse Seq?: If we reuse Seq, then when the user does vec.Conj(someValue) to add a value to a vector, the returned value is a Seq, which is hard to deal with if the use case requires the return value to also be a vector (which is the case most of the time; this is easy in Java since it supports covariance).

Why not embed Seq and also define vector-specific Cons/Conj?: Simply to reduce the required contract size (while not compromising usability).

Map

Coming back to this: every type must provide a hashable representation of itself. I see a problem here. Some values, for example a Go function implementing the Value interface, or the wrapper type for arbitrary Go values (Any), have nothing they can use to create a hashable value. Initially I was thinking maybe we could use the memory address of the value or something extracted by reflect as a last resort. But after some trial and error, I don't see anything that we can use here. Let me know if you have alternate approaches. (While I really, really want to make all values hashable, I definitely don't want to force users to implement Value types manually whenever they want to use Sabre as just a scripting layer by exposing already-existing Go values; this is one of the use cases I built parens for in the first place.)

type Map interface {
  Value

  // A single:
  Seq() Seq // sequence of key-value pairs. (slightly implicit)
  // OR following 2 methods:
  Keys() Seq
  Vals() Seq

  Get(key Value) Value      // or EntryAt(). return error? default arg? 
  HasKey(key Value) bool   // return error?
  Assoc(key, val Value) (Map, error)
  Dissoc(key Value) (Map, error)
}

Set

type Set interface {
  Value

  Seq() Seq
  HasKey(key Value) bool
  Conj(items ...Value) (Set, error) 
  Disj(item Value) (Set, error)
}

I think these interfaces are concise while providing enough to build on. Let me know if I have missed any cases or if you have other thoughts. :)

lthibault commented 4 years ago

Well, I knew the difference 😏

Oh sorry! I really didn't mean to imply that you didn't!

I was merely going through the exercise of contrasting list/vector in order to set the stage for discussion, but I see how my previous comment might have sounded a bit condescending ... 😬

So may be we are better off using Seq as the expected type wherever we are expecting a List (like Clojure does with ISeq)?

Yes, I meant to ask about Seq (and then forgot) -- thanks for clearing this up. I prefer your approach, and agree with all sub-points. ✔️

Coming back to this: every type must provide a hashable representation of itself. I see a problem

Hmm, this is a good point. I also agree that we should relax this constraint if keeping it means sacrificing the ease of Go-Sabre interop.

Initially I was thinking maybe we could use the memory address of the value or something extracted by reflect as a last resort. But after some trial and error, I don't see anything that we can use here.

It seems we can hash reflect.Value - would that not work?

List/Vector/Map/Set

Map

type Map interface {
  // ...

  /*
    @spy16:  Can we have both?  I like the idea of having Map provide Seqable,
             and I also like having access to just the keys or just the values.

             This is comparable to dicts in Python, which have `keys`, `values`
             and `items` methods -- very convenient!
  */
  Seq() Seq // sequence of key-value pairs. (slightly implicit)
  Keys() Seq
  Vals() Seq

  // ...
}

Cheers 👋

spy16 commented 4 years ago

I really didn't mean to imply that you didn't!

Oh don't worry about it. I knew that's not what you implied. :smile:

It seems we can hash reflect.Value -

Need to check this out. :thinking:

And yes, we could have both Seq() and Keys()+Vals() as well. (I was just concerned about edge cases here: Seq needs to return a sequence of pairs, and we can't ensure that here. Let me know if you think this is worth the risk of an implicit contract.)

This is the last of my questions, and I will probably close this PR since the thread has become really long and hard to follow :sweat_smile:. I will raise a separate PR with the runtime changes.

lthibault commented 4 years ago

Let me know if you think this is worth the risk of implicit contract

I think it's fine. I often see docstrings in the standard library that place additional constraints on typed values (io.Reader comes readily to mind).

This is the last of my questions, and I will probably close this PR since the thread has become really long and hard to follow 😅

No further questions on my end as well. This design is turning out nicely! 😃

What do you think are the next steps?

spy16 commented 4 years ago

Indeed, it is turning out nicely. In parallel, I've been making changes as per the discussion here and will send a PR soon. I'll close this; any further discussion we can do on the other PR.