Closed lthibault closed 3 years ago
Oh wetware sounds amazing! 🔥 I am really happy sabre has turned out to be useful in such a cool project.
Most of the concerns you have mentioned are very valid and actually aligned with the concerns I have had myself.
I'll get back to you over this weekend and we can figure out the next steps here based on this great report and the issues I have been facing with the new runtime model.
> Oh wetware sounds amazing! 🔥 I am really happy sabre has turned out to be useful in such a cool project.
That's great to hear! I'll be sure to keep you in the loop!
> Most of the concerns you have mentioned are very valid and actually aligned with the concerns I have had myself.
Brilliant - synchronization achieved!
> I'll get back to you over this weekend and we can figure out the next steps here based on this great report and the issues I have been facing with the new runtime model.
Sounds like a plan. Looking forward to it!
Some of the issues I have been seeing with the `runtime` model and the previous model:
Evaluation was handled by the `Value` types themselves, by having an `Eval()` method. This seemed to provide a lot of flexibility but actually compromised simplicity (e.g., even values that needed no evaluation had to implement `Eval()`) and also extensibility (e.g., I tried to add max-stack-depth as a safety feature for a sandboxed environment, but because eval and invocation logic were in `List.Eval()`, the runtime implementations needed to expose stack manipulation functions as well).
At a high level, the problem turned out to be too many moving parts due to interfaces. The approach I now have in mind combines ideas from Clojure, Joker, zygomys and, to some extent, Rob Pike's lisp. I have set up an experimental repository, Zulu, which implements these ideas (I created a separate repository just to avoid any confusion, but it will be moved to Sabre once finalised - on the other hand, I kinda like the name).
Highlights:
At the core of Sabre will be the following 2 types:
    // name may change to Context or Env etc. based on technicalities
    type VM struct{...} // notice that this is a struct type.
    func (vm *VM) Eval(form Any) (Any, error) {
        // expand macros and analyze the form into an Expr before evaluating it
        expr, err := vm.expandAnalyze(form)
        if err != nil {
            return nil, err
        }
        return expr.Eval(vm)
    }
    type Expr interface {
        Eval(vm *VM) (Any, error)
    }
`VM.Eval()` takes any form and first converts it to an `Expr` type using a macro-expander + analyser (either builtin, custom, or a composed version that has both).
    type Expander interface {
        Expand(vm *VM, form value.Any) (value.Any, error)
    }
    type Analyzer interface {
        Analyze(form value.Any) (Expr, error)
    }
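A minimal, runnable sketch of this analyze-then-evaluate flow might look like the following. It is not Zulu's actual code: `ConstExpr` and `IfExpr` borrow names used in this thread, while `basicAnalyzer`, the slice-based encoding of forms, and the way `VM.Eval` is wired are assumptions made purely for illustration. Macro expansion is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Any stands in for an arbitrary form or value.
type Any interface{}

// Expr is an analyzed form that knows how to evaluate itself.
type Expr interface {
	Eval(vm *VM) (Any, error)
}

// Analyzer turns a raw form into an Expr.
type Analyzer interface {
	Analyze(form Any) (Expr, error)
}

// ConstExpr evaluates to a fixed value.
type ConstExpr struct{ Const Any }

func (ce ConstExpr) Eval(*VM) (Any, error) { return ce.Const, nil }

// IfExpr evaluates Then or Else depending on the result of Test.
type IfExpr struct{ Test, Then, Else Expr }

func (ie IfExpr) Eval(vm *VM) (Any, error) {
	t, err := ie.Test.Eval(vm)
	if err != nil {
		return nil, err
	}
	// Everything except nil and false counts as truthy in this sketch.
	if t != nil && t != false {
		return ie.Then.Eval(vm)
	}
	return ie.Else.Eval(vm)
}

// basicAnalyzer (hypothetical) treats []Any{"if", test, then, else} as a
// conditional and every other form as a constant.
type basicAnalyzer struct{}

func (a basicAnalyzer) Analyze(form Any) (Expr, error) {
	list, ok := form.([]Any)
	if !ok || len(list) == 0 || list[0] != "if" {
		return ConstExpr{Const: form}, nil
	}
	if len(list) != 4 {
		return nil, errors.New("if: expected test, then and else forms")
	}
	parts := make([]Expr, 3)
	for i, f := range list[1:] {
		e, err := a.Analyze(f)
		if err != nil {
			return nil, err
		}
		parts[i] = e
	}
	return IfExpr{Test: parts[0], Then: parts[1], Else: parts[2]}, nil
}

// VM is a concrete struct that delegates syntactic analysis to an Analyzer.
type VM struct{ analyzer Analyzer }

func (vm *VM) Eval(form Any) (Any, error) {
	expr, err := vm.analyzer.Analyze(form)
	if err != nil {
		return nil, err
	}
	return expr.Eval(vm)
}

func main() {
	vm := &VM{analyzer: basicAnalyzer{}}
	v, err := vm.Eval([]Any{"if", true, "yes", "no"})
	fmt.Println(v, err)
}
```

Because analysis yields a plain `Expr` value, a caller could also keep that value around and evaluate it repeatedly without re-analyzing the form.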
`Expr` implementations will be pre-defined, and will include `ConstExpr`, `DefExpr`, `IfExpr` etc. Custom `Expr` types may be implemented by composing these (see `ext.VectorExpr` for an example).
Pros:
`GoExpr` can be added to implement goroutine support (I need to think about this more, but I think it should be easily doable this way).
Cons:
No `Value` type, and `Any` is an `interface{}` type. (Although I don't think this is exactly a con, since with the `Value` type model all Go values had to be converted to a matching Value type by reflection. This is entirely avoided.)
Do take a look at Zulu and let me know your thoughts.
Addressing Issues in the Report:
> Just created a separate repository to avoid any confusion, but will be moved to Sabre once finalised - on the other hand, I kinda like the name
Personally, I thought Parens was the best name. I was sad to see it go!
Anyway, I re-read the relevant chapters of SICP over the weekend and then had a look at Zulu on Sunday evening. In a word: 👍 👍 👍
In fact, you forgot one of the more significant "pros" in your list: a major gain in efficiency, stemming from the fact that we only perform syntactic analysis once for each expression! As you've noticed in previous threads, I'm always a bit performance-conscious, so this is a big win in my book.
My only major thoughts/concerns are as follows:
> GoExpr can be added to implement goroutines support (Need to think about this more. But i think it should be doable easily this way)
I noticed that `VM` is not thread-safe. What are the implications for `GoExpr` and https://github.com/spy16/sabre/issues/15?
In terms of design, I see two possibilities:
I'd prefer to avoid global locking, since it undermines much of the power of Go's M:N thread model.
Similar to deriving `context.Context`s, could we somehow "derive" a new VM? The procedure would look something like:
VM
I haven't thought through all the implications, so maybe this approach is flawed. In any case, this is by far my biggest concern right now.
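One possible shape for the "derive" idea, sketched under heavy assumptions: the `VM` below is a toy with only a bindings map, and `Derive`, `Bind`, `Resolve` and `Go` are hypothetical names, not part of Zulu or Sabre. The child takes a snapshot of the parent's bindings, so it shares no mutable state and needs no global lock.

```go
package main

import (
	"fmt"
	"sync"
)

// VM is a toy evaluator that holds mutable, per-goroutine state.
type VM struct {
	bindings map[string]interface{}
}

func NewVM() *VM { return &VM{bindings: map[string]interface{}{}} }

// Derive returns a child VM holding a snapshot of the parent's bindings.
// The child shares no map with its parent, so it can be handed to another
// goroutine without locking.
func (vm *VM) Derive() *VM {
	child := NewVM()
	for k, v := range vm.bindings {
		child.bindings[k] = v
	}
	return child
}

func (vm *VM) Bind(name string, v interface{}) { vm.bindings[name] = v }

func (vm *VM) Resolve(name string) (interface{}, bool) {
	v, ok := vm.bindings[name]
	return v, ok
}

// Go derives a child VM and runs fn on it in a new goroutine, which is
// roughly what a GoExpr might do.
func (vm *VM) Go(wg *sync.WaitGroup, fn func(child *VM)) {
	child := vm.Derive() // derive on the parent's goroutine, before spawning
	wg.Add(1)
	go func() {
		defer wg.Done()
		fn(child)
	}()
}

func main() {
	root := NewVM()
	root.Bind("x", 1)

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		i := i
		root.Go(&wg, func(child *VM) {
			child.Bind("x", i*10) // only visible inside this child
		})
	}
	wg.Wait()

	v, _ := root.Resolve("x")
	fmt.Println(v) // the parent's binding is unaffected by the children
}
```

The snapshot is shallow, so this only holds up if bound values are immutable or themselves thread-safe, and copying costs time proportional to the number of bindings. A chained scope with a pointer to the parent would avoid the copy but brings the locking question back.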
https://github.com/spy16/zulu/blob/master/zulu.go#L63 https://github.com/spy16/zulu/blob/master/zulu.go#L70
I noticed that `VM.Eval` uses `value.Nil{}` directly. What happens if I want to use my own implementation of Nil?
Wetware currently aliases `runtime.Nil`, and I don't necessarily expect this to change. However, I would be a bit more comfortable if atoms were fully swappable.
> Some parts remain concrete (i.e., VM) and reduces the permutations possible with different implementations and hence all the edge cases. Also leaves room for some optimisations (I think).
Although not a huge priority right now, I'm curious as to what optimizations you have in mind.
> Addressing Issues in the Report: [...]
👍 on all points.
Regarding `GoExpr`, I do not have any concrete ideas at the moment, to be honest. But I have the same high-level idea in mind as you, i.e., to copy the VM instance and launch the goroutine with that as the context. (That's why I'm not sure whether it would be more appropriate to rename VM to Context, but I do like VM.)
To avoid depending on `value.Nil`, I guess one way would be to return native `nil` and expect the user to handle it.
On the name, we could go and do the refactor on parens and archive sabre. But this needs to be the last time we do something like this. 🤣
> I have the same high level idea in mind as you. i.e., to copy the VM instance and launch the goroutine with that as the context.
SGTM.
> That's why not sure if it would be more appropriate to rename VM to Context perhaps - but I do like VM
Yeah, I do too. It gives me some idea of what it's actually doing, under the hood. In particular, I can infer that it's probably some sort of stack machine. In general, I don't like the word `Context` because it doesn't really tell us anything ... it's like `Data` in that respect. Everything is data, and everything is context.
OTOH, I don't expect there to be more than one VM in a given process... 🤔.
SICP would call this an "Evaluator". The -or suffix suggests an interface, but in this case I think it's still okay.
One thing we could do to resolve both the naming convention and the concurrency question is to factor the stack out of the VM, making it stateless. Then, each goroutine would be responsible for passing its stack to `Eval` explicitly. The trade-off is that `Analyzer` and `Expander` must now be stateless, or at the very least thread-safe, but I don't think that's actually a problem.
To illustrate, we might do something like this:
    // Execution context for an Expr. Must be exported since it's part of VM.Eval's call signature.
    //
    // N.B.: consider making this an interface, in case users want to supply their
    // own implementation, which might be optimized for a specific use-case.
    // e.g: maybe I want an immutable stack based on a linked-list. Or maybe
    // I want to bind a `context.Context`, or something.
    type Context struct {
        stack    []stackFrame
        maxDepth int
    }
    // push and pop mutate the stack, hence the pointer receiver.
    func (c *Context) push(f stackFrame) { ... }
    func (c *Context) pop() stackFrame { ... }
    type Expr interface {
        Eval(*Context) (value.Any, error)
    }
    // only one of these per process
    type VM struct {
        analyzer Analyzer
        expander Expander
    }
    func (vm *VM) Eval(c *Context, form value.Any) (value.Any, error) { ... }
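For what it's worth, here is a runnable sketch of that split, with a max-depth check living entirely in the context. Everything beyond the `Context`/`Expr`/`VM` shape above is an assumption made for illustration: the frames are plain strings, `CallExpr` and `nested` are hypothetical helpers, and the analyzer is reduced to a function field.

```go
package main

import (
	"errors"
	"fmt"
)

type Any interface{}

var ErrStackOverflow = errors.New("maximum stack depth exceeded")

// Context carries the execution state (here, just a call stack) for one
// goroutine. It is passed explicitly, so the VM holds no mutable state.
type Context struct {
	stack    []string // frame names; a stand-in for real stack frames
	maxDepth int
}

func NewContext(maxDepth int) *Context { return &Context{maxDepth: maxDepth} }

func (c *Context) push(frame string) error {
	if len(c.stack) >= c.maxDepth {
		return ErrStackOverflow
	}
	c.stack = append(c.stack, frame)
	return nil
}

func (c *Context) pop() { c.stack = c.stack[:len(c.stack)-1] }

// Expr is an analyzed form, evaluated against an explicit Context.
type Expr interface {
	Eval(c *Context) (Any, error)
}

// ConstExpr evaluates to a fixed value.
type ConstExpr struct{ Const Any }

func (e ConstExpr) Eval(*Context) (Any, error) { return e.Const, nil }

// CallExpr (hypothetical) models an invocation: it pushes a frame,
// evaluates its body, then pops the frame.
type CallExpr struct {
	Name string
	Body Expr
}

func (e CallExpr) Eval(c *Context) (Any, error) {
	if err := c.push(e.Name); err != nil {
		return nil, err
	}
	defer c.pop()
	return e.Body.Eval(c)
}

// VM is stateless apart from its analyzer, which must be thread-safe.
type VM struct {
	analyze func(form Any) (Expr, error)
}

func (vm *VM) Eval(c *Context, form Any) (Any, error) {
	expr, err := vm.analyze(form)
	if err != nil {
		return nil, err
	}
	return expr.Eval(c)
}

// nested wraps a constant in depth levels of CallExpr.
func nested(depth int) Expr {
	var e Expr = ConstExpr{Const: "done"}
	for i := 0; i < depth; i++ {
		e = CallExpr{Name: fmt.Sprintf("f%d", i), Body: e}
	}
	return e
}

func main() {
	// Trivial analyzer for this sketch: forms are already Exprs.
	vm := &VM{analyze: func(form Any) (Expr, error) {
		e, ok := form.(Expr)
		if !ok {
			return nil, errors.New("unsupported form")
		}
		return e, nil
	}}

	fmt.Println(vm.Eval(NewContext(8), nested(3))) // deep enough: succeeds
	fmt.Println(vm.Eval(NewContext(2), nested(3))) // too shallow: overflow error
}
```

With this layout, a single `VM` can serve many goroutines as long as each one brings its own `Context`.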
The areas of responsibility are delimited as follows:
- `VM` analyzes syntax to produce an expression, and evaluates the expression.
- `Context` encapsulates execution state.
> For not depending on value.Nil I guess a way would be to return native nil and expect the user to handle.
I kind of like this. It keeps the two layers of abstraction (form evaluation vs datatype) clearly separated. I'm inclined to say it's worth the extra `if v == nil` check.
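To illustrate what that extra check could look like on the user's side, here is a small sketch. The `eval` function is only a stand-in for `VM.Eval`, and `Nil` and `wrap` are hypothetical names for a host language's own nil type and its conversion helper.

```go
package main

import "fmt"

// Any stands in for an arbitrary value.
type Any interface{}

// Nil is a hypothetical host-language nil type.
type Nil struct{}

func (Nil) String() string { return "nil" }

// eval is a stand-in for VM.Eval: it returns native nil for "no value"
// instead of a library-defined nil type.
func eval(form Any) (Any, error) {
	if form == "noop" {
		return nil, nil
	}
	return form, nil
}

// wrap maps the evaluator's native nil onto the host language's own Nil.
func wrap(v Any) Any {
	if v == nil {
		return Nil{}
	}
	return v
}

func main() {
	for _, form := range []Any{"noop", 42} {
		v, err := eval(form)
		if err != nil {
			panic(err)
		}
		fmt.Println(wrap(v))
	}
}
```

One caveat with this approach: a typed nil pointer stored in an interface does not compare equal to `nil`, so the evaluator would have to be careful to return an untyped `nil`.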
> On the name, we could go and do the refactor on parens and archive sabre. But this needs to be the last time we do something like this. 🤣
Haha I'm in favor. We can always claim it's to maintain backwards compatibility in Sabre. (But yes, let's make it the last time!)
Another thought just popped into my head: how would you feel about moving the `parens` repo to its own organization? Having an org signals a couple of positive things for a project:
I think it would be a good move, but totally understand if you'd prefer to keep it under spy16/.
Both names `parens` and `sabre` are taken. I wouldn't mind moving it to its own organisation if we come up with a better name that is available.
How about we just call the org `go-parens`? The full repo would then be `github.com/go-parens/parens`.
Else, does e.g. the Hindi word for parentheses have a nice ring to it? We could always use its phonetic spelling in the Latin alphabet.
I think for now, we can keep it under `spy16/parens`... (Not solid reasoning, more of a feeling I guess. But I see it as a small library that does one small thing very nicely. With a dedicated org, I feel it becomes this big project that needs to have a lot of features.)
If this works for you, I will delete the git history of the current `parens` repo and bootstrap it (I know that's not the ideal way to do it, but technically the project is entirely different - we could start with a new repo as well - not sure which would be better though). I will also create small and independent issues for the different tasks that need to be done (an issue on the VM itself, an issue on the reader, an issue on the analyzer, etc.)
> I think for now, we can keep it under `spy16/parens`
Sure, makes perfect sense.
> If this works for you, i will delete the git history of the current parens repo and bootstrap it
Sounds good! (And no worries -- I'm an occasional, clandestine user of `git push -f`, so I totally get it.)
> I will also create small and independent issues on different tasks that need to be done (an issue on the VM itself, an issue on reader, an issue on analyzer etc.)
BTW, I'm keen to tackle the concurrency issue we discussed yesterday once the fundamentals are in place. I'm starting to get a pretty good picture of how a stateless VM could work.
Hi @spy16, I hope you're well and that things are returning to normal on your end.
As mentioned over in https://github.com/spy16/sabre/pull/26 and https://github.com/spy16/sabre/pull/27, I've been making heavy use of Sabre over the past few weeks in the context of Wetware, so I thought I'd share my thoughts on what works and what can be improved.
I'm well aware that the current `runtime` design has shown some limitations, and I'm comfortable with the fact that parts of Wetware will have to be rewritten once we iron out the creases. My goal in publishing this is to:
This report is structured as follows:
Context
Wetware is a distributed programming language for the cloud. Think Mesos + Lisp. Or Kubernetes + Lisp, if you prefer.
Wetware abstracts your datacenters and cloud environments into a single virtual cloud, and provides you with a simple yet powerful API for building fault-tolerant and elastic systems at scale.
It achieves its goals by layering three core technologies:
1. A Cloud Management Protocol
At its core, Wetware is powered by a simple peer-to-peer protocol that allows hosts to discover each other over the network, and assembles them into a fully-featured virtual cloud.
This virtual cloud is self-healing (i.e. antifragile), truly distributed (with no single point of failure), and comes with out-of-the-box support for essential cloud services, including:
Wetware's Cloud Management Protocol works out-of-the-box, requires zero configuration, and features first-class support for hybrid and multicloud architectures.
2. A Distributed Data Plane
Unifying data across applications is a major challenge for current cloud architectures. Developers have to deal with dozens (sometimes even hundreds) of independent applications, each producing, encoding and serializing data in its own way. In traditional clouds, ETL and other data operations are time-consuming, error-prone and often require specialized stacks.
Wetware solves this problem by providing
With Wetware's dataplane, you can coordinate millions of concurrent processes to work on terabyte-sized maps, sets, lists, etc. These immutable and wire-native datastructures protect you from concurrency bugs while avoiding overhead due to (de)serialization.
Lastly, Wetware's location-aware caching means you're always fetching data from the nearest source, avoiding egress costs in hybrid and multicloud environments.
3. A Dynamic Programming Language
The Wetware REPL is the primary means through which users interact with their virtual cloud, and the applications running on top of it. Unsurprisingly, this REPL is a Lisp dialect built with Sabre.
Let's walk through a few examples.
We can simulate a datacenter from the comfort of our laptop by starting any number of Wetware host processes:
Next, we start the Wetware REPL and instruct it to dial into the cluster we created above.
We're greeted with an interactive shell that looks like this:
From here, we can list the hosts in the cloud. If new hosts appear, or if existing hosts fail, these changes to the cluster will be reflected in subsequent calls to `ls`.
The `ls` command returns a `core.Vector`, which contains a special, Wetware-specific data type: `core.Path`. These paths point to special locations called `ww.Anchor`. Anchors are cluster-wide, shared-memory locations. Any Wetware process can read or write to an Anchor, and the Wetware language provides synchronization primitives to deal with the hazards of concurrency and shared memory.
Anchors are organized hierarchically. The root anchor `/` represents the whole cluster, and its children represent physical hosts. Children of hosts are created dynamically upon access, and can contain any Wetware datatype.
Why did this print `nil`? Because the form `(print (/SV4e8.../foo))` was executed on the remote host `cie5uM...`! That is, the following things happened:
- `cie5uM...` was opened.
- `print` function call was sent over the wire.
- `cie5uM...` received the list and evaluated it.
- `cie5uM...` fetched the value from the `Sv4e8.../foo` Anchor and printed it.
If we were to check `cie5uM...`'s logs, we would see the corresponding output.
This concludes the general introduction to Wetware.
While Wetware is very much in a pre-alpha stage, the foundational code for features 1 - 3 is in place, and the overall design has been validated. Now that we are leaving the proof-of-concept stage, developing the language (and its standard library) will be the focus of the next few months. For this reason, Sabre will continue to play a central role in near-term development and I expect to split my development time roughly equally between Wetware and Sabre. As such, I'm hoping the following feedback can serve as a synchronization point between us, and motivate the next few PRs.
The Good Parts
(N.B.: I am exclusively developing on the `reader` branch, which is itself a branch of `runtime`.)
Overall, Sabre succeeds in its mission to be an "80% Lisp". The pieces fit together quite well, and most things are easily configurable. This last bit is particularly true of the `runtime` branch where I was able to write custom implementations for each atom/collection, as well as create some new, specialized datatypes. I have not encountered any fundamental design flaws, which is great!!
The REPL is a breeze to use, requiring little effort to set up and configure. This is in large part thanks to your decision to make `REPL` (and `Reader` for that matter) concrete `struct`s that hold interfaces internally, as opposed to declaring them as interface types. Doing so allows us to inject dependencies via functional arguments rather than re-writing a whole new implementation just to make minor changes to behavior. The result is a REPL that took me less time to set up than to write this paragraph, so this is a pattern we should definitely continue to exploit.
Relatedly, I think these few lines of code really showcase the ergonomics of functional options. They compose well, are discoverable & extensible, and visually cue the reader to the fact that the `repl.New` constructor is holding everything in the package together. I'm disproportionately pleased with the outcome.
Lastly, the built-in datatypes are very useful when developing one's own language because they serve as simple stubs until custom datastructures have been developed. In practice, this means I was able to develop other parts of the language in spite of the fact that e.g. Vectors had not yet been implemented in Wetware. It's hard to overstate not only how incredibly useful this is, but also how much of that usefulness stems from the fact that Sabre is using native Go datastructures under the hood. Designing one's own language is quite hard, so every ounce of simplicity and familiarity is a godsend. I am strongly in favor of maintaining the existing implementations and not adding persistent datatypes for this reason. An exception might be made for `LinkedList` since the current implementation is dead-simple and shoe-horning a linked-list into a `[]runtime.Value` is a bit ... backwards. In any case, Sabre really came through for me, here.
Pain Points, Papercuts & Suggestions
I want to stress that this section is longer than its predecessor not because there are more downsides than upsides in Sabre, but because there's always more to say about problems than non-problems! With that said, I've sorted the pain-points I've encountered into a few broad buckets:
Error Handling
By far the biggest issue I encountered was the handling of errors inside datastructure methods. Throughout our design discussion in #25, our thinking was (understandably) anchored to the existing implementations for `Map`, `Vector`, etc. Specifically, we assumed that certain operations (e.g. `Count() int`) could not result in errors. This turns out to have been an incorrect assumption.
As mentioned in the Context section above, Wetware's core datastructures are generated from a Cap'n Proto schema. As such, simple things such as calling an accessor function often return errors, including for methods like `core.Vector.Count()`. The result is that my code is quite panicky: `Count`, `Conj`, `First` and `Next` all panic.
While there are (quite convoluted) ways of avoiding these panics, I think there's a strong argument for changing the method signatures to return errors. Sabre is intended as a general-purpose build-your-own-lisp toolkit, and predicting what users will do with it is nigh impossible. For example, they may write datastructures that are backed by SQL tables, that make RPC calls, or that interact with all manner of exotic code. As such, I think we should take the most general approach, which means returning errors almost everywhere.
Design of Container Types
This issue is pretty straightforward. I'd like to implement an analog to Clojure's `conj` that works on arbitrary containers. Currently, `runtime.Vector.Conj` returns a `Vector`, so I'm wondering how this might work. Do you think it's best to resort to reflection in such cases? Might it not be better to return `runtime.Value` from all `Conj` methods?
Reader Design
Despite being generally well-designed, there is room for improvement in `reader.Reader`.
Firstly, https://github.com/spy16/sabre/pull/27 adds the ability to modify the table of predefined symbols, which was essential in my case as I have custom implementations for `Nil` and `Bool`.
Secondly, relying on `Reader.Container` to build containers is not appropriate for all situations. The `Container` method reads a stream of values into a `[]runtime.Value`, and returns it for further processing. In the case of Wetware's `core.Vector`, this is quite inefficient since:
- `[]runtime.Value`.
- `[]runtime.Value`, causing additional allocs, but I can't predict the size of the container ahead of time.
- `[]runtime.Value` is instantiated, I have to loop through it and call `core.VectorBuilder.Conj`, which also allocates.
In order to avoid the penalty of double-allocation, I wrote `readContainerStream`, which applies a function to each value as it is decoded by the reader. The performance improvement is significant for large vectors, so I think we should add it as a public method to `reader.Reader`.
Thirdly, Wetware's reliance on Cap'n Proto means that I must implement custom numeric types. To make matters more complicated, I would like to add additional numeric types analogous to Go's `big.Int`, `big.Float`, and `big.Rat`. As such, I will need the ability to configure the reader's parsing logic for numerical values.
Currently, numerical parsing is hard-coded into the `Reader`. I suggest adding a reader option called `WithNumReader` (or perhaps `WithNumMacro`?) that allows users to configure this bit of logic. I expect this will also have repercussions on `sabre.ValueOf`, but it should be noted that this function is already outdated with respect to the new runtime datastructure interfaces.
Miscellanea
Lastly, a few notes/questions that are on my mind, but not particularly urgent:
- `Position` type seems very useful, but I'm not sure how it's meant to be used. Who is responsible for keeping it up-to-date, exactly? Any "use it this way" notes you might have would be helpful.
- `GoFunc`, `Fn` and `MultiFn`. Best I can figure, `GoFunc` is used to call a native Go function from Sabre, while (`Multi`)`Fn` is meant to be dynamically instantiated by `defn`? From there, I assume `MultiFn` is used for multi-arity `defn` forms? (I think I might have answered my own question :smile:)
Conclusion
I hope you find it as useful to read this experience report as I have found it useful to write. I'm eager to discuss all of this at your earliest convenience, and standing by to help with implementation! :slightly_smiling_face: