ipfs / apps

Coordinating writing apps on top of ipfs, and their concerns.
60 stars 9 forks source link

Morte #6

Open davidar opened 9 years ago

davidar commented 9 years ago

From comments by @Gabriel439 on ipfs/ipfs#47

This is the use case for morte and annah, providing a high quality intermediate language for distributing and storing code fragments:

  • morte provides the strongly normalizing core language that is 100% pure with no ways to cheat/coerce/subvert the type system
  • annah translates recursive data types to non-recursive equivalents encoded in morte

morte is basically production ready. annah is still very experimental. The desugaring of recursive types is complete but decisions like how to encode strings and numbers are still in flux so it's not really ready for prime time just yet.

Using annah is not strictly necessary, it's purely syntactic sugar for morte code; I can step you through how to encode all your relevant datatypes and functions by hand without annah for your specific use case if you can explain exactly what you want the code to be able to do.

Relevant resources:

I highly suspect that the compressibility of Morte expressions will make or break its suitability for this purpose because right now the data type encodings have very high constant factors over more specialized encodings. I'm going to try running some basic compression algorithms over the serialized expressions and if that does not work then I will try more specialized encodings. In the worst case scenario I could just fork the core language for IPFS to provide built-in support for efficient data types.

I'm creating this issue to help coordinate integrating morte with IPFS. Personally I think a language like morte would be a great fit for embedding computable data in IPFS.

@Gabriel439 Playing devil's advocate, I have a few questions: How does this improve on distributing regular (Safe)Haskell or pure Lisp source code (or BLC) over the internet? What limitations does the lack of recursion have in practice, and is this outweighed by the guarantees it allows you to make about the code? What suggestions would you make in terms of being able to use code that requires unbounded recursion?

CC: @jbenet @ion1

ion1 commented 9 years ago

As for code that requires unbounded recursion, Morte code is able to generate any code. I’d be perfectly happy with a guaranteed-safe ipfs command that gives me an unsafe executable I can choose to run.

That doesn’t necessarily complicate the programmer’s life. For example, Haskell’s IO is an EDSL that looks like imperative code but actually retains referential transparency. The evaluation of an IO expression has no side effects and generates an unsafe program you can choose to execute.

Gabriella439 commented 9 years ago

How does this improve on distributing regular (Safe)Haskell or pure Lisp source code (or BLC) over the internet?

The unique features of morte are:

The main disadvantages of morte is:

This makes it significantly more difficult to program directly in morte. It's more appropriate for a low-level intermediate language than a user-facing language. annah somewhat alleviates the second and third problems but the first problem remains because I still haven't written a type inference engine, yet.

What limitations does the lack of recursion have in practice, and is this outweighed by the guarantees it allows you to make about the code?

Much fewer limitations than you think! The key is that recursive data structures and functions can be mechanically translated into non-recursive equivalents. There's an excellent paper which describes the algorithm behind this and the original post on morte gives an example of translating the recursive list type and list operations to their non-recursive analogs. The annah compiler automates the translation outlined in that paper.

What's really neat is that the static linking property I mentioned holds even in the presence of "recursive" data types. For example, if I have something like the following mutually recursive Haskell data types:

data Even = Zero | SuccE Odd

data Odd = SuccO Event

... annah knows how to translate the Even and Odd types and all of their constructors to non-recursive encodings that have no interdependencies between each other. In other words, the SuccE constructors' encoding does not refer to or depend on the Even/Odd types or any of the other constructors. This post goes into more detail about how static linking works in morte.

Note that just because it is possible to transform recursive programs into non-recursive versions, doesn't mean that it is easy. In fact, it's actually a bit difficult for more complex algorithms because you have to handle every single possible corner case. If you want to see some really hairy example code, I have an implementation of binary numeral arithmetic in this directory and the implementation of (+) for binary-encoded numerals is really long. In fact, that's actually the annah code! The generated morte code is even longer and harder to follow. Totality forces you to get a lot of things right and morte doesn't let you opt out of totality. For example, in the encoding of binary numeral arithmetic I had to handle all sorts of corner cases that I would have never even realized existed otherwise. (Side note: if you use peano numerals then the implementation is dead simple, in comparison, but much slower to compile for large numbers).

The most important guarantee that the lack of recursion gives is that it's always safe to normalize expressions (i.e. you can safely inline everything) because the inlining process is guaranteed to terminate. This is why morte gives better compile-time optimizations. If you try do this in a non-total language you run the risk of the compiler hanging indefinitely.

To make this concrete, here is an example of something that morte will optimize away at compile time:

$ # First, some preliminary "imports" to make the code less verbose
$
$ # There is a Prelude of `morte` expressions which you can browse here:
$ # http://sigil.place/talk/0/
$
$ echo '#http://sigil.place/talk/0/Bool' > Bool
$ echo '#http://sigil.place/talk/0/Bool/False' > False
$ echo '#http://sigil.place/talk/0/List' > List
$ echo '#http://sigil.place/talk/0/List/Cons' > Cons
$ echo '#http://sigil.place/talk/0/Bool/and' > and
$
$ # This optimizes `\bs -> and (False:bs)` to `\bs -> False`
$
$ echo '\(bs : #List #Bool ) -> #and (#Cons #Bool #False bs)' | morte
∀(bs : ∀(List : *) → ∀(Cons : ∀(head : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) → ∀(tail : List) → List) → ∀(Nil : List) → List) → ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(bs : ∀(List : *) → ∀(Cons : ∀(head : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) → ∀(tail : List) → List) → ∀(Nil : List) → List) → λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

However, there are some things that morte will not optimize away at compile time. For example, morte will optimize True && b to b, but morte will not optimize b && True to b. The latter optimization might work in the future, but it does not, yet.

Also, remember that hugely complex implementation of (+) above? Despite the complexity, morte and annah can still perform optimizations like this:

$ echo "#http://sigil.place/talk/0/Bin/(+) 0" | annah | morte
∀(b2 : ∀(Bin : *) → ∀(Zero : Bin) → ∀(One : (∀(Bin_ : *) → ∀(Nil_ : Bin_) → ∀(Zero_ : Bin_ → Bin_) → ∀(One_ : Bin_ → Bin_) → Bin_) → Bin) → Bin) → ∀(Bin : *) → ∀(Zero : Bin) → ∀(One : (∀(Bin_ : *) → ∀(Nil_ : Bin_) → ∀(Zero_ : Bin_ → Bin_) → ∀(One_ : Bin_ → Bin_) → Bin_) → Bin) → Bin

λ(b2 : ∀(Bin : *) → ∀(Zero : Bin) → ∀(One : (∀(Bin_ : *) → ∀(Nil_ : Bin_) → ∀(Zero_ : Bin_ → Bin_) → ∀(One_ : Bin_ → Bin_) → Bin_) → Bin) → Bin) → b2

In other words, morte detected at compile time that \n -> 0 + n is the same thing as \n -> n.

I think the easiest way to answer your general concern about recursion is if you give me a simple problem that you think is difficult to encode in a non-recursive language and I actually write up a solution for it in morte. A lot of people think that they need unbounded recursion when they actually don't.

One of the goals of morte was very similar to the goal of IPFS, to have a first-class way of storing computations. For example, combining annah and morte, I can partially apply the (+) function to its first argument and store that as a computation stored within the inc file:

$ echo "#http://sigil.place/talk/0/Bin/(+) 1" | annah | morte > inc

... and then use that within a future computation:

$ echo "#inc 3" | annah | morte
∀(Bin : *) → ∀(Zero : Bin) → ∀(One : (∀(Bin_ : *) → ∀(Nil_ : Bin_) → ∀(Zero_ : Bin_ → Bin_) → ∀(One_ : Bin_ → Bin_) → Bin_) → Bin) → Bin

λ(Bin : *) → λ(Zero : Bin) → λ(One : (∀(Bin_ : *) → ∀(Nil_ : Bin_) → ∀(Zero_ : Bin_ → Bin_) → ∀(One_ : Bin_ → Bin_) → Bin_) → Bin) → One (λ(Bin_ : *) → λ(Nil_ : Bin_) → λ(Zero_ : Bin_ → Bin_) → λ(One_ : Bin_ → Bin_) → Zero_ (Zero_ Nil_))

That result is how the number 4 is encoded in morte using a binary numeral system.

davidar commented 8 years ago

@Gabriel439 Thanks for your detailed reply :) . I was hoping @jbenet would have responded by now, but he seems to be busy with things at the moment.

So, in the meantime, let's discuss getting a proof of concept running. The easiest way of doing this would probably be to get morte to be able to resolve ipfs addresses (#/ipfs/Qm.../Prelude/Bool/False), which could simply be translated to (public) gateway URLs for a start. We can then write a small script to be able to execute computable data stored on ipfs like

$ ipfs-morte /ipfs/...
Compiling...
Result: ...
Gabriella439 commented 8 years ago

I can do this without making any changes to the morte library. The part of the AST that holds remote references is polymorphic, so it can actually hold anything, including ipfs addresses.

The only thing I'm missing is how to translate ipfs references to gateway URLs. Is that documented somewhere?

davidar commented 8 years ago

@Gabriel439 You just prefix the paths with the address of the gateway server, for example:

See https://ipfs.io/docs/getting-started/ for more information.

Gabriella439 commented 8 years ago

Sorry for the slow turnaround, but I finally wrote up a simple morte-ipfs like you described:

https://github.com/Gabriel439/Haskell-Morte-Library/blob/ipfs/exec/Ipfs.hs

I was able to upload a Morte expression to a local IPFS daemon (temporarily hardcoding the gateway to localhost), but when I tried to connect to the daemon using morte-ipfs the TLS handshake failed with:

morte-ipfs: TlsExceptionHostPort (HandshakeFailed (Error_Packet_Parsing "Failed reading: invalid header type: 72\nFrom:\theader\n\n")) "127.0.0.1" 8080

The error message indicates that the server sent a TLS packet with an invalid contents type (72, which is not a registered content type). So this is either a TLS protocol parsing error by the ipfs server or the tls library. I'm not sure which yet.

jbenet commented 8 years ago

@Gabriel439

Gabriella439 commented 8 years ago

Alright, I got it working just by using http instead of https so that it didn't select TLS. I tried port 5001 for the API route but that resulted in a 403 forbidden error, but the public API on port 8080 worked for me.

What's the difference between the public and private interfaces?

jbenet commented 8 years ago

the API route but that resulted in a 403 forbidden error, but the public API on port 8080 worked for me.

is the api client setting origin/referrer headers?

cc @diasdavid @dignifiedquire @davidar thoughts? this is a Haskell api client

What's the difference between the public and private interfaces?

  • Public is meant to be exposed, ReadOnly mostly (writable part is opt-in and restricted). for example, https://ipfs.io runs over the global one we run.
  • Private gives full access, and restricts the /ipfs/... http gateway part to routes explicitly allowed (to prevent exploits).
Gabriella439 commented 8 years ago

Alright, I added command-line options to parametrize the morte-ipfs tool on the hostname and port, and here's an example of successfully building a composite expression and compiling it using morte-ipfs:

$ # Uploading `(&&)`
$ echo "λ(x : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) → λ(y : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) → x (∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) y (λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False)" | ./ipfs add
added QmaHXo2mYsdmDf2V35QHVTuUCAjKzwbtd419BWnbmCkUJB QmaHXo2mYsdmDf2V35QHVTuUCAjKzwbtd419BWnbmCkUJB

$ # Uploading `True`
$ echo "λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True" | ./ipfs add
added QmTBfdpJx6MLyxadZLRhG7LYcrmjxgCtWJGef4VxG1mo8v QmTBfdpJx6MLyxadZLRhG7LYcrmjxgCtWJGef4VxG1mo8v

$ # Uploading `False`
$ echo "λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False" | ./ipfs add
added QmTvtqwNJzNGWUVMnJJErYTKWPjhsVgeN2g9vkmtXu3R8a QmTvtqwNJzNGWUVMnJJErYTKWPjhsVgeN2g9vkmtXu3R8a

$ # Uploading "dynamically linked" `(&&) True False` expression
$ echo "
> #http://ipfs.io/ipfs/QmaHXo2mYsdmDf2V35QHVTuUCAjKzwbtd419BWnbmCkUJB
> #http://ipfs.io/ipfs/QmTBfdpJx6MLyxadZLRhG7LYcrmjxgCtWJGef4VxG1mo8v
> #http://ipfs.io/ipfs/QmTvtqwNJzNGWUVMnJJErYTKWPjhsVgeN2g9vkmtXu3R8a
> " | ./ipfs add
added QmR9XR1F15LQnqB12pcimzySkX9fWBp8Xc43RsFCoopbeZ QmR9XR1F15LQnqB12pcimzySkX9fWBp8Xc43RsFCoopbeZ

$ # Retrieve and compute `(&&) True False` from the public `ipfs.io` server
$ morte-ipfs /ipfs/QmR9XR1F15LQnqB12pcimzySkX9fWBp8Xc43RsFCoopbeZ
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

The morte-ipfs command defaults to the ipfs.io host and port 80, but you can configure either of them if you want to point to localhost or port 8080.

Gabriella439 commented 8 years ago

It also just occurred to me that morte-ipfs doesn't buy you that much. You can technically do this already with a little extra verbosity using the vanilla morte executable:

$ echo "#http://ipfs.io/ipfs/QmR9XR1F15LQnqB12pcimzySkX9fWBp8Xc43RsFCoopbeZ" | morte
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False
jbenet commented 8 years ago

@Gabriel439 this is fantastic!! and very exciting!

Gabriella439 commented 8 years ago

I think the morte executable by itself already is probably good enough for read-only interaction once I figured out how ipfs addressing works.

For the write path I just need to know what APIs are available for uploading to ipfs. I can always fall back on the command line API if I need to but I wanted to know what language bindings were available.

I also would like to know how to do web-based integration. That would be the more ideal API since it doesn't require any installation from end users.

davidar commented 8 years ago

Hooray!

@Gabriel439 yes, I agree this doesn't buy us much yet, but the next step is to get it interacting directly with the local daemon (or ipfs.js eventually) instead of going via the http gateway. My (currently half-finished) https://github.com/davidar/hs-ipfs-api might help here

Gabriella439 commented 8 years ago

So if you need a Javascript binding to morte one approach is to compile morte's Haskell API via ghc-js, although I'm not sure how easy it is to invoke ghc-js code from other Javascript code since I've never tried it.

If that doesn't work, an alternative approach is just to port the morte code to Purescript which has cleaner Javascript integration. morte's actually a pretty small library (~ 600 for the core logic + an ABNF spec for the grammar) so it should be pretty straightforward to port (especially considering how similar Haskell and Purescript are).

davidar commented 8 years ago

The markup editor @jbenet linked earlier uses ghcjs

Gabriella439 commented 8 years ago

Oh sweet, then this should be much easier than I thought!

jbenet commented 8 years ago

You can technically do this already with a little extra verbosity using the vanilla morte executable:

yay unixy tools :)

jbenet commented 8 years ago

cc @cleichner

jbenet commented 8 years ago

(Tangential topic. I may break this out into own discussion.)

It would be really useful to make it easier to write ubiquitous distributed authenticated protocols (via authenticated data structures and say, js). We could do this with haskell libraries and/or on top of morte, and ghcjs. Though we could also see about doing a similar thing with OCaml, to use lambda-auth. A quick googling shows there's several ways of compiling OCaml to JS, but i've yet to look deep enough to understand the relevant security guarantees/constraints. (Or even OCaml -> Morte, if that even makes any sense?)

@amiller are you interested in this? (take a look at what @Gabriel439 has achieved with Morte).

jbenet commented 8 years ago

cc @joeyh

Kubuxu commented 7 years ago

I am rereading this thread and Morte seem like an awesome language for the job but until we try it we won't know if it is the right tool for a transformations in IPFS.

Example use cases that I can see IPFS transformation engine to perform:

There are more of them but those are once that I heard people rise interest in them.

I still don't fully understand limitations of Morte but it seems that it shouldn't be a problem and I am just making sure it is so.

Also how hard would it be to re-implement Morte's execution environment in other language (Go, JS)?

Gabriella439 commented 7 years ago

morte itself cannot perform high-efficiency operations but you can embed a syntax tree with placeholders for high-efficiency primitive operations and literals. As a very simple example, you could do:

  λ(Int : *)
→ λ(plus : Int → Int → Int)
→ λ(literal0 : Int)
→ λ(literal1 : Int)
→ plus literal0 literal1

If you do that then you're mostly using the language as a code distribution mechanism and not as an execution engine