urbit / urbit.org

The source for urbit.org
https://urbit.org
MIT License
91 stars 197 forks source link

High level overview of Hoon #941

Closed g-a-v-i-n closed 2 years ago

g-a-v-i-n commented 4 years ago

The https://urbit.org/docs/tutorials/hoon/ page sort of jumps right in and assumes I'm already convinced I want to learn the language. I think it would be helpful to provide extra context about the language first:

I think something like this would also be helpful to have as part of the concepts section.

belisarius222 commented 4 years ago

Here's my take on those questions, which I think are good questions that we don't answer explicitly enough.

What is special about Hoon?

It's a purely functional systems language.  Calling it a functional analog of C is not too far off in several ways.  Almost all code throughout Urbit's kernelspace and userspace is written in Hoon.

What properties does Hoon have?  What type of language is it?

Hoon is a statically typed, purely functional, strictly evaluated programming language.

Hoon and Nock have several unusual properties:

What can Hoon do that other languages can't?

The short answer is: implement a purely functional operating system.  Try to do this in a principled way in Haskell, and the problems you'll run into will make design decisions in Hoon and Nock make a lot more sense.

In particular, the problems Hoon solves well that aren't solved by other functional languages are:

What is Hoon good at?

Hoon is mostly good at compiling and running other Hoon code.  Urbit consists of many layers of bootstrapping, and several of them lean heavily on this feature, including the Gall application runner, the Ford build system, the Dojo shell, and the Arvo kernel itself.  Even Urbit's chat application lets you run Hoon expressions and share the results with your friends.

Why did we write the OS in Hoon?

The chain of reasoning goes something like this:

Software complexity leads to monopolies and lack of individual digital sovereignty, in addition to bugs and security vulnerabilities.  One of the best ways to reduce software complexity is to restrict oneself to pure mathematical functions — no side effects.  This makes the system deterministic.  So we want a deterministic, functional operating system for individuals to run.

This operating system should also be axiomatic; we don't want it to depend on various idiosyncrasies of whatever the current hardware is like.  Hardware changes over time, and we want people to be able to pass their computers on to their grandchildren.  So we should have a virtual machine that runs this functional operating system.

As hardware changes, people need to move their virtual machines to new hosts.  This means there needs to be a standard way to serialize and deserialize the VM state at any time.  The easiest way to do this is by storing an event log, just like a database, and writing each event to that before emitting effects caused by it.  Since we also need state snapshots, every piece of data in the system, including runtime state, needs to be serializable to a standardized format.

Since Urbit is an operating system, its main purpose is to load and run programs on the behalf of the user.  This means the system needs to be really good at hot code reloading, metaprogramming, and virtualization / self-hosting / metacircularity.

There aren't any other languages out there that are purely functional, purely axiomatic, performant enough for practical personal use, universally serializable, and good at runtime metaprogramming.  Nock is Urbit's solution to these design constraints.  Some Lisps come close to meeting these criteria — and Nock is very Lisp-like — but no practical Lisp dialects are nearly as pure or axiomatic as Nock.

Why is Hoon the way it is?

Minimalism, mostly.  Even the seemingly baroque syntax is extremely regular and an unusually thin layer over the abstract syntax tree.  It's also designed as a power tool; learning the syntax takes some time, but once you know it, it's quite ergonomic and not hard to read.  It's like an English speaker learning Cyrillic.

Subject orientation, in Nock and Hoon, stems partly from minimalism (there's just one subject, which serves as state, lexical scope, environment, and function argument), partly from a desire to make dynamic compilation simpler, and partly in order to give the language a more imperative feel.  While Hoon is a purely functional language, many of the runes create a mutant copy of the subject for manipulation by future runes (similar to Forth's stack operations), which makes it feel more like an expression is "doing something" rather than just calculating something.

Urbit's principled minimalism simplifies all kinds of things at many layers of the stack; for example, Urbit's linker, which is part of the Ford build system, just conses together multiple libraries into a tuple to form the compile-time environment for a source file.  Universal serialization means we can safely send arbitrary pieces of data to apps on other ships without any more work than sending them to a local app.  Using an app called Aquarium that's about three hundred lines of code, Arvo can run a whole fleet of other Arvos inside itself at full speed, just like Docker, and perform a test suite.

— ~rovnys-ricfer https://urbit.org

On Tue, Oct 22, 2019 at 6:41 PM, g-a-v-i-n < notifications@github.com > wrote:

The https:/ / urbit. org/ docs/ tutorials/ hoon/ ( https://urbit.org/docs/tutorials/hoon/ ) page sort of jumps right in and assumes I'm already convinced I want to learn the language. I think it would be helpful to provide extra context about the language first:

  • What is special about hoon?
  • What properties does hoon have? What type of language is it?
  • What can hoon do that other languages can't?
  • What is hoon good at?
  • Why did we write the OS in hoon?
  • Why is hoon the way it is?

I think something like this would also be helpful to have as part of the concepts section.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub ( https://github.com/urbit/docs/issues/674?email_source=notifications&email_token=AAGVR5NI24W7XSOLHYVWPE3QP56P7A5CNFSM4JDZHT32YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4HTUYOLA ) , or unsubscribe ( https://github.com/notifications/unsubscribe-auth/AAGVR5M6TUAGKBSH7OXVOH3QP56P7ANCNFSM4JDZHT3Q ).

drbeefsupreme commented 4 years ago

This is fantastic, Ted. It made a lot of sense to me, and I do not have much of a software engineering background, and my primary gripe with Urbit so far has been that I did not understand what was gained by using Nock/Hoon. This clears up a lot of it for me in a way that someone of my background can understand (though there's still a handful of things I need to dig a little deeper into), which means your explanation should also reach a wider audience than professional software developers as well.

And I concur with Gavin's assessment that a document like would be extremely valuable to have in the docs.

--

~tidrel-tabpub

https://urbit.org

On Tue, Oct 22, 2019 at 8:44 PM Ted Blackman notifications@github.com wrote:

Here's my take on those questions, which I think are good questions that we don't answer explicitly enough.

What is special about Hoon?

It's a purely functional systems language. Calling it a functional analog of C is not too far off in several ways. Almost all code throughout Urbit's kernelspace and userspace is written in Hoon.

What properties does Hoon have? What type of language is it?

Hoon is a statically typed, purely functional, strictly evaluated programming language.

Hoon and Nock have several unusual properties:

  • Axiomatic. Nock is a fully axiomatic computing system — no dependencies, no builtins, no hardware dependence — just pure math. The jet system gives this a practical level of performance.

  • Layered. Nock is machine code designed for machines to run; Hoon is a programming language designed for people to use. Hoon layers over Nock in a very similar way to how C layers over machine code. Nock has no symbols; all programmer-facing variable names live in Hoon types. You can use raw Nock from Hoon, much like dropping down to assembly from within C. Hoon compiles to Nock with no "runtime system" — aside from subtree lookups, you could pretty easily compile most Hoon expressions by hand.

  • Minimal. Urbit is ruthlessly minimalistic throughout the stack. Nock has only twelve opcodes, and Hoon's semantics are a very thin layer over Nock's. The only fundamental datatype in Urbit is the noun: either an arbitrarily large natural number, called an atom, or a pair of nouns, called a cell. Lists, sets, maps, queues, executable code, closures, ASTs, and everything else in the whole programming environment is represented using just this data structure.

  • Acyclic. There are no cycles in nouns, which are just binary trees, so there are no cycles in Nock's memory model. Pointer equality is not exposed to the programmer. In practical Nock implementations, all data structures are what are elsewhere called "functional" or "persistent" data structures, meaning they're immutable and share structure wherever possible. [with the exception of "unifying equality", which some runtimes provide as an optimization]

  • Homoiconic. Code and data are represented the same way and can be converted to each other. Lisp dialects are also homoiconic, but Hoon and Nock are arguably even more so, since things like closures and the environment are just Nock trees. We even have a statically typed metacircular interpreter called +mule. We also run userspace code metacircularly with negligible performance overhead because of Urbit's jet system.

  • Universally serializable. There's one serialization format, called "jam", for any piece of code or data in the system. This makes it so that deserialization has just one function, a few hundred lines of C, as its security attack surface. It also facilitates portability of the virtual machine state, and it turns out to be useful in a bunch of places in the system.

  • Jetted. A jet is a piece of code that the Nock interpreter has that reimplements a Nock function in some other, faster language. The +dec decrement function in Hoon's standard library is defined axiomatically using recursion, but is run as a jet written in C that makes use of the processor's ALU for performance. When calling a Nock function, if the runtime has a matching jet, it will use that instead of the Nock implementation. Nock isn't as slow as you might think, especially considering it's a minimal, dynamic, axiomatic language.

  • The amount of syntax is unusually large, but also unusually regular. Hoon is a "runic" language, meaning expressions generally begin with a digraph "rune" corresponding to the type of expression. Runes are used in place of keywords and Lisp's "special forms". Hoon's syntax makes better use of screen real estate than Haskell or Lisp. It also does not allow syntactic abstraction, so you always know exactly what you're looking at when reading Hoon. The similarity of Hoon's syntax to its abstract syntax tree makes metaprogramming much easier.

  • Subject-oriented. There is no implicit environment; a Hoon expression compiles down to a Nock formula, which is interpreted as a function that runs with the "subject" as the argument. The subject can be any Nock tree, but it contains everything that's in scope. It usually consists of the Hoon compiler and standard library as a stack of cores, along with whatever functions and variables have been defined in the lexical scope.

  • Cores. A core is a pair of code and data. The core is the underlying representation of a function (a lambda with implicit fixed point), an OOP-style object, a library or module, and an "interface" or "protocol". The code consists of a set of Nock formulas that can each be run against the whole core as a subject. This means there's always an implicit fixed point when running a function, and mutual recursion can occur among the formulas in a core. A core's data usually consists of the standard library (itself a stack of cores), and possibly a "sample" (function argument).

  • The type system has several unusual features. It's intensional, in the sense that all constructs are first-class and can be down-cast to noun, Hoon's "any" or "top" type that matches all Nock nouns. Types are also where Hoon's variable names and docstrings are stored. The type system also uses an unusual macro-like feature called "wetness" to implement parametric polymorphism. Nouns can also be coerced into structural types, such as lists, The type system can also auto-generate coercion functions that validate and lift raw nouns into structural types, such as lists or cells.

  • Reflective. The type of type is just a normal datatype in Hoon, and a lot of the system manipulates types. In particular, the !> rune, when applied to a piece of data, uses compile-time type reflection to produce something called a "vase": a pair of type and data, similar to a Data.Dynamic in Haskell, or a limited form of a dependent pair. Since the Arvo kernel does a lot of dynamic compilation, it uses vases to implement something akin to a dynamically typed language using Hoon.

  • Inert. Because Nock is purely functional, Hoon compiles to it so directly, everything is homoiconic, and Hoon is intensional, there's a very nice feeling that everything is just a stationary tree. There are no special objects that can't be manipulated; everything in your environment is just a subtree, and you could grab it and print it out if you wanted to. There's nothing like a "database handle", "websocket connection object", or other mystical constructs. The calmness of working with such inert building blocks is addictive, as many Hoon programmers will attest.

What can Hoon do that other languages can't?

The short answer is: implement a purely functional operating system. Try to do this in a principled way in Haskell, and the problems you'll run into will make design decisions in Hoon and Nock make a lot more sense.

In particular, the problems Hoon solves well that aren't solved by other functional languages are:

  • Compile and run other code in a typesafe manner at full speed.

  • Typesafe metaprogramming.

  • Hot code reload and online data migration.

What is Hoon good at?

Hoon is mostly good at compiling and running other Hoon code. Urbit consists of many layers of bootstrapping, and several of them lean heavily on this feature, including the Gall application runner, the Ford build system, the Dojo shell, and the Arvo kernel itself. Even Urbit's chat application lets you run Hoon expressions and share the results with your friends.

Why did we write the OS in Hoon?

The chain of reasoning goes something like this:

Software complexity leads to monopolies and lack of individual digital sovereignty, in addition to bugs and security vulnerabilities. One of the best ways to reduce software complexity is to restrict oneself to pure mathematical functions — no side effects. This makes the system deterministic. So we want a deterministic, functional operating system for individuals to run.

This operating system should also be axiomatic; we don't want it to depend on various idiosyncrasies of whatever the current hardware is like. Hardware changes over time, and we want people to be able to pass their computers on to their grandchildren. So we should have a virtual machine that runs this functional operating system.

As hardware changes, people need to move their virtual machines to new hosts. This means there needs to be a standard way to serialize and deserialize the VM state at any time. The easiest way to do this is by storing an event log, just like a database, and writing each event to that before emitting effects caused by it. Since we also need state snapshots, every piece of data in the system, including runtime state, needs to be serializable to a standardized format.

Since Urbit is an operating system, its main purpose is to load and run programs on the behalf of the user. This means the system needs to be really good at hot code reloading, metaprogramming, and virtualization / self-hosting / metacircularity.

There aren't any other languages out there that are purely functional, purely axiomatic, performant enough for practical personal use, universally serializable, and good at runtime metaprogramming. Nock is Urbit's solution to these design constraints. Some Lisps come close to meeting these criteria — and Nock is very Lisp-like — but no practical Lisp dialects are nearly as pure or axiomatic as Nock.

Why is Hoon the way it is?

Minimalism, mostly. Even the seemingly baroque syntax is extremely regular and an unusually thin layer over the abstract syntax tree. It's also designed as a power tool; learning the syntax takes some time, but once you know it, it's quite ergonomic and not hard to read. It's like an English speaker learning Cyrillic.

Subject orientation, in Nock and Hoon, stems partly from minimalism (there's just one subject, which serves as state, lexical scope, environment, and function argument), partly from a desire to make dynamic compilation simpler, and partly in order to give the language a more imperative feel. While Hoon is a purely functional language, many of the runes create a mutant copy of the subject for manipulation by future runes (similar to Forth's stack operations), which makes it feel more like an expression is "doing something" rather than just calculating something.

Urbit's principled minimalism simplifies all kinds of things at many layers of the stack; for example, Urbit's linker, which is part of the Ford build system, just conses together multiple libraries into a tuple to form the compile-time environment for a source file. Universal serialization means we can safely send arbitrary pieces of data to apps on other ships without any more work than sending them to a local app. Using an app called Aquarium that's about three hundred lines of code, Arvo can run a whole fleet of other Arvos inside itself at full speed, just like Docker, and perform a test suite.

— ~rovnys-ricfer https://urbit.org

On Tue, Oct 22, 2019 at 6:41 PM, g-a-v-i-n < notifications@github.com > wrote:

The https:/ / urbit. org/ docs/ tutorials/ hoon/ ( https://urbit.org/docs/tutorials/hoon/ ) page sort of jumps right in and assumes I'm already convinced I want to learn the language. I think it would be helpful to provide extra context about the language first:

  • What is special about hoon?
  • What properties does hoon have? What type of language is it?
  • What can hoon do that other languages can't?
  • What is hoon good at?
  • Why did we write the OS in hoon?
  • Why is hoon the way it is?

I think something like this would also be helpful to have as part of the concepts section.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub (

https://github.com/urbit/docs/issues/674?email_source=notifications&email_token=AAGVR5NI24W7XSOLHYVWPE3QP56P7A5CNFSM4JDZHT32YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4HTUYOLA ) , or unsubscribe (

https://github.com/notifications/unsubscribe-auth/AAGVR5M6TUAGKBSH7OXVOH3QP56P7ANCNFSM4JDZHT3Q ).

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/urbit/docs/issues/674?email_source=notifications&email_token=AMHVOY643OUNZUXK52MUVNDQP6M77A5CNFSM4JDZHT32YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEB7U7OA#issuecomment-545214392, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMHVOY23WUT4M3B6PMMMBKLQP6M77ANCNFSM4JDZHT3Q .

tacryt-socryp commented 4 years ago

Ted, this is excellent. I’d love to see more communication of the core rationales of this system put out to the public in a similar manner. (Azimuth, Ames, the Arvo lifecycle function are key examples)

g-a-v-i-n commented 4 years ago

@belisarius222 this is so great

rmariani commented 4 years ago

https://urbit.org/blog/why-hoon/