organix / uFork

A pure-actor virtual machine with object-capabilities and memory-safety.
Apache License 2.0

UNIT #123

Closed jamesdiacono closed 1 week ago

jamesdiacono commented 2 weeks ago

The UNIT value (#unit in uFork assembly) is an attractive nuisance and should be removed.

Its inclusion in several uFork device interfaces has led to problems, namely that higher level languages such as Humus have no way to represent UNIT. There are three solutions that have been discussed:

  1. Support UNIT in high level languages targeting uFork. This can be done by extending the grammar with a new literal or extending the global namespace with a variable.
  2. Provide adapters for the device actors, mapping UNIT to some other value.
  3. Remove usage of UNIT from the device interfaces, replacing it with a less exotic value such as UNDEF.

Solution (1) requires modification to the high-level language. This is obviously not ideal. We will not always have control over the language semantics, and besides, extending a language should be a last resort.

Solution (2) introduces a maintenance burden. Even though most high level languages will require adapters to use the devices (indeed, any that lack actors or pairs), the amount of translation should be minimized where possible.

Solution (3) can be achieved by replacing occurrences of UNIT with UNDEF. That done, the question arises: what legitimate need remains for UNIT in uFork?

I propose we implement solution (3), then take it further and remove UNIT from uFork entirely. For UNDEF to subsume the current role of UNIT, we must broaden our definition of UNDEF from "the absence of a value because something went wrong" to simply "the absence of a value".

alanhkarp commented 2 weeks ago

If you adopt (3), I think you need to describe ways to distinguish "the absence of a value because something went wrong" from "the absence of a value."


dalnefre commented 2 weeks ago

UNDEF (#? in uFork ASM, or ? in Humus) does not imply "failure". It means there is no defined value. Sometimes this is a result of failure, and sometimes it is a normally expected result. In some "deque" and "dict" operations, #? is a normal result that may influence flow-control in an algorithm.

For example, "dict get" returns #? if there is no value associated with the given key. Also, "deque pull" returns #? if the deque is empty. Neither of these is an error condition.
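
A loose JavaScript analogy (not uFork code) may help here: `Map.prototype.get` and `Array.prototype.shift` both return `undefined` for a missing key or an empty array without raising an error, much as "dict get" and "deque pull" return #?. The helper name `drain` is made up for illustration.

```javascript
// Analogy only: JavaScript's undefined plays the role of uFork's #? here.
const dict = new Map([["a", 1]]);
const missing = dict.get("b");      // no value for this key: undefined, not an error

// Hypothetical helper: pull items until the queue reports "no value".
function drain(queue) {
    const items = [];
    let item = queue.shift();       // shift() on an empty array yields undefined
    while (item !== undefined) {
        items.push(item);
        item = queue.shift();
    }
    return items;
}

const drained = drain([1, 2, 3]);
```

In both cases the "no value" result steers the control flow rather than reporting a fault.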

For APIs that want to communicate success/failure along with a possible result, we recommend the Parseq-inspired Requestor protocol, where the result is a <return-value, error-value> pair, with #f, #?, #nil, and 0 as error-values (aka "falsy") representing success. On failure we recommend, but do not require, the return-value be #? (compatible with Parseq).

jamesdiacono commented 2 weeks ago

Okay, let me rephrase and elaborate on my understanding of the semantics of UNDEF and UNIT.

Currently, UNDEF can be thought of as "the impossibility of a value". For example, performing alu add on two booleans or dict get on a missing key both produce UNDEF. The same philosophy can be applied to higher level languages targeting uFork: calling a function with arguments outside its expected domain yields UNDEF. Rather than depending on exceptions, uFork aims to be bottom preserving at both the instruction level and in pure functional application as in Humus. UNDEF is the natural bottom value.

UNIT can be thought of as "nothing to say". We currently use it only as a sync signal in the reply messages of actors implementing the service or requestor protocols. It is not used to indicate success or failure, or anything else, just that effects have been performed. UNIT has no place in pure functional scenarios because pure functions can not have effects. Assembly procedures can have effects, but they are also capable of returning any number of values. As such, they do not return UNIT because the more natural representation of "nothing to say" is simply zero return values.

So, both UNDEF and UNIT represent a lack of meaningful information. In actor reply messages we use UNIT, but in all other contexts we use UNDEF. I believe that UNIT can be replaced in all cases by UNDEF with no breakage or introduction of ambiguity. This is possible only because the distinction we have made between UNDEF and UNIT is arbitrary.

dalnefre commented 2 weeks ago

I agree with the clarification by @jamesdiacono, and have nothing to add. However, I'd like to provide some larger context.

uFork is intended to be a convenient compilation target for actor-oriented languages. We have seen that a reasonable degree of interoperability can be achieved between different languages, facilitated by translation to a common "machine" language (IR). To this end, uFork provides powerful built-in facilities that correspond to commonly-available features in many higher-level languages. However, we discovered an unexpected cost to interoperability in the UNIT value, as described in the OP. This caused us to reassess the utility and necessity of each built-in value.

There is really only one "bottom" value, which is UNDEF. This value is required as part of the API for several built-in instructions.

We also provide a distinct Boolean type, with values TRUE and FALSE. These values are the result of built-in predicates and logical operations. To support high-level languages that are less strictly-typed, the conditional treats UNDEF, FALSE, NIL, and 0 as false, and everything else as true. However, a strictly-typed high-level language can ensure that conditionals are always given Boolean values.
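
The falsiness rule above can be modelled in a few lines of JavaScript. This is only a sketch: the constants `UNDEF` and `NIL` are hypothetical stand-ins built from Symbols, not the VM's real encoding, and `is_falsy` is an illustrative name.

```javascript
// Hypothetical stand-ins for uFork's built-in constants (not the real VM encoding).
const UNDEF = Symbol("#?");
const NIL = Symbol("#nil");
const FALSE = false;
const TRUE = true;

// Mirrors the rule described above: #?, #f, #nil, and 0 are falsy; all else is truthy.
function is_falsy(value) {
    return value === UNDEF || value === FALSE || value === NIL || value === 0;
}
```

A strictly-typed language would only ever pass `TRUE` or `FALSE` to such a check, so the other three cases would never arise.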

Ordered-pairs are the primary composite built-in data-type for uFork. As part of the API for Pairs, a unique non-Pair NIL value is provided. This value is also used to represent the empty Dictionary. The final composite Deque type uses Pairs and NIL for its representation.

There is no built-in requirement for a UNIT value.

uFork has strong support for arbitrary application-defined types. Such types can easily define a variety of type-specific singleton values. So if such a type requires a UNIT concept, it can be provided in a library and exported as part of the API for using that type.

dalnefre commented 2 weeks ago

Previously, I described how UNDEF may not necessarily indicate failure, as part of the justification to replace UNIT. This was primarily for single-return-value procedures/functions, and actors with similar APIs. In passing, I mentioned that APIs wanting to communicate success/failure along with a value/error-code should use a Requestor API inspired by Parseq. Our recurring example of this is the console i/o device. As much as possible, we would like these APIs to be conveniently interoperable across multiple high-level programming languages.

A result consists of a return value and an error reason. Parseq delivers these as parameters to a callback function callback(value, error), where (value === undefined) indicates failure. Following usual JavaScript convention, omitting the error, as in callback(value), causes (error === undefined) in the callback function.
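
A minimal sketch of this callback convention follows. The names `describe` and `double_requestor` are hypothetical, and the requestor calls back synchronously and returns the callback's result purely to keep the example short; real requestors are usually asynchronous.

```javascript
// A callback in the parseq style: value === undefined signals failure.
function describe(value, reason) {
    return (
        value === undefined
        ? "failure: " + String(reason)
        : "success: " + String(value)
    );
}

// Hypothetical requestor: takes a callback and an input value.
function double_requestor(callback, input) {
    if (typeof input !== "number") {
        return callback(undefined, "not a number");
    }
    return callback(input * 2);     // reason omitted, so it arrives as undefined
}

const ok = double_requestor(describe, 21);
const bad = double_requestor(describe, "x");
```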

Adapting Parseq to uFork requires a few changes. The callback is an actor, rather than a procedure/function. And the result is an asynchronous message. A Pair holds the value and error in the head and tail respectively. When the error is falsy (FALSE, NIL, UNDEF, or ZERO), success is indicated. Otherwise it's a failure, possibly with a reason encoded in the error. This leaves open the potential for a value of UNDEF to be returned successfully. However, for compatibility with Parseq, we recommend that the value be UNDEF only on failure.
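
The (value . error) convention can be sketched in JavaScript as follows. Everything here is a model under assumptions: `UNDEF`, `NIL`, `pair`, `is_falsy`, and `handle` are illustrative names, and pairs are represented as frozen objects rather than VM cells.

```javascript
// Hypothetical stand-ins for uFork constants and pairs (not the real VM encoding).
const UNDEF = Symbol("#?");
const NIL = Symbol("#nil");

function pair(head, tail) {
    return Object.freeze({head, tail});
}

function is_falsy(value) {
    return value === UNDEF || value === false || value === NIL || value === 0;
}

// The result is (value . error): success when the error is falsy.
function handle(result) {
    return (
        is_falsy(result.tail)
        ? {ok: true, value: result.head}
        : {ok: false, error: result.tail}
    );
}

const success = handle(pair(42, NIL));
const failure = handle(pair(UNDEF, -4));
const odd = handle(pair(UNDEF, UNDEF)); // permitted, but discouraged for Parseq compatibility
```

The last case is the one the recommendation above steers away from: a successful result whose value is UNDEF.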

In JavaScript a result is encoded in the parameter Array, where value is at offset 0 and error is either missing (thus undefined) or at offset 1. The uFork encoding, as a Pair, is flexible enough to map to several high-level language representations. In Scheme for example, where Lists are composed of Pairs, a 1-element list (value . #nil) could encode success. If desired, error can be simply TRUE for failure, and FALSE for success, like an enumerated "sum" type, leaving value to carry the data on success. The uFork semantics of checking if error is falsy works for a variety of reasonable API styles. Maximum interoperability is achieved if value is UNDEF if, and only if, error is truthy.

For these reasons, I recommend that UNIT be translated to NIL instead of UNDEF for Requestor-style APIs when the operation was successful but there is no useful data to return. This way we maintain the most interoperable semantics.

jamesdiacono commented 2 weeks ago

There is a lot of prior discussion to document here, so bear with me.

In parseq, a result value of undefined indicates failure. My experience using parseq has shown me an unfortunate consequence of this design decision: a requestor can not produce undefined and succeed. If it tries to, the result will be interpreted as a failure. This means that undefined, alone among values, is reserved. Often I will write a requestor that performs an action but yields no information, such as writing a file to disk. In such cases, I wish I could produce undefined as the value, because undefined is JavaScript's natural bottom value representing the absence of information. Instead, I must choose some other JavaScript value. This feels wrong because all other values imply meaningful information, yet apart from its inequality with undefined the value is ignored.
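
To make the problem concrete, here is a sketch of an effect-only requestor in the parseq style. The names `make_log_requestor` and `verdict` are hypothetical, an in-memory array stands in for the disk, and the callback is invoked synchronously for brevity.

```javascript
// Hypothetical parseq-style requestor that performs an effect but has nothing to report.
function make_log_requestor(log, placeholder) {
    return function log_requestor(callback, message) {
        log.push(message);              // the effect always happens
        return callback(placeholder);   // what should be produced here?
    };
}

// A consumer applying the parseq rule: value === undefined means failure.
function verdict(value, reason) {
    return (value === undefined ? "failed" : "succeeded");
}

const log = [];
const naive = make_log_requestor(log, undefined)(verdict, "first");
const workaround = make_log_requestor(log, true)(verdict, "second");
```

Both calls perform the effect, yet only the one producing the arbitrary placeholder `true` is reported as a success.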

When we ported the requestor pattern from parseq to uFork, we attempted to improve it by changing how failure was encoded. Rather than a non-#? value, success was instead indicated by a falsy error (formerly reason). There were two perceived benefits to this change:

  1. A requestor can produce any value. That means it can produce #?, uFork's bottom value, when it has nothing to say.
  2. uFork's conditional instruction, if, can operate directly on the error to check for failure. This saves a single instruction (eq #?) but introduces a complication. What if the code checking the result is written in a language such as Humus that does not have a falsiness concept, or worse, has a falsiness concept slightly different from that of uFork? In such languages, additional code (such as a predicate function) must be employed and, needless to say, that adds many more instructions to every check.

It was my understanding that these benefits seemed substantial enough that we were willing to forgo interoperability between uFork and parseq requestors. Even if we had retained the original encoding of success vs. failure, translation between a uFork pair and a JavaScript argument array is necessary anyway. There is no hope of direct interoperability as with Humus.

You have suggested that we can achieve interoperability between uFork and parseq requestors by excluding #? from the set of successful values. In this scheme, either of value or error can now be checked to determine failure. Valid uFork results become a strict subset of valid parseq results, where uFork results just have the additional constraint that a failing result must have a truthy error. (Note that this subset relationship implies only one direction of safe interoperability: consider how a uFork requestor will interpret a successful parseq result that happens to have a falsy reason.)

Let us look critically at the result encoding we have thus far defined, (value . error). Firstly, notice that it is non-orthogonal because the value and error must be synchronized, and the rules for doing so are non-trivial (involving falsiness, etc). This seems like a source of confusion and bugs that, in the end, will hamper interoperability. Secondly, we have inadvertently built in a pathological degree of freedom. Consider the following list of nonsense results:

(#nil . -4)  (#t . -4)  (#f . -4)  ...

They are equivalent if we only consider the error, -4. But they are all different if we only consider the value (#nil, #t, #f, etc). These are pathological cases that could cause the behavior of two correct requestor implementations to diverge.
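
The divergence can be shown with a small JavaScript model. The constants and both checker functions are hypothetical; they represent two consumers that each follow one of the permitted ways of detecting failure in a (value . error) result.

```javascript
// Hypothetical stand-ins for uFork constants (not the real VM encoding).
const UNDEF = Symbol("#?");
const NIL = Symbol("#nil");

function is_falsy(value) {
    return value === UNDEF || value === false || value === NIL || value === 0;
}

// Two plausible consumers of a (value . error) result, modelled as {head, tail}.
function failed_by_error(result) {
    return !is_falsy(result.tail);      // failure when the error is truthy
}
function failed_by_value(result) {
    return result.head === UNDEF;       // failure when the value is #?
}

const nonsense = [
    {head: NIL, tail: -4},
    {head: true, tail: -4},
    {head: false, tail: -4}
];
const verdicts = nonsense.map(function (result) {
    return [failed_by_error(result), failed_by_value(result)];
});
```

For every one of these results the first check reports failure while the second reports success, so the two consumers disagree.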

When we ported the requestor pattern to uFork, I believe we overlooked a superior encoding:

(#t . value)
(#f . error)

In this encoding, the head of the pair is a boolean indicating success and the tail is the value or error. The distinction between success and failure could not be clearer. The degree of freedom previously mentioned is gone, and the encoding is completely orthogonal. Any value can be used as the value or error. In fact, both are optional, and that is consistent with other parts of uFork where the "dotted tail" has been used to mean optional.

And there are two smaller benefits. One is the ease with which such a result is processed in assembly:

callback_beh:           ; () <- result
    msg 0               ; result=(success . value_or_error)
    part 1              ; value_or_error success
    if_not fail         ; value
    ...
    ...do something with value...
    ...

fail:                   ; error
    ...
    ...do something with error...
    ...

The other is that the UNIT case falls right out: (#t).
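
For comparison with the assembly above, the same encoding can be modelled in JavaScript. This is a sketch under assumptions: `UNDEF`, `NIL`, `pair`, `succeed`, `fail`, and `unwrap` are illustrative names, and the omitted tail defaults to `NIL` on the reading that (#t) is the one-element list (#t . #nil).

```javascript
// Hypothetical stand-ins for uFork constants and pairs (not the real VM encoding).
const UNDEF = Symbol("#?");
const NIL = Symbol("#nil");

function pair(head, tail) {
    return Object.freeze({head, tail});
}

// (#t . value) on success, (#f . error) on failure; the tail is optional.
function succeed(value = NIL) {
    return pair(true, value);
}
function fail(error = NIL) {
    return pair(false, error);
}

// Dispatch on the head alone. Only a head of true counts as success here,
// which is stricter than the falsiness test used by if_not.
function unwrap(result, on_value, on_error) {
    return (
        result.head === true
        ? on_value(result.tail)
        : on_error(result.tail)
    );
}

const nothing_to_say = succeed();       // models (#t)
const undef_success = succeed(UNDEF);   // any value may be produced, even #?
const failure = fail(-4);
```

Because the head alone decides the outcome, the value and error no longer need to be kept in sync.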

zarutian commented 2 weeks ago

I am up for getting rid of #unit

dalnefre commented 2 weeks ago

In fact, we did consider (ok . value/error), as well as the async node.js convention of (error value). The current choice was intended (as I've explained) to maximize interoperability, but I'm becoming convinced that it doesn't really achieve that goal as well as I had hoped.

The most type-pure and easy to explain convention is (as you suggested, @jamesdiacono):

(#t . value)
(#f . error)

I believe this also goes by the names "labelled sum type", "discriminated union", "tagged variant", and probably others. It seems to correspond to the Haskell Either type, and of course the Rust Result type. If we are willing to make a clean break from ParseqJS, I believe this would be the most appropriate representation for a Requestor result.

So, does that work fall under this issue? Or should a new issue be created for the conversion effort?

jamesdiacono commented 2 weeks ago

I don't believe it is any more a clean break from parseq than our original encoding. There will always be a translation necessary between uFork and JavaScript, and I have a plan for translating between uFork's (success . value/error) result and parseq's value and reason arguments losslessly (even in the case of (#t . #?)).

I think this work should be done outside this PR, but I'm not sure a new issue is necessary (unless that is our new process?)

jamesdiacono commented 2 weeks ago

I will update this PR to replace the requestor result value #unit with #nil, as that is what we will end up with under the new encoding.