luksamuk / believe

A Bel Lisp interpreter built with C, written as a book/literate program (archived)
MIT License
52 stars 4 forks source link

'id' is too strict #6

Closed masak closed 5 years ago

masak commented 5 years ago

From the implementation of bel_idp:

According to the Bel specification, identity is stricted than equality: there is only one of each symbol and character, but pairs and streams are never identical.

Here is what bellanguage.txt says:

Identity is stricter than equality. While there is only one of each
symbol and character, there can be any number of different pairs with 
the same elements. So two pairs can look the same without being
identical:

> (id '(a b) '(a b))
nil

Note in particular it doesn't say no pairs are ever identical. They are simply checked by reference identity:

> (let (x '(a b)) (id x x))
t

(Example is mine, not from any of the Bel documents.)

I realize that the above reasoning is not conclusive in case someone is convinced id never compares pairs equal, not even to themselves. But note that the definition of variable in [the Bel source]() calls id comparing something against vmark, which has been defined earlier as a pair:

(set vmark (join))

[...]

(def variable (e)
  (if (atom e)
      (no (literal e))
      (id (car e) vmark)))

This would be a useless thing to do unless some things were sometimes identical to a pair (namely the exact same pair).

Same reasoning holds for streams. A stream would compare positive against itself:

> (id ins ins)
t
luksamuk commented 5 years ago

Seems reasonable to me. Now that #7 is fixed, one can clearly see that the value of vmark is (nil) (or, more explicitly (nil . nil)). So the variable check for non-atomic values compares whether the car of such value is identical to (nil). But this only makes sense and is only possible because vmark -- which holds the pair in question -- should be always identical to itself. This is odd at first sight, but I'm convinced.

Seems like the most reasonable approach is to test pairs and streams for identity using their memory addresses.

EDIT: I was also talking to a friend about numbers. Since they are non-standard, I think that comparing for number identity should be the same as comparing whether two numbers are both of NUMBER type and are of the same subtype (integer, float, fraction or complex); then we finally compare for a) equality of values (for integers and floats), or b) identity of sub-components (fractions and complex). Number equality will be handled when I implement bel_prim_eq and variants.

luksamuk commented 5 years ago

I made the changes, but proper verification is still pending, so I'm leaving this issue open for now.

luksamuk commented 5 years ago

On a quick note, it seems like numbers are originally implemented as lits:

The name stands for literal, and it can take any number of arguments. This is how you make things that evaluate to themselves, the way characters or nil do. Functions are lits, for example, as are numbers.

I've implemented numbers as an extra type for performance reasons, though I'm not sure about opening an exception to say that numbers are lits, unless I wrap them on literals like (lit num the-actual-number). Doesn't seem very elegant, though...

masak commented 5 years ago

it seems like numbers are originally implemented as lits

Yes. I've meant to open an issue about that, too... but I've held back, because (a) it would be long, (b) I had lower-hanging fruit to file and fix first, and (c) it's your project, and I don't want to impose by throwing large and ambitious proposals for architecture change at you.

But, in brief, I think believe is in violation of the spec by exposing the separate number type on the type level. That is, first off I think a compliant Bel implementation should say it's a pair:

> (set n 5)
5
> (type 5)
pair

But (more importantly), nothing you could do "inside" the environment would reveal the insides to be anything other than pairs and atoms:

> (cadr 5)
num
> (caddr 5)
; (...) -- a list representing the numerator and denominator of the real part, somewhere containing (t t t t)

The reason my issue would be long is I think I have a fix for this, and I wanted to describe it. It solves the representation issue for numbers, but it can also help "hide" other more efficient representations behind regular Bel list representations. As a simple example, the chars global could use this: instead of being a huge alist of potentially the entire Unicode character table, it could provide quick constant-time access in the typical case, and some kind of low-memory iterator in the case of introspection.

luksamuk commented 5 years ago

I don't want to impose by throwing large and ambitious proposals for architecture change at you.

Don't worry about it. I appreciate all the help you've given so far; the main purpose is to make a Bel interpreter after all. You've been clearing a lot of misunderstandings.

Of course changing many things in the architecture wouldn't be too viable. The goal is to have something that someone could follow and build, even though the code is there, and also to somewhat prove a point about the usefulness of literate programming. Given that, sometimes performance will not be the biggest concern, for example. However, if it is something that breaks the Bel specification, it is prone to be changed.

But, in brief, I think believe is in violation of the spec by exposing the separate number type on the type level. That is, first off I think a compliant Bel implementation should say it's a pair

That is quite a problem, however it needs a careful look. I believe it would be interesting to see if it really breaks the interpreter (the one written in Bel itself) or one of the examples, e.g. if there is some kind of black magic performed on the fact that numbers are pairs (which, to be honest, wouldn't seem sane to me). Bel is primarily a formalism, and IIRC implementing numbers by using pairs would be somewhat similar to generating Church numerals, which is bad enough to performance to make me implement a number type as a first attempt.

But shadowing numbers into pairs is something I'm willing to consider, if it is really going to break something.

I think I have a fix for this, and I wanted to describe it. It solves the representation issue for numbers, but it can also help "hide" other more efficient representations behind regular Bel list representations.

I would be happy to hear about that! This calls for some discussion. I like the way numbers work right now (though I still need to implement fraction simplification), but I'm ok with hiding stuff. Even in the chars case (though I must admit, I just use plain old C char type; but it might change for UTF-8 support).

masak commented 5 years ago

However, if it is something that breaks the Bel specification, it is prone to be changed.

If this is indeed the criterion, then I think it should be changed. The language guide is quite clear that numbers are pairs:

[...] Functions are lits, for example, as are
numbers.

[...]

Numbers and strings are pairs, for example.

[...]

Numbers in Bel take the form

(lit num (sign n d) (sign n d))

Even if the language guide were to be dismissed as non-normative, the Bel source makes it clear that's how numbers should be represented.

I'm less sure it will actually break Bel bootstrapping somehow. Looking ahead, maybe the biggest trouble you will have is when your Bel-in-C numbers (struct BEL_NUMBER) need to interact with your Bel-in-Bel numbers ((lit num ...)). Separately, these would work fine, but additional special-casing might be needed to upconvert or downconvert to make operations on heterogeneous numbers work.

though I must admit, I just use plain old C char type; but it might change for UTF-8 support

I was about to say that 'BROKEN BAR' (U+00A6) would be a problem in the reader and printer down the line, since it doesn't fit in a byte. But, turns out it does fit in a byte; it's part of Latin-1, just not of ASCII. I think this is the only character in the Bel source outside of the ASCII range.

Changing to UTF-8 down the line might still make sense, of course, but I can understand if you don't rush into it. One of the criticisms leveled at Arc when it came out was that it only supported ASCII; surely pg's decision that Bel not specify exactly which character set to use is colored by this.

luksamuk commented 5 years ago

Taking a better look at how numbers can be represented, it seems to me that it would not be so hard to change certain aspects of processing numbers so that they can "seem" like the default Bel implementation, with the exception maybe of float numbers.

But changing the topic a bit to a more crucial matter, albeit still related to this issue...

I've been reading the guide a lot more carefully now -- to be honest, the number implementation that is currently on the source did not derive from Bel's idea; I didn't even know that it supported complex numbers until I actually searched for it on the guide --, and this part right in the beginning reminded me of a crucial point:

what happens if, instead of switching from the formal to the implementation phase as soon as possible, you try to delay that switch for as long as possible?

The sad thing about this is that, by implementing Bel on first sight, this means that I kind of violated a bigger principle of the language. The specification was supposed to cook a little more until we got to the level of "good enough for implementing", though I suppose that an attempt at implementing Bel would help with that in many ways.

I still think that it is possible to implement Bel with what is already "standard", but maybe believe should attempt less to follow everything that one would expect from a Bel interpreter, and be a language of its own, albeit heavily inspired by Bel. This is because (a) there are some things I don't like about both syntax and scope order of Bel; and (b) I've wanted to implement a Lisp dialect of my own for a while now.

So I think I'll rewrite it to be explicitly inspired by Bel, or maybe I'll simply fork it into another language -- but if I do, I'll probably stop developing this interpreter as a literate program.

masak commented 5 years ago

I still think that it is possible to implement Bel with what is already "standard", but maybe believe should attempt less to follow everything that one would expect from a Bel interpreter, and be a language of its own, albeit heavily inspired by Bel. This is because (a) there are some things I don't like about both syntax and scope order of Bel; and (b) I've wanted to implement a Lisp dialect of my own for a while now.

So I think I'll rewrite it to be explicitly inspired by Bel, or maybe I'll simply fork it into another language -- but if I do, I'll probably stop developing this interpreter as a literate program.

Aww. I respect your decision, but I sincerely wish you'd stay on the path of implementing Bel according to spec.

Currently, we're in a weird and wonderful state of the world, where there's no publicly available implementation of Bel. Not even a slow interpreter. Maybe pg has one written in Arc, running on his machine, but he's not sharing it.

Yours is the only project I'm aware of that works towards fixing that. I have a private project with an implementation in Python, and at some point I'll make it public, but I don't feel I'm at the point yet where it's even remotely useful.

I've already started writing some Bel programs, also in private GitHub projects. Right now I have nothing to run them on, and so I'm writing them "blind". Really looking forward to having a complete-enough, compliant Bel implementation to test them on... if you decide to turn believe into something else, I guess I'll only have myself and my burgeoning implementation to count on. 🙂

masak commented 5 years ago

Also, I understand you're still thinking about this. Please hold off until I find the brain time slots to submit an issue about the representation thing, as promised.

As a teaser for another issue: I realized you'll want a hash table for the symbol lookup, and not an array. I plan to not only issue that, but PR it. 😄 Stay tuned.

luksamuk commented 5 years ago

No problem, then.

I'll keep it going. If this is really going to help anyone in any way, then I think this project must continue. If I end up making modifications, I'll just fork this project instead of turning it into something else.

luksamuk commented 5 years ago

I am closing this issue since the extra discussion was moved somewhere else.