johnwcowan / r7rs-work

98 stars 12 forks source link

Bottom Scheme Discussion (was: BottomScheme has read-char and write-char, but not chars) #35

Open jrincayc opened 2 years ago

jrincayc commented 2 years ago

BottomScheme has read-char and write-char, but not the char type, which would probably result in strange incompatibilities.

For example, in R7RS, (write-char (substring "abc" 1 2)) is an error, but I think as BottomScheme is specified, that is the correct way to print part of a string.

If compatibility with R7RS is desired, I think using write-string and read-string might be better.

johnwcowan commented 2 years ago

My intention is that read-char will return a string of length 1, and write-char will accept a string of length 1. What do you think?

lassik commented 2 years ago

Related to Bottom Scheme, there is now https://github.com/jrincayc/r7rs-pico-spec

jrincayc commented 2 years ago

My intention is that read-char will return a string of length 1, and write-char will accept a string of length 1. What do you think?

I guess it depends on how compatible you want Bottom Scheme to be. As is, if strings of length 1 is used I don't think there is a easy way to write code that would work in both R7RS and Bottom Scheme that uses read-char and write-char. (Tho' if you add features then an if could be used to switch between the two)

My personal opinion would be to either add characters to Bottom Scheme or use read-string and write-string. Also, for similar reasons, possibly string-append should be included instead of list->string. display is already in Bottom Scheme, so an IO function with greater complexity than write-string is already required.

lassik commented 2 years ago

I tend to agree with John that characters are a data type of dubious value. Emacs Lisp was designed for a text editor and it uses only integers and strings.

lassik commented 2 years ago

BTW a truly minimal Scheme should perhaps have only read-u8 and write-u8. That gets you ASCII "for free", and users can write their own UTF-8 processing if they want it. A non-UTF-8-aware read-char or read-string would work differently on the same data than the Unicode-aware RnRS versions of those procedures.

jrincayc commented 2 years ago

Hm, how about this possible set of changes to Bottom Scheme:

(I am willing to make a pull request to change to this if anyone wants me to.)

lassik commented 2 years ago

string->utf8 implies Bottom Scheme needs bytevectors. It would be more minimal if it only had lists and strings.

R6RS has:

> (bytevector->u8-list (string->utf8 "hello"))
(104 101 108 108 111)

What if Bottom Scheme had string->u8-list and u8-list->string? You could use those with read-u8 and write-u8 fairly easily.

Shouldn't an ultra-minimal Scheme have users write their own display as well?

lassik commented 2 years ago
(define (display obj)
  (cond ((string? obj)
         (for-each write-u8 (string->u8-list obj)))
        ((number? obj)
         (display (number->string obj)))
        (else
         ...)))
jrincayc commented 2 years ago

Well, Bottom Scheme already has byte vectors :)
https://github.com/johnwcowan/r7rs-work/blob/master/BottomScheme.md I do agree that an ultra minimal Scheme could remove byte vectors.

jrincayc commented 1 year ago

Somewhat off topic, but this is how I see various scheme(ish) specifications/implementations From simpliest to most complicated:

I think there is definitely space in the middle for another specification. I think one interesting goal would be to make a minimal scheme that had enough features to be self-hosting, but otherwise was pretty minimal. I think Bottom Scheme is pretty close to that (I think a bit more IO including ports, set! (since Bottom Scheme already has vector-set!) and then it would be useful spec for various purposes.

lassik commented 1 year ago

Indeed! Subsetting and self-hosting are two areas where Scheme is especially strong.

I've also toyed with a "tower" approach to Scheme, and have some interesting leads but no outcome yet. One of them is that you can make almost anything out of tagged vectors. (Where vector = general vector or homogeneous numeric vector.) If you give up pairs and make lists be vectors instead, the implementation doesn't need any other aggregate data type.

You ought to enjoy https://github.com/udem-dlteam/ribbit. It's based on the "rib" data structure and has a clever GC.

See also https://www.gnu.org/software/mes/

jrincayc commented 1 year ago

Hm, if it doesn't have pairs, I think we are getting into a different language than Scheme, but the results might be interesting. And yes, ribbit and mes are interesting.

lassik commented 1 year ago

I've done a fair bit of exploration on Lisp-without-pairs, and especially Scheme-without-pairs. It works very well. It retains the essential nature of Lisp/Scheme. I'll publish a SRFI about Vector Scheme with a sample implementation when I can find the time. (I have a running interpreter written in C, but it has to be polished.)

More generally, Array Lisp / Array Scheme (where lists are vectors are arrays) works fine.

Making homogeneous vectors a subtype of vector works fine too. In such a Lisp, the list type is the same as the general vector type. So RnRS vectors would be lists, but SRFI 4 vectors would not be lists.

Furthermore, you can add linked lists to a Vector Lisp or Array Lisp. In such a Lisp, pair and null don't belong to the "list" type, but they do belong to the "linked list" type. This works out naturally, too.

Intuitively, in any Lisp, the "list" type is whichever type is created by typing unadorned parentheses (...) in source code. So in Vector Lisp or Array Lisp, (...) would create a general vector. (In Array Lisp, a general vector is a one-dimensional general array.) You'd have to reserve another syntax, such as #L(1 2 3), for linked lists.

The consing dot (1 2 3 . 4) does not make sense in Vector or Array Lisp, except if for the optional linked-list add-on. So if you modified the R7RS-small spec to create a vector version of it, you'd have to excise the use of the consing dot in lambda lists and syntax-rules.

lassik commented 1 year ago

The icing on the cake is that you can write reams of code which are oblivious to whether the underlying list type is a linked list, a vector, or an array! Given sufficient time, I plan to exploit this to the fullest. Lots of non-functional languages have a vector type that's far more efficient and easier to use than linked lists (for which you might have to roll your own datatype). Writing a Lisp or Scheme implementation hosted on these languages, it'd be natural for vectors to be the native list type on the Lisp side to match the situation on the host side.

johnwcowan commented 1 year ago

The consing dot (1 2 3 . 4) does not make sense in Vector or Array Lisp, except if for the optional linked-list add-on. So if you modified the R7RS-small spec to create a vector version of it, you'd have to excise the use of the consing dot in lambda lists and syntax-rules.

Not really. You'd just make the dot an identifier with a special meaning in lambda and syntax-rules. So given (define (foo x . y) ...), then the call (foo 1 2 3) would bind x to 1 and y to the vector (2 3).

Writing a Lisp or Scheme implementation hosted on these languages, it'd be natural for vectors to be the native list type on the Lisp side to match the situation on the host side.

In Twinjo, the ASN.1 sequence type is bound to proper lists in Lisp and vectors in other languages. Lisp vectors are bound to a protocol-specific type E0 and improper lists to type E1.

In the 2-Lisp language (not to be confused with Lisp 2, an early Lisp with Algol syntax, or Lisp-2, a Lisp like CL with separate function and value namespaces), the basic sequence type is called rails, and is notated [1 2 3]. Unlike lists, you can add elements to a rail anywhere and remove them from anywhere without changing their identity (in the sense of eqv?), and empty rails are not necessarily eqv?. An invocation is notated (a . b), which means an application of the function named a to the rail b; the lexical syntax (a b c) is equivalent to (a . [b c]). There are other differences I'm not mentioning here.

lassik commented 1 year ago

You'd just make the dot an identifier with a special meaning in lambda and syntax-rules.

Ah, that would work. I didn't think of that. In standard Scheme, ... (three dots) is already exported from (scheme base).

I opened a separate issue about Twinjo's sequence types.

How old is 2-Lisp and where can we find information on it? Are rails always "proper"?

johnwcowan commented 1 year ago

The patch removing read-char and write-char put utf8->string and string->utf8 in the strings section, but they belong in the bytevectors section, so I changed that.

Bottom Scheme is intended to be, as the document says, easy to compile to C or machine language, so I kept bytevectors. If you want to do without them, just use vectors instead.

johnwcowan commented 1 year ago

2-Lisp is described in Chapter 4 (pp. 253-571) of Brian Cantwell Smith's massive dissertation. The discussion of rails begins on p. 265. The implementation of rails in Maclisp (essentially CL) is documented on p. 714; the code is on pp. 729-730.

There are occasional references to 1-Lisp, which is essentially Scheme with fexprs (spelled "imprs") and CL macros; notationally it is closer to CL (DEFUN instead of DEFINE, e.g.)

lassik commented 1 year ago

Thank you for digging up the reference, very interesting!

lassik commented 1 year ago

GNU R is prior art for "lists are general vectors". From the manual:

Lists (“generic vectors”) are another kind of data storage. Lists have elements, each of which can contain any type of R object, i.e. the elements of a list do not have to be of the same type. List elements are accessed through three different indexing operations. These are explained in detail in Indexing.

Lists are vectors, and the basic vector types are referred to as atomic vectors where it is necessary to exclude lists.