s-expressionists / Eclector

A portable Common Lisp reader that is highly customizable, can recover from errors and can return concrete syntax trees
https://s-expressionists.github.io/Eclector/
BSD 2-Clause "Simplified" License

Read-delimited-list issue #67

Closed kpoeck closed 4 years ago

kpoeck commented 4 years ago

In sbcl, latest eclector from git:

* (WITH-INPUT-FROM-STRING (*STANDARD-INPUT* "1 2 3 ]")
  (READ-DELIMITED-LIST #\] NIL))
(1 2 3)

* (WITH-INPUT-FROM-STRING (*STANDARD-INPUT* "1 2 3 ]")
  (eclector.reader:READ-DELIMITED-LIST #\] NIL))
debugger invoked on a ECLECTOR.READER:UNTERMINATED-LIST in thread
#<THREAD "main thread" RUNNING {1000510083}>:
  While reading list, expected the character ] when input ended.

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [RECOVER] Return a list of the already read elements.
  1: [ABORT  ] Exit debugger, returning to top level.

(ECLECTOR.BASE:%READER-ERROR NIL ECLECTOR.READER:UNTERMINATED-LIST :STREAM-POSITION NIL :DELIMITER #\])
   source: (APPLY #'ERROR DATUM :STREAM STREAM :STREAM-POSITION STREAM-POSITION
                  (ALEXANDRIA.1.0.0:REMOVE-FROM-PLIST ARGUMENTS
                                                      :STREAM-POSITION))

scymtym commented 4 years ago

I think the specification is not completely clear on this, although I will admit that the interpretation chosen by Eclector has much weaker support than the one chosen by SBCL.

read-delimited-list reads objects from input-stream until the next character after an object's representation (ignoring whitespace[2] characters and comments) is char.

This describes the behavior exhibited by SBCL: read objects and skip whitespace (and whitespace-like reader macros) until encountering the given character.

read-delimited-list looks ahead at each step for the next non-whitespace[2] character and peeks at it as if with peek-char. If it is char, then the character is consumed and the list of objects is returned. If it is a constituent or escape character, then read is used to read an object, which is added to the end of the list. If it is a macro character, its reader macro function is called; if the function returns a value, that value is added to the list. The peek-ahead process is then repeated.

This makes it sound as if the next character is either char or a constituent, implying that char must not have syntax type constituent. This interpretation actually makes sense because of the following problem (the output below is for SBCL):

(with-input-from-string (stream "1 2 3 ]")
  (read-delimited-list #\] stream))
=> (1 2 3)

(with-input-from-string (stream "1 2 3]")
  (read-delimited-list #\] stream))
|- end-of-file error

I assume that is why your example uses what would be [ 1 2 3 ] instead of the more idiomatic [1 2 3].

One of the examples (not normative, I know) states:

It is necessary here to give a definition to the character } as well to prevent it from being a constituent. If the line

(set-macro-character #\} (get-macro-character #\) nil))

shown above were not included, then the } in

{ p q z a}

would be considered a constituent character, part of the symbol named a}. This could be corrected by putting a space before the }, but it is better to call set-macro-character.

Granted, this may only refer to the specific example of making

#{p q z a} read as ((p q) (p z) (p a) (q z) (q a) (z a))

but the problem is more general.
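
As a minimal sketch of the same fix applied to the ] delimiter from the example above: giving ] the reader macro of ) in a private copy of the standard readtable makes it a terminating macro character, so the unspaced input "1 2 3]" no longer reads 3] as a single token. The function name read-bracket-list-from-string is hypothetical and exists only for illustration; the sketch uses the standard cl:read-delimited-list, not Eclector's.

```lisp
;; Hypothetical helper for illustration; not part of Eclector or SBCL.
(defun read-bracket-list-from-string (string)
  ;; Work on a fresh copy of the standard readtable so the global
  ;; readtable is left untouched.
  (let ((*readtable* (copy-readtable nil)))
    ;; Reuse the reader macro function of ) for ]. The macro character is
    ;; installed as terminating, so ] ends a preceding token such as 3.
    (set-macro-character #\] (get-macro-character #\) nil))
    (with-input-from-string (stream string)
      (read-delimited-list #\] stream))))
```

On a conforming implementation, both (read-bracket-list-from-string "1 2 3 ]") and (read-bracket-list-from-string "1 2 3]") should then return (1 2 3).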

Changing Eclector to behave like SBCL has two downsides:

  1. The algorithm described above duplicates a lot of the code at the core of read.

  2. In Eclector, read goes through the generic function eclector.reader:read-common, on which clients can define methods. read-delimited-list currently also goes through read, thus respecting client-defined methods. Implementing the above algorithm would bypass such methods.

The longer I think about this, the more I am convinced that Eclector has to change, but I would like to find a way that avoids the above downsides. Maybe the planned read-maybe-nothing method can help?
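
To make the trade-off concrete, here is a deliberately simplified sketch of the peek-ahead loop from the specification text quoted earlier, written in portable Common Lisp. It is an assumption-laden illustration, not Eclector's or SBCL's implementation, and the name sketch-read-delimited-list is hypothetical. It delegates every object to read instead of calling reader macro functions itself, so it does not handle a macro character that returns no values (for example a comment) directly before the delimiter. Handling that case properly is what forces a full implementation to duplicate the core of read.

```lisp
;; Hypothetical sketch; not part of Eclector or SBCL.
(defun sketch-read-delimited-list (char &optional (stream *standard-input*))
  (let ((objects '()))
    (loop
      ;; PEEK-CHAR with peek-type T skips whitespace and returns the next
      ;; character without consuming it. End of input signals END-OF-FILE.
      (let ((next (peek-char t stream t nil nil)))
        (cond ((char= next char)
               ;; Found the delimiter: consume it and return the objects.
               (read-char stream t nil nil)
               (return (nreverse objects)))
              (t
               ;; Simplification: everything else is handed to READ. A real
               ;; implementation would call the reader macro function itself
               ;; and add its value only if one is returned.
               (push (read stream t nil nil) objects)))))))
```

With the standard readtable, this sketch shows the same asymmetry as the SBCL output above: the input "1 2 3 ]" yields (1 2 3), while "1 2 3]" reads 3] as one token and then signals end-of-file.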