johnwcowan / r7rs-work

Custom Reader #42

Open Zambito1 opened 1 year ago

Zambito1 commented 1 year ago

Hello,

Apologies if this is not the best place to propose this idea, but I have thought of two ways that I think R7RS could sensibly allow for custom readers in certain contexts. To be clear, by "reader" I mean a procedure that receives a port and returns s-exps based on the content of that port until it reaches an eof-object. By customizing the reader, I mean replacing the behavior of read from (scheme read) with the behavior of a different procedure.

One way would be to provide a form similar to include that takes an extra reader argument. Perhaps it could be called include-with-reader. It would be ideal for this to be an additional library declaration, so that people could use different languages for library definitions.

For example:

foo.py:

def greet_person(name):
    display(string_append("Hello, ", name, "!\n"))

foo.sld:

(define-library (foo)
  (export (rename (greet_person greet-person)))
  (import (rename (scheme base) (string-append string_append) (define def))
          (scheme write)
          (rename (python read) (read python-reader)))
  (include-with-reader "foo.py" python-reader))

> (import (foo))
> (greet-person "Robby")
Hello, Robby!

Another interesting way to specify a custom reader is to add a new reader parameter object to (scheme read). The default value of this object would be a procedure that behaves as read currently does, and the behavior of read would be modified to apply the procedure in the reader parameter.

(define-library (scheme read)
  (export read
          reader)
  (import (scheme base))
  (begin
    (define reader
      (make-parameter
        (lambda port
          ;; current implementation of read
          )))
    (define (read . port)
      (apply (reader) port))))

This could be interesting as it would allow systems that use read internally to be extended to handle new kinds of data. For example, it may be useful to specify json-read from SRFI 180 as the reader parameter value, to handle JSON data in a context that otherwise could not.
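A minimal sketch of how such a parameter might be used, assuming only R7RS-small. Since (scheme read) cannot be redefined portably, the names reader, read*, line-reader and read-with below are hypothetical stand-ins for illustration, not part of any standard or proposal text:

```scheme
(import (scheme base) (scheme read))

;; Hypothetical stand-in for the proposed `reader` parameter.
;; Its default value is the ordinary `read`.
(define reader (make-parameter read))

;; Hypothetical stand-in for the modified `read`: it applies whatever
;; procedure the `reader` parameter currently holds.
(define (read* . port)
  (apply (reader) port))

;; Toy alternative reader: returns one line of text as a string.
(define (line-reader . port)
  (apply read-line port))

;; Read from a string with a given reader installed.
(define (read-with r text)
  (parameterize ((reader r))
    (read* (open-input-string text))))

(define as-datum (read-with read "(1 2 3)"))            ; a list of three numbers
(define as-line  (read-with line-reader "hello world")) ; the string "hello world"
```

Code that calls read* never needs to know which reader is installed, which is the extensibility described above.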

The problem I see with going this route (at least exclusively this route) is that there does not seem to be a good way to leverage this to implement a library. For example, given foo.py from above, if we try to do the following:

foo.sld:

(define-library (foo)
  (export (rename (greet_person greet-person)))
  (import (rename (scheme base) (string-append string_append) (define def))
          (scheme write)
          (rename (python read) (read python-reader)))
  (begin
    (parameterize ((reader python-reader))
      (include "foo.py"))))

The definition of greet_person will be captured by the body of the parameterize, rather than expanding into a top-level definition in (foo). Thus, the export would fail. Maybe there is another way that a reader parameter could be used with include for the body of a library like this, but I cannot think of a way that seems great.
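The scoping problem can be seen without any custom reader at all. This is a small sketch, assuming only (scheme base); the parameter and the symbols stored in it are placeholders:

```scheme
(import (scheme base))

;; Placeholder parameter standing in for the proposed `reader`.
(define reader (make-parameter 'default-reader))

(define greeting 'top-level)

(define seen
  (parameterize ((reader 'python-reader))
    ;; A definition here is internal to the parameterize body,
    ;; just like a definition inside a lambda body.
    (define greeting 'inside-parameterize)
    greeting))

;; `seen` holds the inner binding's value, while the outer
;; `greeting` is untouched and nothing new is defined at top level.
```

Definitions produced by an include inside the parameterize body would be scoped in the same way as the inner greeting here.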

I think both of these features would be interesting to see in R7RS large, but the former is more interesting to me personally.

Thoughts?

lassik commented 1 year ago

Each port object should know what kind of syntax it uses. The syntax description should be an abstract data type, with the concrete alternatives evolving over time.

Something vaguely like:

(port-syntax (current-input-port)) => #<lexical-syntax r7rs>
(port-set-syntax! (current-input-port) (r6rs-lexical-syntax))

Common Lisp uses parameters (which are "special variables" in CL parlance) to control the reader and printer. It's passable, but I do not recommend it.
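A rough sketch of the per-port idea in portable R7RS, assuming a wrapper record because standard ports carry no such field. The record type and every procedure name below are hypothetical, and the "syntax" is represented simply as a reader procedure:

```scheme
(import (scheme base) (scheme read))

;; Hypothetical wrapper pairing a port with the reader for its syntax.
(define-record-type <syntax-port>
  (make-syntax-port port syntax)
  syntax-port?
  (port syntax-port-port)
  (syntax syntax-port-syntax syntax-port-set-syntax!))

;; Read the next object using whatever syntax the port currently has.
(define (syntax-read sp)
  ((syntax-port-syntax sp) (syntax-port-port sp)))

(define sp (make-syntax-port (open-input-string "(a b) tail") read))

(define first-object (syntax-read sp))   ; read as an S-expression
(syntax-port-set-syntax! sp read-line)   ; switch syntax mid-stream
(define second-object (syntax-read sp))  ; rest of the line, as a string
```

Unlike a parameter, the syntax here travels with the port, so two ports open at the same time can use different syntaxes without interfering with each other.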

lassik commented 1 year ago

Here's the CL stuff. Look for the *names-with-stars*.

lassik commented 1 year ago

To (include "...") foreign languages, an alternative reader is too weak. You need a language definition, not just a reader definition. That's substantially more complex. For example, some languages may require a multi-pass reader and translator.

Zambito1 commented 1 year ago

Thanks for the Common Lisp resources, that is in part what made me think about this :)

> To (include "...") foreign languages, an alternative reader is too weak. You need a language definition, not just a reader definition. That's substantially more complex. For example, some languages may require a multi-pass reader and translator.

Excuse my naivety on the matter, but wouldn't the "language definition" here be solvable by defining an environment in which the expressions returned by the reader can be evaluated? That was kind of what I was trying to get at with my (rename (scheme base) (string-append string_append) (define def)). Since def is not defined in Scheme, it must be provided. I assume here that python-reader returns an s-exp compatible with define from (scheme base).

Python has features that are not found in Scheme by default, such as classes with inheritance. A library could be written to provide classes (there are many such libraries, both implementation-dependent and portable), and the s-exp returned by python-reader for a class definition could leverage that library.

For example:

foo.py:

def greet_person(name):
    print("Hello, " + name)

foo.sld:

(define-library (foo)
  (export (rename (greet_person greet-person)))
  (import (python base) ; notice, no (scheme base)
          (python read))
  (include-with-reader "foo.py" read))

This .py example file is more "pythonic", and assumes that + can handle strings. If a call to read from (python read) on foo.py returns something like:

(def (greet_person name)
  (print (+ "Hello, " name)))

and def is a macro defined in (python base), and print and + are both procedures defined in (python base), then it seems like the (foo) library would work correctly.
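For illustration, one possible shape for such a (python base) library. This is a sketch under the assumptions above, not an existing library: py+ is a hypothetical internal name, and print is simplified (Python's print separates its arguments with spaces, which is omitted here):

```scheme
(define-library (python base)
  (export def print (rename py+ +))
  (import (scheme base) (scheme write))
  (begin
    ;; `def` expands into an ordinary Scheme procedure definition.
    (define-syntax def
      (syntax-rules ()
        ((_ (name . args) body ...)
         (define (name . args) body ...))))

    ;; Simplified print: display each argument, then a newline.
    (define (print . xs)
      (for-each display xs)
      (newline))

    ;; Exported as `+`: concatenates strings, adds numbers otherwise.
    (define (py+ a b)
      (if (and (string? a) (string? b))
          (string-append a b)
          (+ a b)))))
```

Because def is a hygienic macro, the define it expands into refers to the binding from (scheme base) even though (foo) never imports (scheme base) itself.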

Zambito1 commented 1 year ago

I think that any other language could be correctly implemented with this feature, but perhaps not efficiently (e.g., language features that require delimited continuations for a reasonable implementation).

lassik commented 1 year ago

> I assume here that python-reader returns an s-exp compatible with define from (scheme base).

That's not what "reader" traditionally means in Lisp and Scheme. The reader is the dumb part that merely tokenizes and parses text into a nested data structure. It doesn't evaluate the data (there's one exception but it's not relevant here).

A JSON reader or ASN.1 reader would make sense, since those are just data as well.

To actually evaluate expressions, define classes, etc. (which can and probably will involve re-ordering definitions, mangling names to match Scheme's naming conventions, etc.) the data coming from the reader is first macroexpanded and then fed to the evaluator or compiler.

lassik commented 1 year ago

So a Python reader would read Python's lexical syntax into some kind of data structure that's a fairly straightforward 1:1 mapping of Python's code structure into Scheme data structures (perhaps patterned after the ast module in the Python standard library).

A "translator" would then turn this into Scheme. It's unwise to bundle the translator into the reader; it's best to keep them separate.
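A toy sketch of that separation, assuming the reader has already produced an AST-like datum. The node shapes funcdef, call and binop are invented for this example and only loosely echo Python's ast module:

```scheme
(import (scheme base) (scheme cxr))

;; Hypothetical node shapes produced by the reader:
;;   (funcdef name (param ...) stmt ...)
;;   (call function arg ...)
;;   (binop operator left right)
;; Anything else (strings, numbers, identifiers) passes through unchanged.
(define (translate node)
  (cond
    ((and (pair? node) (eq? (car node) 'funcdef))
     `(define (,(cadr node) ,@(caddr node))
        ,@(map translate (cdddr node))))
    ((and (pair? node) (eq? (car node) 'call))
     `(,(cadr node) ,@(map translate (cddr node))))
    ((and (pair? node) (eq? (car node) 'binop))
     `(,(cadr node) ,(translate (caddr node)) ,(translate (cadddr node))))
    (else node)))

(define translated
  (translate
   '(funcdef greet_person (name)
      (call print (binop + "Hello, " name)))))
;; translated is:
;;   (define (greet_person name) (print (+ "Hello, " name)))
```

Keeping translate separate means the reader's output stays plain data that other tools could consume as well.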

lassik commented 1 year ago

Racket's main selling point is multi language support. Here's their Python: https://github.com/pedropramos/PyonR

Zambito1 commented 1 year ago

> That's not what "reader" traditionally means in Lisp and Scheme. The reader is the dumb part that merely tokenizes and parses text into a nested data structure. It doesn't evaluate the data (there's one exception but it's not relevant here).

Hm, I don't think I used "reader" to mean anything beyond that. Maybe I should show how I expect the include-with-reader form in my example would expand.

foo.sld, after the include-with-reader is processed:

(define-library (foo)
  (export (rename (greet_person greet-person)))
  (import (python base) ; notice, no (scheme base)
          (python read))
  (begin
    (def (greet_person name)
      (print (+ "Hello, " name)))))

The read procedure from (python read) just reads from the provided input port, and returns symbolic expressions. The environment that the include-with-reader is called from (in this case, the definition of (foo)) is responsible for dealing with evaluation, macro expansion, and name mangling (see how I used rename in my examples to do this). The only way for this definition of (foo) to be sensible is for (python base) to provide a definition of the def macro, and the print and + procedures. That is outside of the scope of the reader.

lassik commented 1 year ago

Sorry about the misunderstanding.

That makes sense, but things like name mangling and macro expansion are likely to be more complex than is comfortable to do simply by reading the Python parse tree into an S-expression which is then interpreted as Scheme. For example, if the included Python code defines 50 symbols, you'd have to manually mangle all their names in the Scheme-side export rename.

Python and other languages also support import statements of their own. How are those handled? For a Scheme library to read imports from an included file, you need to use include-library-declarations (which was introduced in R7RS-small) instead of include.

All in all, interfacing to foreign languages is liable to be full of surprises and complications that go far beyond just reading source code.

Zambito1 commented 1 year ago

> For example, if the included Python code defines 50 symbols, you'd have to manually mangle all their names in the Scheme-side export rename.

That is true. I can see how that could get tedious for larger libraries.
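The renaming itself is mechanical, so a helper could generate the export specs. A small sketch, assuming a snake_case to kebab-case convention; snake->kebab and export-renames are hypothetical names, and because library declarations are not computed at run time this would have to live in a tool that generates the .sld file:

```scheme
(import (scheme base))

;; Convert a snake_case symbol into a kebab-case symbol.
(define (snake->kebab sym)
  (string->symbol
   (string-map (lambda (c) (if (char=? c #\_) #\- c))
               (symbol->string sym))))

;; Build one (rename old new) export spec per name.
(define (export-renames names)
  (map (lambda (name) (list 'rename name (snake->kebab name)))
       names))

(define specs (export-renames '(greet_person say_goodbye)))
;; specs is:
;;   ((rename greet_person greet-person) (rename say_goodbye say-goodbye))
```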

> Python and other languages also support import statements of their own. How are those handled? For a Scheme library to read imports from an included file, you need to use include-library-declarations (which was introduced in R7RS-small) instead of include.

import would need to be a macro provided by (python base) in this example. How imports are resolved and provided to the environment would be an implementation detail of import. Maybe it would use things like load from (scheme load), include from (scheme base), or this new include-with-reader internally. I think it would be reasonable for the definition of import in (python base) to rely on implementation-specific features (like %load-path in GNU Guile, or expanding into a Scheme import in implementations that allow imports inside library bodies), potentially dispatched using cond-expand.

Zambito1 commented 1 year ago

Also to be clear, I'm not necessarily interested in being able to run code written for other languages verbatim in Scheme (though that would be pretty neat). I am more interested in being able to implement Scheme libraries / programs, while borrowing the surface syntax from other languages. For this use case, simply providing a subset of the language that omits import statements would be pretty reasonable.

APIPLM commented 1 year ago

include reads the included file and then evaluates it as well. It is like compiling one more unit, and all the new symbols in that unit become available to your unit in Scheme. import, by contrast, loads the symbol and renames it in Python.