Added bytevector to readtable and made identifiers case-sensitive (as per R7RS Sec. 2.1)

peblair commented 9 years ago

Feel free to look this over. I just thought I'd help contribute a bit.

Fair warning: the #u8 readtable syntax works, but DrRacket does not highlight it as a non-error.

peblair commented 9 years ago

P.S. After reading the spec, the main thing that still seems to be unsupported (readtable-wise) is the #<n>=<datum>/#<n># syntax. Racket has utilities for this in the form of read-accept-graph, but simply enabling that parameter is not enough to make it work. I just gave it a cursory inspection, though, so there might be a trick to enabling it.

Edit: I forgot about #!fold-case and #!no-fold-case also.

lexi-lambda commented 9 years ago

Thanks a lot for the help! This looks good. I left one comment, and could you also make sure all the files have trailing newlines? Make those minor adjustments, and I'm happy to merge this.

peblair commented 9 years ago

I removed the r7rs: prefix (it was left over from when I had originally implemented an r7rs:read and r7rs:read-syntax pair of function) and added those EOF newlines.

I'm glad to have helped!

lexi-lambda commented 9 years ago

Looks good! I just cloned it, and I did discover a small problem: it doesn't seem to work on HEAD, and therefore it won't work on 6.3. Looks like it doesn't work with the new macro expander. I tried it with this program:

#lang r7rs
(import (scheme base))
(display #u8(1 2 3))

I got this error:

r7rs/lang/r7rs-reader.rkt:39:13: bytevector: identifier's binding is ambiguous
  context.:
   #(-458352200736611477 module) #(-458352200736611476 module r7rs-reader 0)
   #(1755194 module) #(1755195 module anonymous-module 0)
  matching binding.:
   #<module-path-index:(r7rs/base)>
   #(1755194 module) #(1755195 module anonymous-module 0)
  matching binding.:
   #<module-path-index:((quote #%kernel))>
   #(-458352200736611477 module) #(-458352200736611476 module r7rs-reader 0)
  in: bytevector

This type of thing is sometimes a little tricky—reader extensions that introduce identifiers is not as clean as it could be. You might need to fiddle with make-syntax-introducer to add a new scope in order to keep the binding unambiguous.

(As an aside, this would be a perfect thing to write some tests for, but I admit I haven't gotten around to getting a good testing setup for this package yet, so I'm okay with holding off on that for now.)

peblair commented 9 years ago

Turns out the fix was pretty simple: delete all lexical context from the syntax returned from the reader extension (i.e. (datum->syntax #f <syntax goes here> <src-location>)). It was documented here.

I also went ahead and threw in a function to add the readtable extensions to the REPL as well.

lexi-lambda commented 9 years ago

Hmm. I'm not completely convinced that's the right approach, but I'm not sure. Does it work if the bytevector reader syntax is used in a module in which the bytevector binding is renamed or not available? I think we should try and maintain hygiene.

peblair commented 9 years ago

I agree with your love of hygiene, but the reader already does weird things with readtable expansions. Consider this broken piece of code:

#lang racket
(define quote +)
'(2 2)

(I had intended for this to expand to (+ 2 2), but it behaves much more poorly than that)

Is this not the same kind of idea?

EDIT: This better describes what I meant to demonstrate:

#lang racket
(define quote (λ(x) (+ x 2)))
'2

lexi-lambda commented 9 years ago

Yes, quote, quasiquote, unquote, and unquote-splicing are are unhygienic. But this is not a reader abbreviation in quite the same way (and those unhygienic extensions are holdovers from history). This program works just fine:

#lang racket/base
(define (vector . args) args)
#(1 2 3) ; => '#(1 2 3), not '(1 2 3)

We should use the better behavior given that we have the opportunity. I'm willing to figure out how to handle making hygiene work, though, if you'd like.

jackfirth commented 9 years ago

@AlexKnauth has a meta-lang that provides hygienic versions of quote, quasiquote, syntax, etc. Looking to that as a guide may make it easier to figure out how to do this hygienically - https://github.com/AlexKnauth/hygienic-quote-lang.

lexi-lambda commented 9 years ago

Yes, I was thinking about that. It looks like this commit addresses specifically the problem we're talking about: https://github.com/AlexKnauth/hygienic-quote-lang/commit/d50b7fd2cff48a2889b85b00d2714d38c2f67fcf

However, I'm not completely sure that's right... we might not want to apply make-syntax-introducer at all if we're running under the old macro expander, since I think using make-syntax-introducer may mess with syntax-original? or other properties. I'm not really sure, though. I should try and break Alex's package to see if it's a problem, and if not, it sounds like the way to go.

lexi-lambda commented 9 years ago

After some investigation it looks like:

Alex's hygienic quote does make syntax-original? return #f while racket/base's quote returns #t.
It probably doesn't matter for 99% of usages, but we should still have good manners and preserve the property if we can.

I think maybe we should try to use Alex's approach and check the arity of make-syntax-introducer, but let's just not apply it at all if it doesn't support any arguments. Let's see if that causes any problems.

AlexKnauth commented 9 years ago

You can use (make-syntax-introducer #t) instead of (make-syntax-introducer) to preserve syntax-original?. You'll definately have to do that for the outer scope, but for the inner scope (make-syntax-introducer) will be fine (Edit: (make-syntax-introducer #t) will also be fine). The outer syntax object will not be original, but the pieces inside that you care about will be.

lexi-lambda commented 8 years ago

I ended up implementing this myself in 1d1b9b125268b8ea22b4d36befdefda50003af1c. I sidestepped the issue by just returning a bytevector as syntax instead of expanding to a function call, which is valid since Racket actually has reader syntax for bytevectors (#"foo").

Thanks for the effort, anyway, though, and sorry I didn't get to merge this in.

AlexKnauth commented 8 years ago

Oh, I forgot you could do that. Yes, that's definitely a better approach. Should it return an immutable byte vector instead of a mutable one though, since literals are supposed to be immutable?

lexi-lambda commented 8 years ago

The syntax system happily converts the byte string to an immutable one automatically, so I didn't worry about that. The R7RS standard also allows literals to be immutable, so that's perfectly conformant.

AlexKnauth commented 8 years ago

But is (read) also supposed to return an immutable one?

lexi-lambda commented 8 years ago

Well, the spec doesn't actually say. It's lenient here, so pretty much anything goes. You could certainly make a case that read and read-syntax should work the same, but I didn't give it too much thought.

lexi-lambda / racket-r7rs

Added bytevector to readtable and made identifiers case-sensitive (as per R7RS Sec. 2.1) #1