lexi-lambda / racket-r7rs

An implementation of R7RS in Racket
97 stars 14 forks source link

Cannot read program with datum labels in quoted literals #17

Closed Oxyd closed 2 years ago

Oxyd commented 2 years ago

2.4 of R7RS specifies datum labels that allow reading and writing data structures with shared or circular structure. At the end of 2.4 there is a paragraph that says it is an error for datum labels to be used in program source, but they are explicitly allowed in literals.

R7RS contains one example of a datum label used inside quote in program code, in section 6.1 as one of the examples for the equal? procedure:

(equal? ’#1=(a b . #1#)
        ’#2=(a b a b . #2#))

This example doesn't work in Racket with #lang r7rs, giving an error: read-syntax: #...# forms not allowed in read-syntax mode

lexi-lambda commented 2 years ago

Thanks for the report—I agree it’s a flaw in the current implementation according to the spec as-written.

Unfortunately, it’s sort of unclear to me how the spec intends for this this to be handled. It says that datum labels are to be allowed in literals and nowhere else, but what is and is not a literal is not known until after macroexpansion. For example, we can easily write the following:

(define-syntax quoted-car
  (syntax-rules (quote)
    ((_ (quote (a . b)))
     (quote a))))

(quoted-car '#0=(1 . #0#))

Should this be allowed? The datum labels appear syntactically under quote, and that use of quote even has the usual binding, but then our quoted-car tries to decompose it. I can come up with arguments both for and against allowing this:

Personally, I think the arguments against this being allowed are significantly more compelling—allowing this seems like a plainly bad idea. But in that case, how does an implementation know when to allow circular structure and when to reject it? It cannot possibly be determined at read-time, so circular structure must always be allowed in the reader, and it must be somehow be rejected by the expander.

But it still isn’t obvious how to do this! For example, should this program be allowed or rejected?

(define-syntax let-and
  (syntax-rules ()
    ((_ x e1 e2)
     (let ((x e1))
       (and x e2)))))

(let-and x '#0=(1 . #0#) (car x))

In this program, circular syntax appears as an argument to a macro, but it’s just passed through completely unchanged. This must be allowed, since otherwise the behavior of quote would depend on whether or not it is passed as an argument to a macro, and all sorts of core forms are usually implemented as macros (such as and and or, themselves).

This leaves us in a pickle about how/when to actually report the error. Do we have to do it when syntax-rules attempts to decompose a piece of circular syntax, but not a moment earlier? What precisely constitutes such an illegal match? The spec certainly doesn’t say.

All this is to say that I’m largely of the opinion that this behavior is woefully underspecified, to the extent that I would call this a flaw in the specification. Do you know if any other R7RS implementations handle this, and if so, what their behavior is? Without more context, it’s pretty hard for me to know what the right thing to do even is.

johnwcowan commented 2 years ago

@dpk and I agree that this sort of situation comes under "is an error" behavior, so the answer is that you can do whatever you think best or is easiest.

lexi-lambda commented 2 years ago

I’m not sure I understand. Do you mean that just never allowing datum labels in program text is conforming, as this implementation does now (in which case this issue can be closed as “not a bug”)? Or is there some set of situations where it must be supported? And if it’s the latter, what situations are those?

johnwcowan commented 2 years ago

Do you mean that just never allowing datum labels in program text is conforming

Technically yes, since there is no MUSTard in the last paragraph of section 2.4. However, it would be reasonable to provide support in less confusing cases as a matter of high quality of implementation.

Oxyd commented 2 years ago

Saying that not allowing datum labels anywhere in a <program> is conforming to me sounds like a very creative reading of the report. Without language-lawyering the exact reason why, I think it's quite apparent that the intent was to allow datum labels in quotes. This is corrobated by the example I referred to in the first post.

It is however “an error” to use a datum label anywhere outside a quoted literal, which gives an implementation the freedom to do whatever it chooses. So it doesn't need to “detect” such cases, so long as it correctly accepts the only required case.

As for how other implementations handle it, this is what Chbi does:

(quoted-car '#0=(1 . #0#)) => 1 (let-and x '#0=(1 . #0#) (car x)) => 1

Which is what I would expect.

lexi-lambda commented 2 years ago

Do you mean that just never allowing datum labels in program text is conforming

Technically yes, since there is no MUSTard in the last paragraph of section 2.4. However, it would be reasonable to provide support in less confusing cases as a matter of high quality of implementation.

Well, for the most part, this implementation largely tries to adhere to the philosophy of being as minimal an R7RS as possible: if the spec does not guarantee that something is allowed, this implementation generally tries to reject such things outright. Obviously, this philosophy is not taken to an absurd extreme—if implementing the bare minimum would require a significant amount of additional effort, this implementation does not bother—but it’s what the library generally aspires to.

The rationale for this philosophy is that programmers should be able to have reasonable confidence that if their program runs under #lang r7rs (and they don’t import any Racket-specific libraries), it will also run on other R7RS implementations. Programmers can always choose to opt out of this strictness by using cond-expand to import some Racket-specific forms or functions, but this must be done explicitly.

(Aside: I happen to be of the opinion that this philosophy is a good one, but for what it’s worth, I don’t really see it as my decision per se. I decided to take this approach when implementing #lang r7rs mainly because it’s consistent with the approach taken with #lang r5rs and #lang r6rs, so doing otherwise would be inconsistent with Racket’s other RnRS implementations.)


Saying that not allowing datum labels anywhere in a <program> is conforming to me sounds like a very creative reading of the report. Without language-lawyering the exact reason why, I think it's quite apparent that the intent was to allow datum labels in quotes. This is corrobated by the example I referred to in the first post.

I agree with this, but I would prefer to not make an arbitrary decision about which things to allow and which things to reject. Obviously I could support precisely the examples given in the spec and just accept that the implementation might do arbitrary and unpredictable things in other cases (leaving it completely unstated which cases can be relied upon), but I think that would be a very poor implementation.

As for how other implementations handle it, this is what Chbi does:

(quoted-car '#0=(1 . #0#)) => 1 (let-and x '#0=(1 . #0#) (car x)) => 1

Which is what I would expect.

What does Chibi do for something like this?

(define-syntax static-last
  (syntax-rules ()
    ((_ (a))     a)
    ((_ (a . b)) (static-last b))))

(static-last '#0=(1 . #0#))

Does it report an error diagnostic? Or does it just spin forever?

Oxyd commented 2 years ago

What does Chibi do for something like this?

(define-syntax static-last
  (syntax-rules ()
    ((_ (a))     a)
    ((_ (a . b)) (static-last b))))

(static-last '#0=(1 . #0#))

Does it report an error diagnostic? Or does it just spin forever?

It reports an error: dotted list in source: (1...

However, this doesn't seem to be related to using datum labels at all. Trying to evaluate (static-last '(1 . 2)) results in the same error.

lexi-lambda commented 2 years ago

Sorry, my program had a typo: I didn’t mean to include the quote. The correct program is this:

(define-syntax static-last
  (syntax-rules ()
    ((_ (a))     a)
    ((_ (a . b)) (static-last b))))

(static-last #0=(1 . #0#))

However, I decided to stop wasting your time and just build Chibi myself, and I’ve confirmed that this results in the macroexpander diverging. I don’t think that’s especially great behavior, either, but at least it’s fairly consistent and predictable. Do you think that behavior is the best interpretation of the specification?

Oxyd commented 2 years ago

this results in the macroexpander diverging. I don’t think that’s especially great behavior, either, but at least it’s fairly consistent and predictable. Do you think that behavior is the best interpretation of the specification?

I'd say so, yeah. Trying to reach the end of an infinite list is always going to diverge, whether you try it at run-time or expand-time. One can already write a diverging syntax-rules macro without using datum labels at all, so I don't think it's something unexpected.

lexi-lambda commented 2 years ago

Alright, I’m willing to give implementing that behavior a try.

It will be a headache to implement in Racket, as Racket does not support circular syntax objects at all—attempting to create one results in an immediate runtime error—which means I’ll have to encode it indirectly and implement R7RS-specific versions of most syntactic forms to interpret circular arguments appropriately. That is quite obnoxious (and syntactic forms imported from Racket libraries may do nonsensical things when given circular syntax), but it’s ultimately doable.

johnwcowan commented 2 years ago

Given that limitation, I would say "don't try the heroics, just fail".

On Wed, Feb 16, 2022 at 2:25 PM Alexis King @.***> wrote:

Alright, I’m willing to give implementing that behavior a try.

It will be a headache to implement in Racket, as Racket does not support circular syntax objects at all—attempting to create one results in an immediate runtime error—which means I’ll have to encode it indirectly and implement R7RS-specific versions of most syntactic forms to interpret circular arguments appropriately. That is quite obnoxious (and syntactic forms imported from Racket libraries may do nonsensical things when given circular syntax), but it’s ultimately doable.

— Reply to this email directly, view it on GitHub https://github.com/lexi-lambda/racket-r7rs/issues/17#issuecomment-1042076365, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANPPBWUSXHPFTBYDBBT7DLU3P22FANCNFSM5OMZ2UHQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

Oxyd commented 2 years ago

I'm certainly willing to accept “too difficult to implement” as a resolution here, so long as it's documented in the readme as something that should work but doesn't. I'm not familiar enough with Racket to know how (in)feasible this is, so I'll leave any decisions up to Lexi.

lexi-lambda commented 2 years ago

I consider any point of nonconformance a bug in this library, and I don’t think it’s ever really justified unless it’s genuinely impossible or would seriously compromise the system’s usability. There are only two known discrepancies with the standard I have historically considered acceptable, namely:

  1. exit behaves like emergency-exit because the Racket runtime simply does not provide the functionality to implement R7RS’s exit behavior reliably.

  2. The interpretation of define-library is somewhat restricted because #lang r7rs is already needed to select R7RS as the module’s language, and that process necessarily subsumes some of the roles that define-library would otherwise provide.

Both of those issues are essentially impossible to fix because of Racket’s nature as a multi-language environment. This particular issue, however, is resolvable, so I believe not doing so constitutes a bug.

lexi-lambda commented 2 years ago

I have now implemented support for this in 20bef00247fbdb0eefd30210187cb739c3c35276, which is not yet merged because it depends on a supporting patch to the Racket reader, racket/racket#4162. That means this functionality will only be supported beginning with the next release of Racket (though once the upstream patch is merged, you should also be able to use a development snapshot).

The upstream patch does not provide a full solution: it only makes a minimal modification to the reader to allow reading graph structure in read-syntax mode. However, it does not remove the restriction that Racket syntax objects be non-circular, so circular references are instead represented indirectly. My patch to r7rs-lib provides reimplementations of quote and syntax-rules for #lang r7rs that give special treatment to syntax objects containing such indirections, which allows all the programs discussed in this issue to work.

From a user’s perspective, this implementation strategy should be essentially invisible as long as the program uses only standard R7RS libraries. However, forms defined in other Racket libraries do not give these syntax objects any special treatment, so trying to pass circular structures to other macros will fail if the macro needs to traverse those structures to perform its expansion. However, if the circularity is only present in a subexpression passed to a macro, wrapped inside a use of #lang r7rs’s version of quote, everything should behave smoothly, since the unusual syntactic structure is locally contained. This is much less likely to cause any problems than the more immediate issue of R7RS’s use of mutable pairs, so I doubt anyone will take issue with this compromise.

Oxyd commented 2 years ago

Amazing, thanks!