Open jackfirth opened 2 years ago
I think the rule name identifiers shouldn't have any source location information
Does this happen with `ragg` too, or just `brag`?
If `brag` handles source locations in a way that’s contrary to documentation or syntax-object norms, I welcome supporting evidence that this is so. Otherwise I would invoke the existing Racket norm against changing the behavior of a package in a backward-incompatible way.
`ragg` too.
```scheme
(require racket/port syntax/modread)
(with-module-reading-parameterization
  (λ () (with-input-from-string "#lang racket/base 42"
          (λ () (read-syntax)))))
```
It produces this syntax object:
```scheme
(module anonymous-module racket/base
  (#%module-begin 42))
```
Both of the `module` and `anonymous-module` identifiers have a span of zero and are not original. The `racket/base` identifier and the `42` literal each have correct starts and spans, pointing to the `racket/base` and `42` substrings of `#lang racket/base 42`, and they're both original. The `#%module-begin` identifier is an odd one: it's not original, but it does have a source location that is the same as the enclosing `(#%module-begin 42)` form. Given the way the `module` and `anonymous-module` identifiers are handled, I suspect that's just a bug.
The whole form has a start position of 7 and a span of 14, pointing to the `racket/base 42` substring, and it is not original because it contains the unoriginal `module`, `anonymous-module`, and `#%module-begin` pieces. The `(#%module-begin 42)` form also isn't original and it has the same start location and span, which I suspect is another bug: it claims to represent the `racket/base 42` substring of the program code, but the `(#%module-begin 42)` form doesn't actually contain the `racket/base` identifier. It should probably only claim to contain the `42` substring of the code.
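To check these observations yourself, something like the following sketch should work. It reads the same one-line program and prints the position, span, and originality of every piece of the resulting form. The `describe` and `walk` helpers are names made up for this example, not part of any library, and the exact numbers printed may vary between Racket versions.

```racket
#lang racket/base
(require racket/port syntax/modread)

;; Read the same one-line program used above.
(define stx
  (with-module-reading-parameterization
    (λ ()
      (with-input-from-string "#lang racket/base 42"
        (λ () (read-syntax))))))

;; describe: hypothetical helper that prints one syntax object's
;; datum, start position, span, and originality.
(define (describe s)
  (printf "~s: position ~a, span ~a, original? ~a\n"
          (syntax->datum s)
          (syntax-position s)
          (syntax-span s)
          (syntax-original? s)))

;; walk: hypothetical helper that describes a syntax object and then,
;; if it is a syntax list, each of its elements recursively.
(define (walk s)
  (describe s)
  (define elements (syntax->list s))
  (when elements
    (for-each walk elements)))

(walk stx)
```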
It's a bit tricky to say for sure what the "intent" here is because source locations are tricky to produce and mistakes in them are rarely noticed. I think for syntax objects produced by a language's `read-syntax` function, these are some good guidelines:
- If the identifier from the `#lang` line is used for the module's initial bindings, it should be original and have a source location.
So given a grammar like this:
And an appropriate lexer-based tokenizer, using `(parse path (make-tokenizer port))` produces syntax objects that look like this:

All well and good. The source locations are even correct, assuming the lexer uses `lexer-srcloc`. Specifically, the following syntax objects have source locations:

- The `program` and `statement` identifiers have a source location

That last part seems off to me. The `program` identifier gets the same source location as the surrounding `(program ...)` syntax object. But the identifier itself is more of an implicitly-inserted thing from the user's perspective, like `#%app` or `#%datum`.

Where this matters to me is that I use the source locations of original syntax objects in my `resyntax` tool to figure out how to copy their original source code text into the refactored output code. So if one of those `program` or `statement` identifiers ends up in the output syntax object of my refactoring tool (perhaps because it was rearranging pieces of the enclosing `(program ...)` expression), the tool will duplicate the whole original expression when it tries to figure out how to render the output `program` identifier in refactored source code.

I think the rule name identifiers shouldn't have any source location information. Maybe they shouldn't even be `syntax-original?`, but that I'm less sure on.
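To make the duplication problem concrete, here is a minimal sketch of a tool copying source text by location. It is not how `resyntax` is actually implemented: `original-text`, the `"a b c"` source string, and the hand-built source location are all assumptions made up for the example, and the originality check is left out because identifiers built with `datum->syntax` are never `syntax-original?`.

```racket
#lang racket/base

;; Hypothetical source text for a whole (program ...) form.
(define source "a b c")

;; original-text: hypothetical helper that copies the text a syntax
;; object claims to cover, or returns #f when it has no location.
(define (original-text source stx)
  (define pos (syntax-position stx))   ; 1-based, or #f
  (define span (syntax-span stx))      ; length in characters, or #f
  (and pos span
       (substring source (sub1 pos) (+ (sub1 pos) span))))

;; Source location of the enclosing form:
;; (list source-name line column position span)
(define form-loc (list "example" 1 0 1 5))

;; A rule name identifier that reuses the enclosing form's location,
;; as described in this issue.
(define rule-id (datum->syntax #f 'program form-loc))

;; The same identifier with no source location at all.
(define bare-id (datum->syntax #f 'program #f))

(original-text source rule-id) ; => "a b c", the text of the whole form
(original-text source bare-id) ; => #f, so nothing gets copied
```

With the location attached, the identifier alone appears to stand for the entire expression's text. Without it, a tool can tell there is no original text to copy.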