racket / drracket

DrRacket, IDE for Racket
http://www.racket-lang.org/
Other
447 stars 94 forks source link

Recent snapshot produces unexpected annotations with empty intervals #445

Open greghendershott opened 3 years ago

greghendershott commented 3 years ago

Within the last 6 days, I started having an automated test for Racket Mode fail, only on Racket snapshot.

The test is getting surprised by a new syncheck:add-docs-menu for #%datum, arising from 42 in this little program:

#lang racket/base
(define foobar 42)
foobar

Given the time frame, I glanced at the commit history and it looks like this is related to commit 45fb5f5 (and/or the "upstream" commits its message mentions)?

  1. I don't know if this is intentional, or, a side-effect you just hadn't noticed yet. Maybe it's not obvious in the DrR UI or caught by its tests?

    I'm guessing it's unintentional. A literal 42 offering documentation for #%datum, seems... maybe noisy and not helpful.

    OTOH if in fact intentional, I can adjust some test cases to expect these to appear. Even so, I think there's still a problem...

  2. The syncheck:add-docs-menu values for beg and end are both 33. This is unprecedented, in my experience using check-syntax, beg and end have always meant a half-open interval. Never an empty span. (e.g. In the Racket back end, I collect these annotations in an interval-map; these new annotations raise an error I've not seen before.)

    • I'm not sure what it means to annotate an empty span for an identifier (unlike say an arrow position).

    • Ideally it seems it should be [33 35) to cover all of the text 42. (Less ideally, [33 34) to cover the first character would probably be OK.)

    So I mean, I could force end to be at least 1+ beg, but rather than me paper that over I wanted to see if it made sense to do that within check-syntax.

greghendershott commented 3 years ago

Trying some more examples, I'm also seeing (= beg end) problem for syncheck:add-jump-to-defintion, that arises from new-app.

Possibly there are others I haven't noticed yet.... maybe there is some inflection point in the check-syntax code that would address all of these cases, I'm not sure.

greghendershott commented 3 years ago

Regarding that syncheck:add-jump-to-defintion for new-app, aside from (= beg end), I guess the more significant question is, should it really be an annotation for every single open paren (that's not a macro invocation). That might be too "heavy"? Certainly it makes the checking noticeably slower, for me.

rfindler commented 3 years ago

%app and #%datum both get start=end, pointing at the start of the datum or of the application. This is indeed because of the referenced commit, but the range truncation is happening at the drracket layer.

Specifically, in the commit linked above, see where check syntax responds to the presence of the 'implicit-made-explicit property to point only at the start of the range, not the entire range.

I'm not sure about the half-open interval comment. Maybe I'm confused but I thought the beginning and end of the identifiers are pointing at positions between characters, so "0" is before the first character "1" is between the first and second character, and so on. So if start and end are the same, then we're pointing at the spot between two characters.

rfindler commented 3 years ago

Ah, but maybe for docs this shouldn't happen. I am not sure of the best thing, however, since pointing at the entire application will generally not be very helpful as it will point at way too much, like in examples like:

(list (f x) (g y))

we'd get the #%app for list pointing at the entire thing and overlapping with f and g. I'm open to suggestions on the right behavior here!

greghendershott commented 3 years ago

Regarding the ranges:

To-date, in my experience, all syncheck methods with beg and end are always strictly (< beg end). They're from beg up to but not including end. [beg end). As a result, it never makes sense for beg to equal end.

[Sometimes I name such vars [from upto) vs. [from thru], to emphasize whether the end is exclusive or inclusive. Regardless of what you think of that naming convention, :) that's the distinction I mean.]

What would be less surprising to me, is if:


However. As I mentioned before, I'd also be fine if these new annotations weren't happening, at all.

I appreciate the value of seeing that, for an entire module, #%datum and new-app are the bindings (or other things are).

But I guess I don't understand annotating that over and over again, on each and every occurrence of a literal and open paren. That seems heavy/redundant, implementation wise? As well as maybe being an information fire hose for the user?

What's the background/motivation for the change? Probably I'm missing the whole point!

greghendershott commented 3 years ago

I checked the commit again. It closes #438. That's a great motivation, to improve unused requires detection -- nice!!

So I guess the question (for me) is whether the change is "bleeding over" into the check-syntax UI of DrR and/or Racket Mode in a way that maybe is unnecessary. Or is there some other open issue (officially or unwritten), where enhancing that was a deliberate goal, too.

rfindler commented 3 years ago

Regarding the ranges:

To-date, in my experience, all syncheck methods with beg and end are always strictly (< beg end). They're from beg up to but not including end. [beg end). As a result, it never makes sense for beg to equal end.

These coordinates work the same way that coordinates in text% do, namely they are between characters. So position 0 does not mean the first character. It means the tiny zero-width sliver of space before the first character. Similarly, 1 is not the second character; it is the spot between the first and second character. Beyond text%, there is other precedent here -- this is the same as how drawing coordinates work -- they are between pixels in the same sense. I'm sorry that's not how you understood it, perhaps a documentation update is in order.

As for making the #%datum and #%app be the entire range, this works poorly, at least for #%app, because the ranges are very large and end up with very large overlaps in common situations. I tried it like that (in DrRacket) for a while before rejecting it as obviously unworkable. That is why we have the 'implicit-made-explicit property in the expander. And, application expressions do not always begin with a paren, of course. They do that only in some languages.

[ edit: clarify that #%app was the problematically large ranges in many cases and typo fixes ]

rfindler commented 3 years ago

Would it help if the information coming back from check syntax had the whole range in it but there was a note somehow that said that this was an "implicit-made-explicit" identifier so the client (DrRacket vs racket-mdoe) could decide what to do with it? We could add a new method that had an extra boolean argument indicating if it was one of those identifiers to the syncheck-annotations<%> interface, much like how the add-arrow method grew so many doodads.

I'm not sure that would be helpful given the backwards compatibility constraints, however. Other ideas welcome!

greghendershott commented 3 years ago

To re-emphasize the "heavy" aspect: In my informal use so far, this is making check-syntax take ~6 seconds on a file where it used to take ~2 seconds. This even when I ignore the new annotations that have (= beg end). i.e. The slowdown has nothing to do with handling all these new annotations (beyond receiving all the extra syncheck method calls, testing (< beg end), and ignoring them). It doesn't even count extra time for the back end to transmit them to Emacs, and for Emacs to add all these extra text properties. So, they are heavy, even short-circuiting and ignoring them as soon as I can. :(

It seems unfortunate for check-syntax to be so much slower. Especially when the net benefit would be, now the user can "see documentation for the number 42", or, "jump to the 'definition' of 42".

greghendershott commented 3 years ago

These coordinates work the same way that coordinates in text% do, namely they are between characters. So position 0 does not mean the first character. It means the tiny zero-width sliver of space before the first character. Similarly, 1 is not the second character; it is the spot between the first and second character. Beyond text%, there is other precedent here -- this is the same as how drawing coordinates work -- they are between pixels in the same sense. I'm sorry that's not how you understood it, perhaps a documentation update is in order.

That's not what I'm talking about.

I agree that coordinates are between characters. That's how they work in Emacs, for instance, too.

I'm talking about what span of the user's original source text is being annotated.

Until this, syncheck has never said "Here is an annotation for an empty span" i.e. "Here is an annotation about the nothingness between two characters in your program". Until this, (< beg end) always.

Above I suggested a span for the first character, at least. Like the open paren for #%app would be fine, I said. But...


Anyway, I think it's moot. I think this change is causing a 2X or 3X slowdown. Worse, it is slower to provide the end-user information that seems weird and not-useful (e.g. jump to the "definition" of 42).

So, my hot take is that the change to make unused-require-detection more reliable, is wonderful. The fact that it's spilling over into end-user visible annotations like this, is not wonderful.

greghendershott commented 3 years ago

Correction: With more timing runs, using various versions of Racket, actually I am not seeing a 2-3X runtime for the new check-syntax with all the additional annotations for #%dataum and #%app. It is not noticeably slower. (It might be slower if those new annotations were sent up to Emacs and processed there -- but, that's a different topic and ultimately my problem.)

I'm sorry for any confusion on that front.

Although I still think the new annotations are of questionable value, at least they don't cost a lot to produce.


I do still think the spans should describe at least one character of the user's program, such as the first character. (i.e. I still think (< beg end) is a sensible invariant.)

But if you disagree, then I can just adjust them on my end -- change all (= beg end) to (beg (add1 beg)).

rfindler commented 3 years ago

Hi @greghendershott : what did you think of my suggestion to preserve the original span and add information that indicates if the 'implicit-made-explicit property was there? I can set things up so that if the new method isn't overridden, then it will call the old method. So if you code is using annotations-mixin then it won't have to change and it won't see zero-size spans. (It will see very large spans in some cases, as I mentioned before. I found this to be unworkable but DrRacket's use of this information is different than Racket Mode's so maybe that won't matter.)

greghendershott commented 3 years ago
  1. I appreciate that suggestion. Conceptually I think that would work.

    Probably I'd look for that property, and readjust the large span down to beg (add1 beg) for #%app.

    But for #%datum, I'd probably want to keep the span covering the entire literal.

    Could there be some flag about whether it's from #%app or #%datum?

    I don't know if that seems weird, but it would be less kludgy than me doing something like examining whether the first char is an open paren.

  2. One gotcha, for me, about adding a new method to annotations-mixin: Racket Mode also targets older versions of Racket.

    Normally, when a new version of a library adds a new plain function, I can handle that by trying to dynamic-require it.

    But I don't know how to handle this for my class derived from annotations-mixin -- have code that overrides the new method if running on a newer Racket, else do nothing.

    Wishful thinking, I could still have a single class definition for all versions of Racket, and there is some magic dynamic-define/override I could use for the new method -- if annotations-mixin defines the method, it overrides it, otherwise it's a no-op. Does such a thing exist?

    If not, what strategy do you recommend for writing code that can run across various versions and adapt dynamically to a parent class defining a certain method, or not?

    (Sorry if this is a dumb question. I don't know the racket/class system very well.)

rfindler commented 3 years ago
  1. It looks like the implicit-made-explicit property is only a boolean currently. Maybe @mflatt can say if it would be reasonable for the property's value to be the name of the implicit that got made explicit. (With sufficiently twisty macros we would see both, I suppose, but racket mode could just decide what to do in that case.)

  2. I think you want to call interface->method-names on syncheck-annotations<%>.

greghendershott commented 3 years ago

Also:

  1. You mentioned one new method, singular. But I think this affects at least two, syncheck:add-jump-to-definition and syncheck:add-docs-menu? Possibly more, maybe also syncheck:add-mouse-over-status and even syncheck:add-arrow/name-dup/pxpy??
rfindler commented 3 years ago
  1. Ah, right. I'll have to look into that more carefully!
greghendershott commented 3 years ago

If it changes enough methods, would it make sense to say it's a new annotations-mixin-2 (or better name)?

I don't know if that's a wise or terrible idea, or the tradeoffs. Just spitballing.

mflatt commented 3 years ago

The 'implicit-made-explicit property is added to an identifier that ends up in 'origin, so isn't the 'origin identifier's symbol always the implicit that got made explicit?

rfindler commented 3 years ago

The 'implicit-made-explicit property is added to an identifier that ends up in 'origin, so isn't the 'origin identifier's symbol always the implicit that got made explicit?

Ah, right!

rfindler commented 3 years ago

If it changes enough methods, would it make sense to say it's a new annotations-mixin-2 (or better name)?

I don't know if that's a wise or terrible idea, or the tradeoffs. Just spitballing.

The mixin always adds in just the right set of methods. It changes when the interface changes so client code (like racket mode's in this case) will not have to change when new methods are added (unless the client wants to use the new methods).

rfindler commented 3 years ago

Looking this over more carefully, I'm starting to think that my suggestion to add a boolean everywhere an identifier was mentioned is a bad idea. For example, when calling add-mouse-over-status (which would happen when there is a #%app saying things like "imported from ...") we'd need to somehow indicate that this was coming from an identifier that one might want to chop its range or else large portions of many files will have weird floating "imported from..." hanging on top of them that one won't know is because this is a large application expression. The add-docs-menu method will have similar problems; we really don't want to jump to the docs for #%app over large regions in the file that correspond to some #%app expression so again we'll need another argument there.

Overall, the methods that will need to change are, I believe, syncheck:add-text-type, syncheck:add-background-color, syncheck:add-require-open-menu, syncheck:add-docs-menu, syncheck:add-arrow/name-dup/pxpy, syncheck:add-mouse-over-status, syncheck:add-jump-to-definition, and syncheck:color-range.

Returning to @greghendershott 's initial comment, I'm not sure what the issue with the documentation is. I'm attaching a picture that shows how thing work in DrRacket. It shows that when the insertion point is at 13 (the spot just before the open paren), that the docs are for #%app, since there was a range of [13,13] given that should show those docs.

Screen Shot 2020-12-20 at 3 17 15 PM

Greg, I'm skeptical that you really do want these large ranges. It was really confusing to work with them. If you want to give it a try to see what it is like, there is a simple diff you can apply locally, below. If you do that, then you'll get the full ranges for both the #%datum and #%apps.

Assuming you do find it unworkable [edit: and we add those methods], then you'll need to actually implement all of the methods listed above to trim the ranges in some suitable way.

--- a/drracket-tool-lib/drracket/private/syncheck/traversals.rkt
+++ b/drracket-tool-lib/drracket/private/syncheck/traversals.rkt
@@ -1470,16 +1470,7 @@
     (define shifted-id (syntax-shift-phase-level id level-of-enclosing-module))
     (define old (free-id-table-ref mapping shifted-id '()))
     (define no-span-id
-      (if (syntax-property shifted-id 'implicit-made-explicit)
-          (datum->syntax shifted-id
-                         (syntax-e shifted-id)
-                         (vector (syntax-source shifted-id)
-                                 (syntax-line shifted-id)
-                                 (syntax-column shifted-id)
-                                 (syntax-position shifted-id)
-                                 0)
-                         shifted-id)
-          shifted-id))
+      shifted-id)
     (free-id-table-set! mapping shifted-id (cons no-span-id old))))
greghendershott commented 3 years ago

My observation of the check-syntax API, before this commit:

And that has all made sense to me.

But I feel like this commit is saying: "Every sort of annotation (doc, jump, etc.) that has been about a span, is still about a span (we still supply beg and end). Well... unless it's about #%datum or #%app -- in which case no actually it's about a position, so ignore end, it's just the same as beg."

That... seems weird. If that's your intent, I think it's a departure from your usual API design, but I won't keep beating a dead horse. I don't know how else to explain it. Maybe that's a signal I'm just wrong.


rfindler commented 3 years ago
  • When check-syntax is referring to a span of one or more characters in the user's source, it uses half-open intervals, a pair of beg and end arguments.

Can you explain a bit more what led you to this conclusion?

rfindler commented 3 years ago
  • For the new #%datum annotations arising from literals, e.g. a number like 123, I think it's hard to argue that they're about the position before 1. And I don't see a "too large range" problem. Do you? So maybe this is the easy case. Could we both agree here that beg and end should be the half-open interval spanning 123, just like for everything else so far?

  • For #%app, I understand the problem with the span being too large. For me, a better compromise would be saying the annotation covers the opening delimiter (paren). So just like every other annotation that has beg and end, it's still describing a span and end has a meaning (as opposed to the thing where = beg end i.e. "oh no in this one case it's a position not a span").

    • Caveat: I'm not sure how well either "shortening compromise", yours or mine, a span of 0 or 1, works with non-sexpr langs?

For these two points: while I agree that #%app is more problematic than #%datum, there can be very large datums too, e.g., vectors or hashes. I also don't know if a span of one character is better or worse than a span of zero characters for non-sexpression languages function applications.

greghendershott commented 3 years ago

Ah. It didn't occur to me that a "compound" literal like a hash or vector would be annotated. I was assuming only the individual elements would be. But there was no reason for me to assume that.

rfindler commented 3 years ago

Just to make sure I'm not getting this wrong, this program shows that a hash and a vector both show only the outer thing with a span of the whole thing:

#lang racket

(define p
  (open-input-string
   (string-append
    "#lang racket/base\n"
    "#hash((1 . 2) (3 . 4))\n"
    "#(1 2 3)")))
(port-count-lines! p)

(define src "x.rkt")

(define stx
  (parameterize ([current-namespace (make-base-namespace)]
                 [read-accept-reader #t])
    (expand
     (read-syntax src p))))

(let loop ([stx stx])
  (when (and (identifier? stx)
             (equal? (syntax-source stx) src)
             (equal? '#%datum (syntax-e stx)))
    (printf "span ~s ~s\n" (syntax-span stx) stx))
  (cond
    [(syntax? stx)
     (loop (syntax-property stx 'origin))
     (loop (syntax-e stx))]
    [(pair? stx)
     (loop (car stx))
     (loop (cdr stx))]))
mflatt commented 3 years ago

I'm not sure I've digested the whole thread, but since I was involved with the changes, here's my understanding of the current state.

The span is fixed up to 0 for #%datum because it’s meant to be the span of the #%datum identifier itself, not the number. The #%datum doesn’t take up any space in the source, so 0 is meant to be helpful to tools like DrRacket and Racket mode. For example, the intent would be that no characters in the source are linked to the documentation of #%datum. Meanwhile, the fact that #%datum existed at some point is useful in other ways for tools, so that's why it was added at all.

greghendershott commented 3 years ago

From afar, my impression is:

The motivation was to close #438. Which is great!

But the motivation was not to close another item, which AFAIK doesn't exist, like "Feature request: Surface each and every use of #%app and #%datum in the check-syntax API and UI."

Instead, it's more like, "Oh, we're getting more information. That's probably cool and useful. How should we present it?"

And it's just turning out to be slightly weird, because the new kind of information sort of fits the existing path... but not quite.

That's just my impression. Definitely not criticizing. That's a very normal software situation I've been in many times.

Anyway I'll put a practical suggestion in a separate comment...

greghendershott commented 3 years ago

I'm coming around to the idea that the "empty span" (= beg end) maybe is the best (or anyway, least worst) approach.

Is this slightly a kludge? Maybe. But I think any "more elegant" approach is going to be way more work for everyone involved, than is worthwhile.

This empty span might break some other tools (one or more of the Language Server Protocol implementations for Racket, maybe?), just like it did Racket Mode. But it's not a difficult fix, and probably that's acceptably non-backward-compatible.

For this to work well, I just think we need to change the docs for the check-syntax lib, to explain (roughly):

Normally beg and end describe a half-open interval. However, in the case of #%datum and #%app, (= beg end). You can treat that as a kind of flag that it is such a special item, and do whatever you think makes sense:

  • Show information before the character at point beg.
  • Show information for the character at the span (beg (add1 beg)).
  • Filter such information and don't show it at all.
  • Let your end user decide.
  • Try to accumulate the annotation points into larger ranges approximating the scopes within which all the use sites share the same bindings, and give your user that kind of UI.
  • Or whatever you think would be good.

That's a super rough first draft. Any reactions?

rfindler commented 3 years ago

Thank, @greghendershott ! I agree that some documentation improvements are in order. But I would prefer to explain it in terms of coordinates that are between characters. Does that mean the same thing to you as half-open intervals? It does turn an interval into a point (as you point out earlier) in the special case where beg=end. But DrRacket draws rectangles based on those coordinates, so they aren't really half-open character intervals in the way I think about how this all works.

As for the rest of just elaborating details like that and explaining a few special cases sounds like great ideas.

rfindler commented 3 years ago

Speaking of breaking racket mode, it may be the case that, before this change, things like this might break racket mode?

#lang racket

(define-syntax (m stx)
  (syntax-case stx ()
    [(_ id)
     (datum->syntax #'id
                    (syntax-e #'id)
                    (vector (syntax-source #'id)
                            (syntax-line #'id)
                            (syntax-column #'id)
                            (syntax-position #'id)
                            0))]))

(define x 1)
(m x)

Or maybe variations that mess with the span of the binder as well?

greghendershott commented 3 years ago

Thank, @greghendershott ! I agree that some documentation improvements are in order. But I would prefer to explain it in terms of coordinates that are between characters. Does that mean the same thing to you as half-open intervals? It does turn an interval into a point (as you point out earlier) in the special case where beg=end. But DrRacket draws rectangles based on those coordinates, so they aren't really half-open character intervals in the way I think about how this all works.

Although it wouldn't hurt to clarify that the coordinates are "between the characters", I think that's already the case for most systems that deal with text, that a reader would know? At least, that was never the confusion, for me.

Instead I think the key point is the span.

So maybe: "You might expect that these start and end parameters describe a span of text from the source. Frequently they do. In that case you could think of start and end as a half-open interval, i.e. "from" and "up-to-but-not-including". In that case you could for example throw these values into a data/interval-map. However sometimes start and end are equal, which means that they describe a point in the user's text, but none of its original text. For example a number like 42 might expand into #%datum and the annotation for that would describe the point before 42."

Or instead maybe: Explain that start corresponds to syntax-position and end to (+ syntax-position syntax-span). With a similar note/example that the span might be zero, when the annotation is about something at a location in the source, but not about anything from the source.

What do you think?

Speaking of breaking racket mode, it may be the case that, before this change, things like this might break racket mode?

Yes, that example program with a zero syntax-span also shows that a non-zero span assumption was wrong. It would also cause an error in interval-map-set!. (Before the "hot fix" commit 6ec7c4c I merged a couple days ago; now it just ignores such intervals, until I can sort this out for real.)

rfindler commented 3 years ago

Sure, some text along those lines makes sense to me!

racket-discourse-github-bot commented 1 year ago

This issue has been mentioned on Racket Discussions. There might be relevant details there:

https://racket.discourse.group/t/syncheck-find-source-object/1829/25