greghendershott commented 3 years ago

This is for discussion, not yet a specific proposal.

Status quo

; define.rkt
(struct s (a))
s-a
(provide (struct-out s))

s-a is treated as two "sub-range binders", s and a. There are two lexical arrows (require-arrow = #f) between s and a. This is nice.

; require.rkt
(require "define.rkt")
s-a

s-a is treated as a single identifier. There's one require arrow (require-arrow = #t) between s-a and "the-above.rkt". This seems less nice.

Instead?

Should there instead be two require arrows in require.rkt?

Should add-definition-target be called for s-a with the srcloc for the a within the struct form, for define.rkt? (It might be already, I haven't yet checked.)

Should add-jump-to-definition be called for the a in s-a in require.rkt? (It might be already, I haven't yet checked.)

In other words, the sub-range binders idea would apply to imported as well as local definitions.

rfindler commented 3 years ago

Looks like the answer to the questions are both "no" currently:

#lang racket
(require drracket/check-syntax)

(define (run str)

  (define annotations
    (new (class (annotations-mixin object%)
           (define/override (syncheck:add-jump-to-definition . args)
             (printf "syncheck:add-jump-to-definition ~s\n" args))
           (define/override (syncheck:add-definition-target . args)
             (printf "syncheck:add-definition-target ~s\n" args))
           (define/override (syncheck:find-source-object stx)
             (if (eq? 'the-source (syntax-source stx))
                 'yep
                 #f))
           (super-new))))

  (define-values (add-syntax done)
    (make-traversal (make-base-namespace) #f))

  (parameterize ([current-annotations annotations]
                 [current-namespace (make-base-namespace)])
    (add-syntax (expand
                 (parameterize ([read-accept-reader #t])
                   (read-syntax 'the-source (open-input-string str)))))
    (done)))

(run
 (string-append
  "#lang racket\n"
  "(module m racket\n"
  "  (provide (struct-out s))\n"
  "  (struct s (a)))\n"
  "\n"
  "(require 'm)\n"
  "s-a"))

produces

syncheck:add-definition-target (the-source 67 68 struct:s (m))
syncheck:add-definition-target (the-source 67 68 s? (m))
syncheck:add-definition-target (the-source 67 68 s-a (m))
syncheck:add-definition-target (the-source 67 68 s (m))
syncheck:add-jump-to-definition (yep 77 84 require #<path:/Users/robby/git/exp/plt/racket/collects/racket/private/reqprov.rkt> ())
syncheck:add-jump-to-definition (yep 42 52 struct-out #<path:/Users/robby/git/exp/plt/racket/collects/racket/private/reqprov.rkt> ())
syncheck:add-jump-to-definition (yep 33 40 provide #<path:/Users/robby/git/exp/plt/racket/collects/racket/private/reqprov.rkt> ())
syncheck:add-jump-to-definition (yep 60 66 struct #<path:/Users/robby/git/exp/plt/racket/collects/racket/private/struct.rkt> ())
syncheck:add-jump-to-definition (yep 60 66 struct #<path:/Users/robby/git/exp/plt/racket/collects/racket/private/struct.rkt> ())

shhyou commented 3 years ago

I like having sub-range binders for required identifiers too. However, a more immediate problem is that sub-range information does not exist for required identifiers. Those binding informations are syntax properties attached by the macros struct (and define-logger, etc.) in the defining module just like 'disappeared-use and 'disappeared-binding.

rfindler commented 3 years ago

I like having sub-range binders for required identifiers too. However, a more immediate problem is that sub-range information does not exist for required identifiers. Those binding informations are syntax properties attached by the macros struct (and define-logger, etc.) in the defining module just like 'disappeared-use and 'disappeared-binding.

I think we could develop a protocol to make this work (by having the defined identifiers actually be bound to macros that leave information at their use site based on whether or not they are in the same module or not, perhaps? Maybe?). If we had a good sense of what we wanted DrRacket to actually show us it would help guide the design, IMO.

greghendershott commented 3 years ago

Although there might be a cool UX for drawing arrows between two files, I'm not immediately sure what that would be.

I think one interesting point is, what are the arrows? They are things to draw on the screen in a GUI, or handle some other way in a TUI. But also, they can be treated as arrows (a.k.a. arcs) in a directed graph. So, even if there is no obvious UI, they can still be useful.

And so how I came across this issue/question, was from exploring how to do reliable multi-file rename. (One thing I've realized is that there are at least two useful graphs. One graph is about definitions, whether local or relating to identifier-binding from-xxx values. Another graph is about "name introductions", including renaming exports and imports, relating to identifier-binding nominal-from-xxx values.)

Anyway that's my main motivation, even though it's not (mostly) about DrRacket UI. I don't know if that helps guide the design at all?

greghendershott commented 3 years ago

p.s. A simpler point here: It would be nice if jump-to-definition took you to the struct id or the field id, depending where you clicked within the sub-range binders. And that it worked like that consistently, whether the struct definition is local or imported.

rfindler commented 3 years ago

Ah, yes! Multi-file rename as the motivation is very clarifying, thank you. More than happy to add stuff to support that one!

greghendershott commented 3 years ago

Thinking out loud:

The general issue is that, when analyzing require.rkt, desirable information is available only from define.rkt.

Available how/when? Some specific tactics I can imagine:

Punt. Accept that the information isn't available until you also analyze define.rkt. (This is how jump to definition works. From identifier-binding we know that the defining modpath is define.rkt. But to find the specific definition position within, we must analyze define.rkt now, or have previously analyzed and cached this.)
Communicate via macros (as @rfindler outlined).
Communicate via a special submodule. Like configure-runtime, but this is not for run-time, it's for for analysis-time (probably don't abbreviate that). But I'm not sure exactly how this would/could work.
Other??

In the context of implementing a multi-file rename command, in a system that already uses a separate database? The first option is actually not so bad. You have to chase down the entire import graph to find its introduction site (a definition, or a renaming import or export), then fan back out through the graph to find all uses of that name. So you need to analyze e.g. define.rkt anyway. At such time the sub-range binders info could be captured in the database, and, existing arrows can be updated (single arrows can be replaced with two or more arrows). "Now that I've analyzed define.rkt, I see that anything pointing to foo-bar should actually be updated to become separate foo and bar arrows." Something like that.

Again, just thinking out loud.

rfindler commented 3 years ago

My feeling, in retrospect, is that probably we want to choose "punt" where we actually get moving on https://github.com/rfindler/fully-expanded-store . That said, option 3 I haven't really thought deeply about.

samth commented 3 years ago

I think the conventional approach in IDEs is @greghendershott's 1 (which I wouldn't really describe as "punt" -- that sounds more like "don't solve the problem").
Communicating via macros has some costs, such as added expansion time and need to make tools like Typed Racket more complex.

greghendershott commented 3 years ago

I agree. Punt was my lazy way of saying lazy.

greghendershott commented 3 years ago

With the lazy approach:

We analyze require.rkt, and see that there is a single require arrow for s-a. That's all we can know, for now. syncheck:add-arrow and syncheck:add-jump-to-definition work unchanged.
Later, on demand, we analyze define.rkt, and learn that s-a has sub-range binders. I wonder, how should check-syntax report this for define.rkt? Should it still do a single syncheck:add-definition-target for s-a, for backward compatibility? And also do some new method that communicates the whole story about s-a, s and a -- so that a client can, say, update the single arrow/jump for s-a, instead to be two, for s and a?

rfindler commented 3 years ago

I'm not entirely sure, but it sounds like we'll want two different pieces. One piece that just stores the results from files so we can access them quickly without reanalyzing things, but the some changes to the existing setup so that we don't have different information references that are exported from references that are local.

greghendershott commented 3 years ago

I'm still mulling this over. Lately, I'm leaning towards the simplest possible idea: Add a new add-sub-range-binders annotations method. Essentially it would be called whenever traverse encounters a define-values with the sub-range-binders syntax property, supplying that property value.

The upside is that it allows the consumer to handle this however it wants -- including not at all, i.e. the status quo. Some consumers might want to update existing jump-to-definitions ("split" them into two or more). Others might to keep the existing single jumps, and handle the split on-demand (look at the position where the user invoked). Or some other strategy.

The downside being the same: It forces a consumer to decide how to handle this complexity.

However, if there's no clean, non-leaky way to hide this (?), maybe that's the least-worst approach?

Not yet convinced. Just touching base.

rfindler commented 3 years ago

I think it would be best if we make traverse actually save and offer the information without expecting the client to do too too much work to take advantage of it.

It does seem wrong that the same identifier (s-a) gets one arrow when it is coming from a require and two when it is coming from the actual struct. So maybe the way we're conceptualizing the binding information that goes "through" the require is wrong. I think we should not be afraid to change the macro that defines define-struct to start using a new protocol somehow (and we can keep backwards compatibility measures in place almost certainly). Maybe this means that struct-out has to get in on the game. Maybe this means that the expander has to change. Maybe this means that sub-range-binders was done at the wrong level and the expander really should take a more active role in identifiers that are built out of pieces of a macro's input so that check syntax can be just reflecting the information that the expander has instead of managing it itself.

greghendershott commented 3 years ago

I wasn't presuming we could take out such a clean sheet of paper. If we can, that also works!

There is definitely some tension between the perspective that s-a is one name, with one srcloc, for one function... versus the perspective that s and a are two pieces, each with its own srcloc, of a name for one function.

If there were some way to accommodate the latter perspective, even down at the level of identifier-binding, then I think the rest of it would involve simpler relations.

Wishful thinking: A new identifier-bindings (plural) could report that s-a is made of two pieces, making it clear up-front that there should be two arrows. [Super wishful, it could report the precise srcloc of each piece. Less wishful, it could at least report some "key" for each (like {s-a a} and {s-a s} ?) by which the srcloc could be found if/when analyzing the defining file.]

greghendershott commented 3 years ago

I've been thinking about, and working on, this more.

I have a tentative new perspective: It would be fine for check-syntax to keep the existing add-definition-target method exactly as-is, and add a new add-sub-range-bindings method. It's really not that difficult for the consumer to relate the sub-range-binders both to arrows and to definition targets. For example when using a database, a simple join causes the single/full definition target to "fan out" to become multiple arrows, one for each of the sub-range binders.

Admittedly, the answer for require.rkt will differ depending on whether define.rkt has already been analyzed (do we know about the sub-range-binders, yet).

However that is already the case for add-definition-target and jumping to "normal" definitions. The add-jump-to-definition method simply says, "Here is the file to look inside", and you can't do so precisely until you have an add-definition-target for the defining file. Sub-range binders don't introduce this issue, they're just another wrinkle on it. You get an approximate answer until/unless you analyze the other file.

It seems to me the crux of this is the choice: Do you want to force expanding and (at least somewhat) analyzing potentially all transitive dependencies of a file, before you can answer any questions about it?

To-date the answer has been, no. For speed, it is good to delay and be "lazy". Maybe the UX in DrRacket could be improved, i.e. it should automatically analyze the defining file when you want to jump to a definition within --- but it could be improved without changing the basic decision to use a "lazy" approach.
The answer, yes, is also interesting. Maybe the expander and/or a store of expanded syntax could eagerly obtain some of the information. For example if the syntax is already expanded, it is probably pretty cheap to walk it looking for define-values/define-syntax, and do the add-definition-target and add-sub-range-binders business. e.g. Do only that, eagerly, without doing the whole other check-syntax analysis.

In short, I think either approach can work fine. Either incrementally enhance the status quo. Or the "cleaner sheet of paper" idea.

rfindler commented 3 years ago

I think we should imagine that we will soon (soon relative to the lifespan of drracket, anyway) be in a world where we routinely cache the results of check syntax analyzing already-compiled files. (Does this help with this particular decision?) I can also imagine that the simple join you mention could be hiding behind an API call too, when the information is available, making it even simpler for clients. I think this means that continuing with the lazy approach will work well.

racket / drracket

require arrows, sub-range-binders, and add-definition-target #484

Status quo

Instead?