`{backward forward}-match` differences, and, at-exp indenter needs `get-character` and `find-up-sexp` methods

racket / expeditor

Other

9 stars 15 forks source link

`{backward forward}-match` differences, and, at-exp indenter needs `get-character` and `find-up-sexp` methods #7

Open greghendershott opened 2 years ago

greghendershott commented 2 years ago

Background: I have a variant copy (for the time being) of like-text% from object.rkt. It uses my tokenizer strategy that can stop early on "convergence". As we discussed if it seems suitable for expeditor I can contribute that back here.

Looking at {backward forward}-match docs and code more carefully, I felt less confident I understand what they're supposed to do. The official docs are a bit terse. (On https://github.com/racket/drracket/issues/527 I think part of my confusion was mistaking these for something like {backward forward}-sexp.)

As a sanity check I wrote a test to create my like-text% object, also create a racket:text% object and call start-colorer. Then compare calling forward-match and backward-match on both methods. Right away I'm seeing they're not always equivalent.

Is this a bug, or, maybe as-intended?

Was maybe your idea was that, not only can a like-text% be some documented subset of all color-text<%> methods, for indenters, maybe it's also OK if some methods differ in behavior, if it doesn't matter for an indenter??

So next I added a test for that: Compare giving both objects to an indentation function. The only non-#f-returning indentation function I know about is the scribble/at-exp one. So I used that. Right away, I discovered it errors wanting two more methods available:

get-character. This is trivial.
find-up-sexp. The racket:text% implementation of find-up-sexp seems to use a fair number of other methods defined there, as well as inherited (like backward-containing-sexp).

Does it make sense to start copying equivalents of those for a like-text%, too??

Or is that too much, some of it should be split off into its own interface/class that can be shared among racket:text% and "like-text%" objects in expeditor or other users??

I don't have a crisp picture of how all the status quo pieces fit together, and I'm super weak on racket/class, so unfortunately this is a question not a suggestion.

greghendershott commented 2 years ago

This won't work 100% as-is with object.rkt. test-create is making the like-object% and -get-indenter returns a previously get-infoed drracket:indentation. Let me know if this is enough to give the idea, or, if you'd like me to adapt to work as-is with object.rkt.

  (require racket/gui/base
           framework)
  (define (insert t str)
    (define lp (send t last-position))
    (send t insert str lp lp)
    (send t freeze-colorer)
    (send t thaw-colorer)
    (send t freeze-colorer))

(let ()
    (define str "#lang at-exp racket\n(1) #(2) #hash((1 . 2))\n@racket[]{\n#(2)\n}\n")
    (define o (test-create str))
    (define t (new racket:text%))
    (send t start-colorer symbol->string default-lexer default-paren-matches)
    (insert t str)
    (check-equal? (send o last-position)
                  (send t last-position))

    ;; Test that our implementations of {forward backward}-match are
    ;; equivalent to those of racket:text%.
    (define lp (string-length str))
    ;; FIXME: These tests currently mostly fail. I'm not yet sure if
    ;; that's because mflatt did a subset of behavior needed by
    ;; indenters, or due to some other problem.
    #;
    (for ([pos (in-range 0 (string-length str))])
      (send t set-position pos pos)
      (check-equal? (send o forward-match pos lp)
                    (send t forward-match pos lp)
                    (format "forward-match ~v" pos))
      (check-equal? (send o backward-match pos lp)
                    (send t backward-match pos lp)
                    (format "backward-match ~v" pos)))

    ;; Test that we supply enough entire color-text% methods, and that
    ;; they behave equivalently to from racket-text%, as needed by a
    ;; lang-supplied drracket:indentation a.k.a. determine-spaces
    ;; function. (After all, this is the motivation to provide
    ;; text%-like methods; otherwise we wouldn't bother.)
    ;; FIXME: These errored until I added `get-char` and `find-up-sexp`
    ;; methods. Now they merely fail because my `find-up-sexp` is a
    ;; dummy placeholder that's ~= add1.
    (define determine-spaces (send o -get-line-indenter))
    (for ([pos (in-range 0 (string-length str))])
      (check-equal? (determine-spaces o pos)
                    (determine-spaces t pos)
                    (format "~v ~v" determine-spaces pos))))

rfindler commented 2 years ago

I think we made a decision earlier but I wasn't sure so apologies if this message is adam-and-eveish. It seems like there is a fork in the road: we can either 1) try to detangle color:text from the rest of the GUI libraries because we like it and then reuse it here, 2) we can abandon it and use Greg's coloring code instead, or 3) we can try to keep the two implementations because they have different properties (and then build test suites to help ensure the top-level operations) always do the same things.

From the earlier conversation, my impression was that Greg's code was faster overall for the coloring part but it wasn't incremental so the very slow cases were too slow to be blocking the eventspace handling thread. Does this mean that we should be taking option 3 in the previous paragraph? There are definitely different setups here because, in the Emacs world, there is already a required async call between the keystroke in Emacs and the racket code that computes based on it, but that isn't required in the DrRacket world. Maybe that's enough of a game changer that option 3 is the only one that makes sense?

Thanks.

greghendershott commented 2 years ago

To the extent I might be "losing the plot", I apologize. I'm a little overwhelmed at the moment from

reloading my brain from things I worked on a year ago
understanding the recent changes in expeditor and dracket
un-loading things from a year ago that no longer apply :smile:
sometimes overlooking a few more things to un-load (sorry!)
navigating the framework world of classes/interfaces/mixins, a facet of Racket where I have very little hands-on mileage

rfindler commented 2 years ago

Not at all! My thinking now is that we need to keep the two implementations (so, option 3) but I just wanted to make sure you were in agreement. Absent other specifics, I see option 3 as the worst option (maintaining two implementations) and I really like what you're doing which pushes me towards option 2, but my sense is that the use cases are different enough and the performance characteristics that make sense are different enough that we're going to, at least for now, stick with option 3. Does that make sense to you?

greghendershott commented 2 years ago

As warmup, a report of latest on the Racket Mode side:

I think the situation wrt coloring is fairly straightforward, given some recent changes I made; latest commit. Emacs can send over edit updates asynchronously, it no longer need wait for a response. The Racket Mode back end now tokenizes on a dedicated thread, sending new tokens to a channel, which eventually burble back to Emacs as "notifications". Emacs uses these to re-color "lazily". I need to think/test it more, but so far it's fast even immediately after opening a 5,000 line files. (Things toward the end remain uncolored for awhile, which I think is as-expected.)

TL;DR: currently, what I'm doing is both "lazy/incremental" (new, prompted by you on that other issue thread), plus it stops early upon merely-shifted token "convergence" (which it was already doing).

Also, tokenizing/coloring (in my implementation) needs nothing from racket:text%, so the current issue I'm reporting here is N/A.
The situation is trickier wrt indent.
- Indent is inherently synchronous. What I've been experimenting with so far, is having the Emacs front end maintain a monotonically increasing "generation" number that it increments for every edit update it sends to the back end. And when it queries (e.g. for indent amount), it supplies its latest generation number, as well as a position. The back end blocks until tokenizing has progressed through that generation number, and through that position. I think this is sound... but I'm still thinking.
- drracket:indentation functions need a racket:text% or "text-like" as their first argument. I was pleasantly surprised to see expeditor's object.rkt. I didn't realize you could "fake" a class subset like that. Even better, the number of methods is so small! But just how small, is the question I'm asking about in this issue.

I don't know if my recent changes I described above, make you any more inclined to do option 2? Certainly I'm not pitching/selling/urging you. For Dr Racket it might be fine to make no change, or at most to use the idea of stopping early for merely-shifted tokens.

Anyway I think option 2 is really about coloring. I think the crux of the challenge is indentation functions needing racket:text% (or similar), and that's only really addressed by options 1 vs. 3.

Option 1 is of course the ideal, "correct" thing to do. In a world where "correct" is work done by someone else and causes no bugs to fix. :smile: Seriously, it does seem ideal, to me, but you would probably be doing most of that work, not me, so I feel like my opinion isn't the important one.

I think option 3 might be OK? It wouldn't be awful, today, to do some copypasta. I'm slightly nervous the number of methods will keep growing over time, maybe to where option 1 will look good in hindsight? But I don't know.

mflatt commented 2 years ago

The forward-match and backward-match methods seem to me at the same level as classify-position, and so sensible to implement separately but providing the same behavior. From a quick look, I would guess that whitespace skipping is one difference that needs to be fixed, but your test will probably show other things.

For things like find-up-sexp and backward-containing-sexp that are really specific to S-expressions, I think it would be worthwhile to extract the implementations — keeping the methods also in the framework classes for backward compatibility, but having the implementations use the extracted variants. I would be happy to work on that and use it to clean up expeditor, but it will be a couple of days before I get to it.

rfindler commented 2 years ago

@greghendershott thank you for the thoughful analysis. Without too much quoting, here are some responses

Option 1 is of course the ideal, "correct" thing to do.

I think this might be true if the performance of option 2 was similar or worse but my experiments suggest that option 2's performance was better. I am not confident in that, however, and would be interested in working together to try to get a better handle on the relative performance.

What I've been experimenting with so far, is having the Emacs front end maintain a monotonically increasing "generation" number that it increments for every edit update it sends to the back end. And when it queries (e.g. for indent amount), it supplies its latest generation number, as well as a position

The framework's colorer has a similar capability. Internally, it knows a region of the buffer that's wrong -- the back end of this region moves forward as the colorer reports tokens until it shortcircuits (as your code does) or reaches the end. When someone does an indent (or need to know where a matching delimiter is), the event handling thread just until the colorer has progressed far enough that it has the answer (it could also do a check and beep instead that wouldn't block and I vaguely recall this option is used somewhere, but it isn't used for indent nor is it used for the editor navigation keystrokes (eg forward-sexp); those block instead).

As for drracket:indentation, I'd be up for a new interface that is more suited to being cross-IDE. I think the approach you and Matthew are taking here of moving towards replacing the type of the first argument with something smaller is great and assuming that works out, drracket can totally get on board, in any of the three worlds. Sorry that these questions are somewhat off topic for this issue.

greghendershott commented 2 years ago

The forward-match and backward-match methods seem to me at the same level as classify-position, and so sensible to implement separately but providing the same behavior. From a quick look, I would guess that whitespace skipping is one difference that needs to be fixed, but your test will probably show other things.

I think that's one difference. I took a crack at defining skip-whitepace myself, and couldn't get it right from the documentation. Finally I cheated by looking at the reference implementation in color.rkt, and came up with this equivalent that passes a test compared to the framework original:

    (define/public (skip-whitespace pos direction comments?)
      (let loop ([pos pos])
        (cond [(and (eq? direction 'forward)
                    (>= pos (last-position)))
               pos]
              [(and (eq? direction 'backward)
                    (<= pos 0))
               pos]
              [else
               (define token-pos (if (eq? direction 'forward) pos (sub1 pos)))
               (define-values (s e) (get-token-range token-pos))
               (define type (classify-position token-pos))
               (cond [(or (eq? type 'white-space)
                          (and comments? (eq? type 'comment)))
                      (loop (if (eq? direction 'forward)
                                e
                                s))]
                     [else pos])])))

Another probable difference: My versions of get-token-range and classify-position do an add1 of the position. In other words, lexers work with 1-based positions (which IIUC your hash-map uses, as does my interval-map). However the text% interfaces use 0-based positions.

So hopefully that's a tiny head start when you have a chance to look at this in >= a few days.

Even with those changes, where my skip-whitespace validates, using it in backward-match and forward-match still doesn't cause those to pass their tests.

I think the problem here is that they're documented to stop at non-parens, too:

 (send a-color:text backward-match position cutoff)       
 → (or/c exact-nonnegative-integer? #f)  
  position : exact-nonnegative-integer?  
  cutoff : exact-nonnegative-integer?  
Skip all consecutive whitespaces and comments (using skip-whitespace) immediately preceding the position. If the token at this position is a close, return the position of the matching open, or #f if there is none. If the token was an open, return #f. For any other token, return the start of that token.
 (send a-color:text forward-match position cutoff)       
 → (or/c exact-nonnegative-integer? #f)  
  position : exact-nonnegative-integer?  
  cutoff : exact-nonnegative-integer?  
Skip all consecutive whitespaces and comments (using skip-whitespace) immediately following position. If the token at this position is an open, return the position of the matching close, or #f if there is none. For any other token, return the end of that token.

Although I don't necessarily understand the rationale for that, it's the current behavior, and I don't know if any indenters depend on it, so it probably needs to be preserved?

Here when I cheat and look at the implementation of {backward forward}-match, it's more challenging (for me) to see how to translate it because (a) it has lexer state updates and so on co-mingled with it, and (b) it uses a separate paren-tree data structure.

[[Whereas I'd prefer (a) to have some outer guard that checks whether it's OK to proceed, and then these individual methods can be simpler, and (b) in my experiments I found iterating tokens is fast enough (even in a big old (module _) style file) that an auxiliary paren tree wasn't necessary. So I'd probably just do a clean sheet implementation... but getting too late to continue with that tonight.]]

Again, I know you can't work on this for at least a few days, but I wanted to share this in case it was helpful later.

rfindler commented 2 years ago

re the rationale for backward and forward match and non-parens: I think this has to do with the alt-movement keys moving over atomic sexpressions. So in a situation where someone is trying to move forward over an expression and that expression is, say, a string, then we get the expected behavior. Here's some calls that try to demonstrate that:

#lang racket
(require framework)

(define t (new racket:text%))
(send t insert (format "~s" "a string"))
(send t freeze-colorer) (send t thaw-colorer) ;; ensure up to date
(send t forward-match 0 (send t last-position)) ;; -> (send t last-position)

the call at the end gets us to the end of the buffer (which is the end of the string).

rfindler commented 2 years ago

Here's my attempt to explain what forward-match is doing (comments annotating the lines above them)

    (define/private (do-forward-match position cutoff skip-whitespace?)
      (let ((position 
             (if skip-whitespace? 
                 (skip-whitespace position 'forward #t)
                 position)))
;; skip the whitespace
        (let ([ls (find-ls position)])
;; the editor might have multiple regions that are colored separately
;; (think: interactions window with each repl input section being colorerd separately
;; and the regions between them (where io and results go) not being colored)
;; this gets the right one.
          (and 
           ls
           (let-values (((start end error)
                         (send (lexer-state-parens ls) match-forward
                               (- position (lexer-state-start-pos ls)))))
;; this asks for the matching position of the paren at position (the subtraction is
;; to deal with these regions the coordinates inside the `ls` are relative to the ls's region)
;; the match-forward method is the one in paren-tree.rkt. 
             (cond
               ((f-match-false-error ls start end error)
;; this is part of the incremental stuff -- it means that we need to ask the code that's
;; interacting with the tokenizer to go get some more tokesn
                (colorer-driver)
;; which is what colorer-driver does
                (do-forward-match position cutoff #f))
;; and then we try again
               ((and start end (not error))
;; here we found a match for the paren, so return it, after coordinate transformations
                (let ((match-pos (+ (lexer-state-start-pos ls) end)))
                  (cond
                    ((<= match-pos cutoff) match-pos)
                    (else #f))))
               ((and start end error) #f)
;; dunno what error is about; this whole thing may be dead
               (else
                (skip-past-token ls position))))))))
;; and here we didn't find a parenthesis so we give the other end of the token (ala my previous comment)

greghendershott commented 2 years ago

@rfindler Thanks for the additional explanation!

;; the editor might have multiple regions that are colored separately
;; (think: interactions window with each repl input section being colorerd separately
;; and the regions between them (where io and results go) not being colored)
;; this gets the right one.

:heavy_check_mark:

What do to in an interactions/REPL buffer is something I'd been deferring, while trying to work out how to use all these #lang supplied features in Emacs edit buffers. I think I've made enough progress that I should start looking at REPLs soon, too.

greghendershott commented 2 years ago

@mflatt I'm not sure if I'm looking at the right repo, is this the latest rhombus implementation, for now, supplying only drracket:submit-predicate?

Asking only because if there were something already supplying e.g. drracket:range-indentation, I wanted to exercise some recent changes which (in theory) ought to work all the way up to the Emacs indent-region command side.

But if not yet (or if it's not quite working due to e.g. this issue I'm commenting on) no worries. I can set it aside; plenty else to work on meanwhile.

rfindler commented 2 years ago

The support in the framework's colorer for the REPL comes in the form of the reset-regions method. Lifting this restriction about editing outside the regions is probably not too difficult (especially given data/interval-map, which didn't exist back when this code was written)

mflatt commented 2 years ago

@greghendershott That get-info mostly defers to the one in shrubbery.

Here's the current version of the shrubbery get-info: https://github.com/mflatt/shrubbery-rhombus-0/blob/master/shrubbery/main.rkt

It points to this implementation of indentation support: https://github.com/mflatt/shrubbery-rhombus-0/blob/master/shrubbery/indentation.rkt

mflatt commented 2 years ago

I've made forward-match and backward-match act like color:text%, and I also added backward-containing-sexp and skip-whitespace on the grounds that those are in color:text%. The find-up-sexp method is in racket:text%, so I didn't add that one, but I changed the at-exp indenter to avoid it.

Tests based on @greghendershott's example are in a new expeditor-test package.

A next step, still, is to extract the default Racket indentation and navigation functions from racket:text% into a separate library.

mflatt commented 2 years ago

@greghendershott Do you have any opinion on where the extracted Racket default indentation and navigation functions should go?

I've started putting them in syntax-color-lib, since they rely on information produced by a coloring lexer (at least in practice). They also rely on a color:text%-like interface, though. I could imagine your like-text% replacement going in syntax-color-lib, or maybe these extracted functions should go in another package that has the like-text% replacement.

greghendershott commented 2 years ago

@mflatt This morning I'm catching up with your changes.

Everything looks good so far, with one small exception: When I add a test exercising shrubbery-determine-spaces using shrubbery-rhombus-0/demo.rkt (which is an excellent example file for tests, thanks!!), it seems to mostly pass except for a small stretch involving shrubbery at-expressions.

That is, determine-spaces seems to say 2 using racket:text% but 4 using like-text%, around lines 583 through 592 of demo.rkt. Before/after all tests pass (yay!).

This after I fetched racket/racket repo and doing a make this morning. So maybe this is related to at-exp indenter changing to avoid find-up-sexp (if shrubbery uses that, idk?) or maybe it's been a bug all along. Until this morning that test failed for me so extensively I couldn't say if that small segment was passing before or not.

NOTE: I'm doing an equivalent test in my "like-text fork". If you add to tests/object.rkt something like (check (file->string "/path/to/shrubbery-rhombus-0/demo.rkt") __), and don't see this, please let me know and I'll try that myself to figure out what's up.

greghendershott commented 2 years ago

@mflatt I'm not sure the best home, and I probably need to finish up a couple things before mentally switching gears to think about packaging and delivery.

(A quick reaction: syntax-color-lib seems like a good choice for the Racket s-exp indent/nav. It also seems like a good home for my like-text% replacement -- it's roughly your like-text% combined with an alternative to token-tree% which lives in syntax-color-lib.)

mflatt commented 2 years ago

@greghendershott Thanks for the report on the mismatch with the shrubbery lexer. The problem there was exposed by a bug in the shrubbery lexer, where the lexer reports a non-parenthesis symbol as a parenthesis. Of course, like-text% should ideally handle buggy lexers the same, so that's now fixed.