`racket--hash-lang-lookup-pair` can be confused by wrong classification

usaoc commented 11 months ago

A minimal example that shows the problem:

#lang racket/base
|
"foo"

Suppose point is at |; inserting " causes the classfication of string because, apparently, a “string” can be composed of the inserted double quote and the following double quote.

greghendershott commented 11 months ago

Maybe I need more coffee but I'm not understanding.

I'll try to guess the recipe from your example.

With racket-hash-lang-mode on, and any pair minor mode (paredit, smartparens, electric-pair-mode, etc.) off:

#lang racket/base

I type | and get

#lang racket/base
||

with point between the the two |s. Lexed as symbol.

Then I type " and get

#lang racket/base
|""|

with point between the two "s. Also lexed as symbol.

All of which is the expected (to me) result.

Are you taking different steps, and/or do you have a different expected result?

(There may well be a problem, I just don't understand it yet.)

usaoc commented 11 months ago

The | is just to indicate where the point is, not an actual character in the buffer. You’ll also need a string (or just a double quote) following the point. (Edit: anywhere following the point, really.)

greghendershott commented 11 months ago

Thanks! I understand now.

(racket lang's drracket:quote-matches includes | as well as ". (The "quote" in "quote-matches" just means "a delimiter that both opens and closes", not necessarily anything to do with string tokens.) So I had misunderstood your | to be that, not the popular convention to mean "point is here". :smile:)

Stepping back, my motivation here was really just to avoid ' auto-pairing in textual places like {string comment text} tokens. That's annoying in English prose, anyway.

I think I should probably redesign this to be a bit more focused on that problem (while extensible for similar problems).

Probably I should supply a "predicate" function variable, the default value of which is a function that handles this "apostrophe problem", and could be extended for other delimiters and tokens as necessary. Sketch:

modified   racket-hash-lang.el
@@ -686,54 +686,47 @@ However other users don't need that, so we supply this
 (defvar-local racket-hash-lang-pairs nil
   "Pairs of characters to insert automatically.

-The format of each item is
-
-  (open-char close-char . except-kinds)
-
-where except-kinds are symbols corresonding to lang lexer token
-kinds.
+The format of each item is (cons open-char close-char).

 This is initialized whenever a module language changes, using
 single-character values from the language's reported
 drracket:paren-matches and drracket:quote-matches.

-Paren matches are allowed for all token kinds. Quote matches are
-_not_ allowed in string, comment, or text tokens. For example a
-lang might supply \\=' as a quote-match, and you wouldn't want to
-type don\\='t in prose but get don\\='t\\='.
-
 You may customize this default initialization in
 `racket-hash-lang-module-language-hook'.")

+(defvar-local racket-hash-lang-pairs-predicate
+  #'racket-hash-lang-pairs-predicate-default)
+(defun racket-hash-lang-pairs-predicate-default (char pos)
+  (not
+   (and (eq char ?')
+        (pcase-let ((`(,_beg ,_end (,kind . ,_))
+                     (racket-hash-lang-classify (1- pos))))
+          (memq kind '(string comment text))))))
+
+(defun racket-hash-lang-classify (pos)
+  (racket--cmd/await nil
+                     `(hash-lang
+                       classify
+                       ,racket--hash-lang-id
+                       ,racket--hash-lang-generation
+                       ,pos)))
+
 (defun racket--hash-lang-configure-pairs (paren-matches quote-matches)
   (let ((pairs nil))
-    (dolist (p paren-matches)
-      (let ((open (car p))
-            (close (cdr p)))
-        (when (and (= 1 (length open)) (= 1 (length close)))
-          (push (list (aref open 0) (aref close 0))
-                pairs))))
-    (dolist (q quote-matches)
-      (when (= 1 (length q))
-        (push (list (aref q 0) (aref q 0) 'string 'comment 'text)
-              pairs)))
+    (cl-flet ((add (open close)
+                   (when (and (= 1 (length open)) (= 1 (length close)))
+                     (push (cons (aref open 0) (aref close 0))
+                           pairs))))
+      (dolist (p paren-matches) (add (car p) (cdr p)))
+      (dolist (q quote-matches) (add q q)))
     (setq-local racket-hash-lang-pairs (reverse pairs))))

 (defun racket--hash-lang-lookup-pair ()
   (pcase (assq last-command-event racket-hash-lang-pairs)
-    (`(,open ,close . ,except-kinds)
-     (if except-kinds
-         (pcase-let ((`(,_beg ,_end (,kind . ,_))
-                    (racket--cmd/await
-                     nil
-                     `(hash-lang
-                       classify
-                       ,racket--hash-lang-id
-                       ,racket--hash-lang-generation
-                       ,(1- (point))))))
-         (unless (memq kind except-kinds)
-           (cons open close)))
-       (cons open close)))))
+    (`(,open . ,close)
+      (when (funcall racket-hash-lang-pairs-predicate open (point))
+        (cons open close)))))

 (defun racket-hash-lang-will-use-region ()
   "A value for hook `self-insert-uses-region-functions'."

usaoc commented 11 months ago

I think adding customization there is a good idea. Back to the original problem, though, the issue is that the point after insertion is after the inserted character, potentially an opening delimiter, so (1- (point)) can produce a wrong classification; from a simple try, the fix is simply to use (- (point) 2) (we want to classify the “enclosing” token, after all, right?). I hope there isn’t any drawback with doing that.

greghendershott commented 11 months ago

In the diff above (which ~= commit f243b9d on a not-yet-merged topic branch), your reported example using " becomes N/A because now " unconditionally auto-pairs everywhere.

In fact so does everything, unless a predicate function says "nope".

The only case the predicate function handles now is '. It seeks to avoid inserting a matching ' in the context of a textual (string, comment, or text) token.

In that case it is in fact correct to look at (1- (point)).

When the post-self-insert-hook is called, the " is already inserted and (point) is after it.

Asking for the classification at (1- (point)), i.e. what the lang lexer says after refreshing, gives a token kind of:

something like parenthesis for rhombus or constant for racket, because the ' is outside any textual token, OR
a textual token.

And so the predicate function can know to say "yeah auto-pair" in the former case and "nope" in the latter.

p.s. ' isn't a quote-matches value for lang racket, so the predicate won't be called at all in that case. I didn't mean to imply that ' would auto-pair in lang racket; it doesn't. I'm just saying if the predicate were called for that, it would still do the right thing -- only veto textual contexts.

usaoc commented 11 months ago

Thanks for the clarification; sounds reasonable to me.

greghendershott / racket-mode

`racket--hash-lang-lookup-pair` can be confused by wrong classification #684