minad / jinx

🪄 Enchanted Spell Checker
GNU General Public License v3.0
432 stars 21 forks source link

`font-lock-variable-name-face` exclusion for `tex-mode` #25

Closed gusbrs closed 1 year ago

gusbrs commented 1 year ago

I found at least one case where the font-lock-variable-name-face is used and where arbitrary text is to be expected and, hence, should be spell checked: the optional arguments of citations.

Consider the following document:

\documentclass{article}

\usepackage{biblatex}
\addbibresource{biblatex-examples.bib}

\begin{document}

\cite[See afalsepositive]{sigfridsson}

\end{document}

I'm using AUCTeX. Tested with emacs -Q and the following setup:

(add-to-list 'load-path "~/.emacs.d/elpa/auctex-13.1.10/")
(add-to-list 'load-path "~/.emacs.d/elpa/jinx-0.4.0.20230328.213131/")
(add-to-list 'load-path "~/.emacs.d/elpa/compat-29.1.4.1/")

(require 'latex)
(require 'reftex)
(require 'jinx)
(setq TeX-parse-self t)
(setq TeX-auto-save t)
(add-hook 'LaTeX-mode-hook #'turn-on-reftex)
(add-hook 'LaTeX-mode-hook #'jinx-mode)

And the result:

Screenshot from 2023-03-29 17-38-54

minad commented 1 year ago

We could introduce a custom predicate which checks the face and the context. There are probably more such cases, so we should first compile a list of all the scenarios.

gusbrs commented 1 year ago

There are probably more such cases, so we should first compile a list of all the scenarios.

I presume so, and I agree that collecting cases a while before trying to devise the best approach is a good idea. But this is a tricky list to compile...

minad commented 1 year ago

We can wait for more feedback regarding false positives/negatives.

dr-scsi commented 1 year ago

@minad - A related question is: Is it possible for jinx to support the functionality provided by `ispell-skip-region-alist'? Docstring says:

Alist expressing beginning and end of regions not to spell check.
The alist key must be a regular expression.
Valid forms include:
  (KEY) - just skip the key.
  (KEY . REGEXP) - skip to the end of REGEXP.  REGEXP may be string or symbol.
  (KEY REGEXP) - skip to end of REGEXP.  REGEXP must be a string.
  (KEY FUNCTION ARGS) - FUNCTION called with ARGS returns end of region.

I think it would be great benefit. For the example above, jinx could consult `ispell-tex-skip-alists' which looks like this:

((("\\\\addcontentsline"              ispell-tex-arg-end 2)
  ("\\\\add\\(tocontents\\|vspace\\)" ispell-tex-arg-end)
  ("\\\\\\([aA]lph\\|arabic\\)"       ispell-tex-arg-end)
  ("\\\\cref"                         ispell-tex-arg-end)
  ("\\\\bibliographystyle"            ispell-tex-arg-end)
  ("\\\\makebox"                      ispell-tex-arg-end 0)
  ("\\\\e?psfig"                      ispell-tex-arg-end)
  ("\\\\document\\(class\\|style\\)" . "\\\\begin[ \t\n]*{document}"))
 (("\\(figure\\|table\\)\\*?"         ispell-tex-arg-end 0)
  ("list"                             ispell-tex-arg-end 2)
  ("program"      . "\\\\end[ \t]*{program}")
  ("verbatim\\*?" . "\\\\end[ \t]*{verbatim\\*?}")))

and says: Ignore everything between \documentclass and \begin{document}. \cite can be also easily added:

("\\\\cite" ispell-tex-arg-end)
minad commented 1 year ago

On 4/3/23 13:03, dr-scsi wrote:

@minad https://github.com/minad - A related question is: Is it possible for jinx to support the functionality provided by `ispell-skip-region-alist'? Docstring says:

No, that's not ideal since it involves expensive scanning backward and forward. You could try writing a custom predicate which implements this. I don't plan to add support for Ispell-specific features, except maybe LocalWords.

dr-scsi commented 1 year ago

@minad - Thanks for your response. I do understand your concern, but I must admit I find the feature very powerful for example for skipping the optional argument of environments in LaTeX. An example with flyspell looks like this: image

I was hoping that jinx can combine the various techniques used. Maybe that's a job for Emacs core itself.

gusbrs commented 1 year ago

I do understand your concern, but I must admit I find the feature very powerful for example for skipping the optional argument of environments in LaTeX. An example with flyspell looks like this:

@dr-scsi I can tell a little from my experience. I'm also a LaTeX user, with a heavily customized flyspell setup, particularly with regard to rules for ignoring a number of LaTeX constructs (see https://github.com/minad/jinx/issues/9#issue-1642020083).

jinx is a little different from ispell and flyspell here, because it also uses faces to ignore parts of the buffer. In other words, it exploits the work of done on behalf of syntax highlighting for spell checking. And, from the tests I've done so far, this allows for a significantly smaller number of false positives, at least for an out-of-the-box experience. So, personally, for the moment I'm first trying things out with jinx and see what really needs to be ported to jinx from these skipping rules I had for flyspell. So far, my impression is that I'll be able to achieve similar results with much less setup.

Of course, using faces has the danger of possibly ignoring things that shouldn't be. And that's what this report/issue is about.

Besides, jinx accepts custom predicates already. And if something is still missing in the end, one can always add it as a predicate.

minad commented 1 year ago

Regarding custom predicates - it turned out that we cannot trust the flyspell-mode-predicate to be efficient enough. I had already feared that this may be the case. Therefore Jinx is now completely on its own to handle exclusion. See #35. Generally I'd appreciate more feedback about the current quality of the exclusion based on faces. This particular issue is maybe a bit overly specific. For now I am more interested in a configuration which handles the 90% in various file types.

gusbrs commented 1 year ago

Hi @minad , I'm trying to tackle this one now and, it turns out, this face is particularly tricky. AUCTeX/font-latex.el uses this face, and this face only, for all and every optional argument of macros with arguments, regardless of class or anything else. And it is hard-coded inside font-latex-match-command-with-arguments with no viable way to configure this selectively. Given that most optional arguments are meant to be ignored by spell checking to start with, removing font-lock-variable-name-face from jinx-exclude-faces would mean that a huge load would have to be carried by some other means, presumably regexps. So this is really a tight spot, since not spell checking what should be is arguably very undesirable too.

And, given my last fumbled attempt, this time I went the other way around and tried to leverage the font-locking machinery for the purposes of fine tuning spell checking with jinx. I think you'll like the approach, however I doubt this will be directly useful for jinx, since you probably won't want to internalize limitations of font-locking of this or that major mode. Anyway, it is interesting at least as a proof of concept of how far it is possible to take the use of faces for the purpose.

gb/font-latex-match-biblatex-citation-lists is dedicated to the task of biblatex's citation lists, which are very peculiar. gb/font-latex-match-spellcheck-optargs is more general, and can be used to inhibit font-lock-variable-name-face at the face predicate for any regular macro, even if I'm currently applying it to citation commands only.

There you have it:

(with-eval-after-load 'font-latex
  ;; Fine tuning LaTeX font-lock for better spell checking with Jinx.

  (add-hook 'LaTeX-mode-hook 'gb/jinx-latex-setup)
  (defun gb/jinx-latex-setup ()
    (setq-local font-lock-extra-managed-props
                (append font-lock-extra-managed-props '(gb/jinx--inhibit-faces)))
    (setq-local jinx--predicates
                (list #'gb/jinx--latex-face-ignored-p
                      #'jinx--regexp-ignored-p
                      #'jinx--word-valid-p)))

  (defun gb/jinx--latex-face-ignored-p (start)
    "Return non-nil if face at START of word is ignored."
    (let* ((face (get-text-property start 'face))
           (inhibit (get-text-property start 'gb/jinx--inhibit-faces))
           (jinx--exclude-faces (seq-difference jinx--exclude-faces inhibit)))
      (or
       (and jinx--include-faces
            (if (listp face)
                (cl-loop for f in face never (memq f jinx--include-faces))
              (not (memq face jinx--include-faces))))
       (and jinx--exclude-faces
            (if (listp face)
                (cl-loop for f in face thereis (memq f jinx--exclude-faces))
              (memq face jinx--exclude-faces))))))

  (defvar gb/font-latex-biblatex-citation-lists
    '("cites" "Cites"
      "parencites" "Parencites"
      "footcites" "footcitetexts"
      "smartcites" "Smartcites"
      "textcites" "Textcites"
      "supercites"
      "autocites" "Autocites"
      "volcites" "Volcites"
      "pvolcites" "Pvolcites"
      "fvolcites" "Fvolcites"
      "svolcites" "Svolcites"
      "tvolcites" "Tvolcites"
      "avolcites" "Avolcites"))

  ;; Based on 'font-latex-match-command-with-arguments'.
  (defun gb/font-latex-match-biblatex-citation-lists (limit)
    (setq font-latex-matched-faces nil)
    (let ((regexp (concat "\\\\\\(\\(?:"
                          (regexp-opt gb/font-latex-biblatex-citation-lists)
                          "\\)\\>\\)"))
          (parse-sexp-ignore-comments t)
          (arg-delims '((?\( . ?\)) (?\[ . ?\]) (?{ . ?})))
          ;; AUCTeX/font-latex.el handles the fontification of up to two
          ;; mandatory arguments of citation lists, and that's a "won't fix":
          ;; https://tex.stackexchange.com/a/426066
          (handled-args 2) (curr-arg 1)
          match-data)
      (catch 'match
        (while (re-search-forward regexp limit t)
          (unless (font-latex-faces-present-p '(font-lock-comment-face
                                                font-latex-verbatim-face)
                                              (match-beginning 0))
            (goto-char (match-end 0))
            (catch 'break
              (while (assq
                      (save-excursion
                        (while (and (not (eobp)) (font-latex-forward-comment)))
                        (char-after))
                      arg-delims)
                (while (and (not (eobp)) (font-latex-forward-comment)))
                (let* ((match-beg (point))
                       (char (char-after))
                       (closedp (font-latex-find-matching-close
                                 char (cdr (assq char arg-delims)))))
                  (cond ((not closedp)
                         (throw 'break nil))
                        ((eq char ?{)
                         (when (> curr-arg handled-args)
                           (push (1- (point)) match-data)
                           (push (1+ match-beg) match-data)
                           (push 'font-lock-constant-face font-latex-matched-faces))
                         (setq curr-arg (1+ curr-arg)))
                        ((memq char '(?\( ?\[))
                         (push (1- (point)) match-data)
                         (push (1+ match-beg) match-data)
                         (if (> curr-arg handled-args)
                             (push '(face font-lock-variable-name-face
                                          gb/jinx--inhibit-faces
                                          (font-lock-variable-name-face))
                                   font-latex-matched-faces)
                           (push '(face nil
                                        gb/jinx--inhibit-faces
                                        (font-lock-variable-name-face))
                                 font-latex-matched-faces)))))))
            (when match-data
              (setq font-latex-matched-faces (nreverse font-latex-matched-faces))
              (store-match-data (nreverse match-data))
              (throw 'match t)))))))

  (font-lock-add-keywords
   'latex-mode
   `((gb/font-latex-match-biblatex-citation-lists
      .
      ,(let (faces)
         ;; 2 global notes, plus 12 full sets of 3
         (dotimes (i 38)
           (push `(,i (font-latex-matched-face ,i) append t) faces))
         (nreverse faces)))))

  ;; Optional arguments will set the 'gb/jinx--inhibit-faces' property, unless
  ;; preceded by "!".
  (defvar gb/font-latex-spellcheck-optargs
    '(("cite" "*[[{") ("Cite" "[[{")
      ("parencite" "*[[{") ("Parencite" "[[{")
      ("footcite" "[[{") ("footcitetext" "[[{")
      ("textcite" "[[{") ("Textcite" "[[{")
      ("smartcite" "[[{") ("Smartcite" "[[{")
      ("supercite" "{")
      ("autocite" "*[[{") ("Autocite" "*[[{")
      ("citeauthor" "*[[{") ("Citeauthor" "*[[{")
      ("citetitle" "*[[{")
      ("citeyear" "*[[{") ("citedate" "*[[{")
      ("citeurl" "[[{")
      ("fullcite" "[[{") ("footfullcite" "[[{")
      ("volcite" "[{![{") ("Volcite" "[{![{")
      ("pvolcite" "[{![{") ("Pvolcite" "[{![{")
      ("fvolcite" "[{![{") ("ftvolcite" "[{![{")
      ("svolcite" "[{![{") ("Svolcite" "[{![{")
      ("tvolcite" "[{![{") ("Tvolcite" "[{![{")
      ("avolcite" "[{![{") ("Avolcite" "[{![{")
      ("notecite" "[[{") ("Notecite" "[[{")
      ("pnotecite" "[[{") ("Pnotecite" "[[{")
      ("fnotecite" "[[{")
      ("citename" "[[{![{") ("citelist" "[[{![{") ("citefield" "[[{![{")))

  (defun gb/font-latex-match-spellcheck-optargs (limit)
    (setq font-latex-matched-faces nil)
    (let* ((macros (mapcar #'car gb/font-latex-spellcheck-optargs))
           (regexp (concat "\\\\\\(\\(?:" (regexp-opt macros) "\\)\\>\\)")))
      (catch 'match
        (while (re-search-forward regexp limit t)
          (unless (font-latex-faces-present-p '(font-lock-comment-face
                                                font-latex-verbatim-face)
                                              (match-beginning 0))
            (let ((spec-list (string-to-list
                              (cadr (assoc (match-string 1)
                                           gb/font-latex-spellcheck-optargs))))
                  (parse-sexp-ignore-comments t)
                  spec match-data match-beg ignore-p)
              (goto-char (match-end 0))
              (when (or (eq (car spec-list) ?*)
                        (eq (car spec-list) ?+))
                (setq spec-list (cdr spec-list))
                (skip-chars-forward "*+" (1+ (point))))
              (catch 'break
                (while spec-list
                  (setq spec (pop spec-list))
                  (when (eq spec ?!)
                    (setq ignore-p t)
                    (setq spec (pop spec-list)))
                  (while (and (not (eobp)) (font-latex-forward-comment)))
                  (cond ((eq spec ?{)
                         (unless (and (eq (char-after) spec)
                                      (font-latex-find-matching-close ?{ ?}))
                           (throw 'break nil)))
                        ((eq (char-after) spec)
                         (setq match-beg (point))
                         (if (font-latex-find-matching-close
                              spec (cdr (assq
                                         spec
                                         font-latex-command-with-args-opt-arg-delims)))
                             (unless ignore-p
                               (push (1- (point)) match-data)
                               (push (1+ match-beg) match-data)
                               (push '(face nil
                                            gb/jinx--inhibit-faces
                                            (font-lock-variable-name-face))
                                     font-latex-matched-faces))
                           (throw 'break nil))))))
              (when match-data
                (setq font-latex-matched-faces (nreverse font-latex-matched-faces))
                (store-match-data (nreverse match-data))
                (throw 'match t))))))))

  (font-lock-add-keywords
   'latex-mode
   `((gb/font-latex-match-spellcheck-optargs
      .
      ,(let (faces)
         (dotimes (i 5)
           (push `(,i (font-latex-matched-face ,i) append t) faces))
         (nreverse faces))))))

Just out of the oven, so lightly tested. But results seem pretty good.

Screenshot from 2023-04-17 16-17-20

minad commented 1 year ago

My approach to such issues is usually to create a package. Why not create a package tex-enhanced-face-lock which adds additional font locking. Alternatively propose this to AucTeX. Then you could configure Jinx in your configuration to take advantage of the enhanced font locking.

minad commented 1 year ago

Let's close this one, since Jinx is unlikely to get deeper into these waters. A separate package seems like a good solution.

gusbrs commented 1 year ago

My approach to such issues is usually to create a package. Why not create a package tex-enhanced-face-lock which adds additional font locking. Alternatively propose this to AucTeX. Then you could configure Jinx in your configuration to take advantage of the enhanced font locking.

Well, for now it is a couple of functions in my init to handle use cases important to me. And I don't think this makes for a package for "TeX enhanced font lock", this is leveraging font-lock to assist Jinx in its task. Also I doubt it would make sense upstream. Indeed, for the case of gb/font-latex-match-biblatex-citation-lists I already have a "won't fix" from Arash: https://tex.stackexchange.com/a/426066. But, granted, it does not fit well in Jinx either... As I said, just a proof of concept of how far the approach can go. :-)

EDIT: Btw, this approach is also very promising to add support for something like https://github.com/minad/jinx/issues/15, if one day you decide to go that route.

gusbrs commented 1 year ago

Having dwelt a whole day yesterday in font-latex.el, I just wrote the following summary notes of my findings which may be helpful / useful:

  ;; The way font-lock works in AUCTeX, as done by 'font-latex.el', poses some
  ;; challenges to the face based predicate for ignoring (or not) certain
  ;; parts of the buffer used by Jinx.
  ;;
  ;; 1. All optional arguments are fontified with a single face,
  ;; 'font-lock-variable-name-face', for all classes and all commands, and
  ;; that's hard-coded inside 'font-latex-match-command-with-arguments' with
  ;; no viable way to selectively control it.  Even if arguably most optional
  ;; arguments are meant to be ignored, not all of them are, and there is no
  ;; way to distinguish the cases based on faces alone.
  ;;
  ;; 2. For each keyword class, and hence for every command in the class, a
  ;; single face is applied to all mandatory arguments.  Therefore, if a macro
  ;; has one argument which should be spell checked and another that should
  ;; not, there is no way to distinguish them based on faces alone.
  ;;
  ;; 3. Since the keyword classes are populated with criteria other than spell
  ;; checking in mind it is not always easy to decide if a class of commands
  ;; should have its arguments ignored or not.  Take the "reference" class,
  ;; for example, which includes most cross-referencing commands, whose
  ;; arguments are labels, but also "\footnote", "\footnotetext" and
  ;; "\marginpar".  Besides, some keyword classes share the same face.
  ;;
  ;; 4. Given that (currently) the face predicate is called first by Jinx, at
  ;; the time the regexp predicate is called, point is already inside the
  ;; argument, meaning a regexp for the whole macro won't match, and which
  ;; macro that argument comes from can only be found out by expensive
  ;; backwards searching.
  ;;
  ;; To some extend, number 3 can be dealt with by tweaking font-lock with
  ;; regular tools ('font-latex-add-keywords' etc.), possibly requiring
  ;; personal style files in many cases.  How much this is a meaningful way to
  ;; deal with the problem, I'm not sure.
  ;;
  ;; A reordering of the ignore predicates may help with 4, but it doesn't
  ;; fully solve the problem either.  Because it is easy to use a regexp to
  ;; ignore a command which the faces predicate would not, but not the other
  ;; way around.  So, if you have a command using a face which is part of
  ;; 'jinx-exclude-faces' but should be spell checked, the only way to deal
  ;; with it would be to remove the face altogether from 'jinx-exclude-faces',
  ;; and then handling all the other commands which use that face be ignored
  ;; by regexps.  Not an appealing route.
  ;;
  ;; But numbers 1 and 2 remain impossible to tackle with current machinery
  ;; since there's no way to distinguish the parts of a regexp what should be
  ;; spell checked from those that shouldn't.

Also, imho, even if you have deemed my attempts to handle these either misguided or too complicated, it doesn't make the problem go away.

You may say that:

This particular issue is maybe a bit overly specific.

I beg to disagree. The problem is deep seated in the font-latex.el machinery. Unless, of course, you meant "LaTeX" by that. But even if Jinx sees LaTeX as a "corner case", which I'm not sure it is, that still depends on how important that case is for individual users.

minad commented 1 year ago

Also, imho, even if you have deemed my attempts to handle these either misguided or too complicated, it doesn't make the problem go away.

Of course the problem goes away for me by deeming something out of scope of Jinx. :)

I think handling these TeX arguments is an edge case and we cannot solve this here if the existing TeX machinery is lacking. At least the scenario here doesn't introduce false positives.

gusbrs commented 1 year ago

Of course the problem goes away for me by deeming something out of scope of Jinx. :)

I think handling these TeX arguments is an edge case and we cannot solve this here if the existing TeX machinery is lacking.

Ok, understood.

At least the scenario here doesn't introduce false positives.

It does though. This issue was opened demonstrating the existence of a whole class of them.

minad commented 1 year ago

It does though. This issue was opened demonstrating the existence of a whole class of them.

You mean false negatives or did I misunderstand?

gusbrs commented 1 year ago

You mean false negatives or did I misunderstand?

Hard to know what is type I and type II in this context. I meant words that should be flagged as misspellings and which are not.

minad commented 1 year ago

Yes, these are false negatives. This issue is about false negatives in the default configuration. If you remove the face in your configuration you will get a lot of false positives.

gusbrs commented 1 year ago

Yes, these are false negatives.

Assuming the test is "is the word a misspelling?", yes.