Closed gusbrs closed 1 year ago
We could introduce a custom predicate which checks the face and the context. There are probably more such cases, so we should first compile a list of all the scenarios.
There are probably more such cases, so we should first compile a list of all the scenarios.
I presume so, and I agree that collecting cases a while before trying to devise the best approach is a good idea. But this is a tricky list to compile...
We can wait for more feedback regarding false positives/negatives.
@minad - A related question is: Is it possible for jinx to support the functionality provided by `ispell-skip-region-alist'? Docstring says:
Alist expressing beginning and end of regions not to spell check.
The alist key must be a regular expression.
Valid forms include:
(KEY) - just skip the key.
(KEY . REGEXP) - skip to the end of REGEXP. REGEXP may be string or symbol.
(KEY REGEXP) - skip to end of REGEXP. REGEXP must be a string.
(KEY FUNCTION ARGS) - FUNCTION called with ARGS returns end of region.
I think it would be great benefit. For the example above, jinx could consult `ispell-tex-skip-alists' which looks like this:
((("\\\\addcontentsline" ispell-tex-arg-end 2)
("\\\\add\\(tocontents\\|vspace\\)" ispell-tex-arg-end)
("\\\\\\([aA]lph\\|arabic\\)" ispell-tex-arg-end)
("\\\\cref" ispell-tex-arg-end)
("\\\\bibliographystyle" ispell-tex-arg-end)
("\\\\makebox" ispell-tex-arg-end 0)
("\\\\e?psfig" ispell-tex-arg-end)
("\\\\document\\(class\\|style\\)" . "\\\\begin[ \t\n]*{document}"))
(("\\(figure\\|table\\)\\*?" ispell-tex-arg-end 0)
("list" ispell-tex-arg-end 2)
("program" . "\\\\end[ \t]*{program}")
("verbatim\\*?" . "\\\\end[ \t]*{verbatim\\*?}")))
and says: Ignore everything between \documentclass
and \begin{document}
. \cite
can be also easily added:
("\\\\cite" ispell-tex-arg-end)
On 4/3/23 13:03, dr-scsi wrote:
@minad https://github.com/minad - A related question is: Is it possible for jinx to support the functionality provided by `ispell-skip-region-alist'? Docstring says:
No, that's not ideal since it involves expensive scanning backward and forward. You could try writing a custom predicate which implements this. I don't plan to add support for Ispell-specific features, except maybe LocalWords.
@minad - Thanks for your response. I do understand your concern, but I must admit I find the feature very powerful for example for skipping the optional argument of environments in LaTeX. An example with flyspell
looks like this:
I was hoping that jinx can combine the various techniques used. Maybe that's a job for Emacs core itself.
I do understand your concern, but I must admit I find the feature very powerful for example for skipping the optional argument of environments in LaTeX. An example with
flyspell
looks like this:
@dr-scsi I can tell a little from my experience. I'm also a LaTeX user, with a heavily customized flyspell
setup, particularly with regard to rules for ignoring a number of LaTeX constructs (see https://github.com/minad/jinx/issues/9#issue-1642020083).
jinx
is a little different from ispell
and flyspell
here, because it also uses faces to ignore parts of the buffer. In other words, it exploits the work of done on behalf of syntax highlighting for spell checking. And, from the tests I've done so far, this allows for a significantly smaller number of false positives, at least for an out-of-the-box experience. So, personally, for the moment I'm first trying things out with jinx
and see what really needs to be ported to jinx
from these skipping rules I had for flyspell
. So far, my impression is that I'll be able to achieve similar results with much less setup.
Of course, using faces has the danger of possibly ignoring things that shouldn't be. And that's what this report/issue is about.
Besides, jinx
accepts custom predicates already. And if something is still missing in the end, one can always add it as a predicate.
Regarding custom predicates - it turned out that we cannot trust the flyspell-mode-predicate
to be efficient enough. I had already feared that this may be the case. Therefore Jinx is now completely on its own to handle exclusion. See #35. Generally I'd appreciate more feedback about the current quality of the exclusion based on faces. This particular issue is maybe a bit overly specific. For now I am more interested in a configuration which handles the 90% in various file types.
Hi @minad , I'm trying to tackle this one now and, it turns out, this face is particularly tricky. AUCTeX/font-latex.el
uses this face, and this face only, for all and every optional argument of macros with arguments, regardless of class or anything else. And it is hard-coded inside font-latex-match-command-with-arguments
with no viable way to configure this selectively. Given that most optional arguments are meant to be ignored by spell checking to start with, removing font-lock-variable-name-face
from jinx-exclude-faces
would mean that a huge load would have to be carried by some other means, presumably regexps. So this is really a tight spot, since not spell checking what should be is arguably very undesirable too.
And, given my last fumbled attempt, this time I went the other way around and tried to leverage the font-locking machinery for the purposes of fine tuning spell checking with jinx
. I think you'll like the approach, however I doubt this will be directly useful for jinx
, since you probably won't want to internalize limitations of font-locking of this or that major mode. Anyway, it is interesting at least as a proof of concept of how far it is possible to take the use of faces for the purpose.
gb/font-latex-match-biblatex-citation-lists
is dedicated to the task of biblatex
's citation lists, which are very peculiar. gb/font-latex-match-spellcheck-optargs
is more general, and can be used to inhibit font-lock-variable-name-face
at the face predicate for any regular macro, even if I'm currently applying it to citation commands only.
There you have it:
(with-eval-after-load 'font-latex
;; Fine tuning LaTeX font-lock for better spell checking with Jinx.
(add-hook 'LaTeX-mode-hook 'gb/jinx-latex-setup)
(defun gb/jinx-latex-setup ()
(setq-local font-lock-extra-managed-props
(append font-lock-extra-managed-props '(gb/jinx--inhibit-faces)))
(setq-local jinx--predicates
(list #'gb/jinx--latex-face-ignored-p
#'jinx--regexp-ignored-p
#'jinx--word-valid-p)))
(defun gb/jinx--latex-face-ignored-p (start)
"Return non-nil if face at START of word is ignored."
(let* ((face (get-text-property start 'face))
(inhibit (get-text-property start 'gb/jinx--inhibit-faces))
(jinx--exclude-faces (seq-difference jinx--exclude-faces inhibit)))
(or
(and jinx--include-faces
(if (listp face)
(cl-loop for f in face never (memq f jinx--include-faces))
(not (memq face jinx--include-faces))))
(and jinx--exclude-faces
(if (listp face)
(cl-loop for f in face thereis (memq f jinx--exclude-faces))
(memq face jinx--exclude-faces))))))
(defvar gb/font-latex-biblatex-citation-lists
'("cites" "Cites"
"parencites" "Parencites"
"footcites" "footcitetexts"
"smartcites" "Smartcites"
"textcites" "Textcites"
"supercites"
"autocites" "Autocites"
"volcites" "Volcites"
"pvolcites" "Pvolcites"
"fvolcites" "Fvolcites"
"svolcites" "Svolcites"
"tvolcites" "Tvolcites"
"avolcites" "Avolcites"))
;; Based on 'font-latex-match-command-with-arguments'.
(defun gb/font-latex-match-biblatex-citation-lists (limit)
(setq font-latex-matched-faces nil)
(let ((regexp (concat "\\\\\\(\\(?:"
(regexp-opt gb/font-latex-biblatex-citation-lists)
"\\)\\>\\)"))
(parse-sexp-ignore-comments t)
(arg-delims '((?\( . ?\)) (?\[ . ?\]) (?{ . ?})))
;; AUCTeX/font-latex.el handles the fontification of up to two
;; mandatory arguments of citation lists, and that's a "won't fix":
;; https://tex.stackexchange.com/a/426066
(handled-args 2) (curr-arg 1)
match-data)
(catch 'match
(while (re-search-forward regexp limit t)
(unless (font-latex-faces-present-p '(font-lock-comment-face
font-latex-verbatim-face)
(match-beginning 0))
(goto-char (match-end 0))
(catch 'break
(while (assq
(save-excursion
(while (and (not (eobp)) (font-latex-forward-comment)))
(char-after))
arg-delims)
(while (and (not (eobp)) (font-latex-forward-comment)))
(let* ((match-beg (point))
(char (char-after))
(closedp (font-latex-find-matching-close
char (cdr (assq char arg-delims)))))
(cond ((not closedp)
(throw 'break nil))
((eq char ?{)
(when (> curr-arg handled-args)
(push (1- (point)) match-data)
(push (1+ match-beg) match-data)
(push 'font-lock-constant-face font-latex-matched-faces))
(setq curr-arg (1+ curr-arg)))
((memq char '(?\( ?\[))
(push (1- (point)) match-data)
(push (1+ match-beg) match-data)
(if (> curr-arg handled-args)
(push '(face font-lock-variable-name-face
gb/jinx--inhibit-faces
(font-lock-variable-name-face))
font-latex-matched-faces)
(push '(face nil
gb/jinx--inhibit-faces
(font-lock-variable-name-face))
font-latex-matched-faces)))))))
(when match-data
(setq font-latex-matched-faces (nreverse font-latex-matched-faces))
(store-match-data (nreverse match-data))
(throw 'match t)))))))
(font-lock-add-keywords
'latex-mode
`((gb/font-latex-match-biblatex-citation-lists
.
,(let (faces)
;; 2 global notes, plus 12 full sets of 3
(dotimes (i 38)
(push `(,i (font-latex-matched-face ,i) append t) faces))
(nreverse faces)))))
;; Optional arguments will set the 'gb/jinx--inhibit-faces' property, unless
;; preceded by "!".
(defvar gb/font-latex-spellcheck-optargs
'(("cite" "*[[{") ("Cite" "[[{")
("parencite" "*[[{") ("Parencite" "[[{")
("footcite" "[[{") ("footcitetext" "[[{")
("textcite" "[[{") ("Textcite" "[[{")
("smartcite" "[[{") ("Smartcite" "[[{")
("supercite" "{")
("autocite" "*[[{") ("Autocite" "*[[{")
("citeauthor" "*[[{") ("Citeauthor" "*[[{")
("citetitle" "*[[{")
("citeyear" "*[[{") ("citedate" "*[[{")
("citeurl" "[[{")
("fullcite" "[[{") ("footfullcite" "[[{")
("volcite" "[{![{") ("Volcite" "[{![{")
("pvolcite" "[{![{") ("Pvolcite" "[{![{")
("fvolcite" "[{![{") ("ftvolcite" "[{![{")
("svolcite" "[{![{") ("Svolcite" "[{![{")
("tvolcite" "[{![{") ("Tvolcite" "[{![{")
("avolcite" "[{![{") ("Avolcite" "[{![{")
("notecite" "[[{") ("Notecite" "[[{")
("pnotecite" "[[{") ("Pnotecite" "[[{")
("fnotecite" "[[{")
("citename" "[[{![{") ("citelist" "[[{![{") ("citefield" "[[{![{")))
(defun gb/font-latex-match-spellcheck-optargs (limit)
(setq font-latex-matched-faces nil)
(let* ((macros (mapcar #'car gb/font-latex-spellcheck-optargs))
(regexp (concat "\\\\\\(\\(?:" (regexp-opt macros) "\\)\\>\\)")))
(catch 'match
(while (re-search-forward regexp limit t)
(unless (font-latex-faces-present-p '(font-lock-comment-face
font-latex-verbatim-face)
(match-beginning 0))
(let ((spec-list (string-to-list
(cadr (assoc (match-string 1)
gb/font-latex-spellcheck-optargs))))
(parse-sexp-ignore-comments t)
spec match-data match-beg ignore-p)
(goto-char (match-end 0))
(when (or (eq (car spec-list) ?*)
(eq (car spec-list) ?+))
(setq spec-list (cdr spec-list))
(skip-chars-forward "*+" (1+ (point))))
(catch 'break
(while spec-list
(setq spec (pop spec-list))
(when (eq spec ?!)
(setq ignore-p t)
(setq spec (pop spec-list)))
(while (and (not (eobp)) (font-latex-forward-comment)))
(cond ((eq spec ?{)
(unless (and (eq (char-after) spec)
(font-latex-find-matching-close ?{ ?}))
(throw 'break nil)))
((eq (char-after) spec)
(setq match-beg (point))
(if (font-latex-find-matching-close
spec (cdr (assq
spec
font-latex-command-with-args-opt-arg-delims)))
(unless ignore-p
(push (1- (point)) match-data)
(push (1+ match-beg) match-data)
(push '(face nil
gb/jinx--inhibit-faces
(font-lock-variable-name-face))
font-latex-matched-faces))
(throw 'break nil))))))
(when match-data
(setq font-latex-matched-faces (nreverse font-latex-matched-faces))
(store-match-data (nreverse match-data))
(throw 'match t))))))))
(font-lock-add-keywords
'latex-mode
`((gb/font-latex-match-spellcheck-optargs
.
,(let (faces)
(dotimes (i 5)
(push `(,i (font-latex-matched-face ,i) append t) faces))
(nreverse faces))))))
Just out of the oven, so lightly tested. But results seem pretty good.
My approach to such issues is usually to create a package. Why not create a package tex-enhanced-face-lock
which adds additional font locking. Alternatively propose this to AucTeX. Then you could configure Jinx in your configuration to take advantage of the enhanced font locking.
Let's close this one, since Jinx is unlikely to get deeper into these waters. A separate package seems like a good solution.
My approach to such issues is usually to create a package. Why not create a package
tex-enhanced-face-lock
which adds additional font locking. Alternatively propose this to AucTeX. Then you could configure Jinx in your configuration to take advantage of the enhanced font locking.
Well, for now it is a couple of functions in my init to handle use cases important to me. And I don't think this makes for a package for "TeX enhanced font lock", this is leveraging font-lock to assist Jinx in its task. Also I doubt it would make sense upstream. Indeed, for the case of gb/font-latex-match-biblatex-citation-lists
I already have a "won't fix" from Arash: https://tex.stackexchange.com/a/426066. But, granted, it does not fit well in Jinx either... As I said, just a proof of concept of how far the approach can go. :-)
EDIT: Btw, this approach is also very promising to add support for something like https://github.com/minad/jinx/issues/15, if one day you decide to go that route.
Having dwelt a whole day yesterday in font-latex.el
, I just wrote the following summary notes of my findings which may be helpful / useful:
;; The way font-lock works in AUCTeX, as done by 'font-latex.el', poses some
;; challenges to the face based predicate for ignoring (or not) certain
;; parts of the buffer used by Jinx.
;;
;; 1. All optional arguments are fontified with a single face,
;; 'font-lock-variable-name-face', for all classes and all commands, and
;; that's hard-coded inside 'font-latex-match-command-with-arguments' with
;; no viable way to selectively control it. Even if arguably most optional
;; arguments are meant to be ignored, not all of them are, and there is no
;; way to distinguish the cases based on faces alone.
;;
;; 2. For each keyword class, and hence for every command in the class, a
;; single face is applied to all mandatory arguments. Therefore, if a macro
;; has one argument which should be spell checked and another that should
;; not, there is no way to distinguish them based on faces alone.
;;
;; 3. Since the keyword classes are populated with criteria other than spell
;; checking in mind it is not always easy to decide if a class of commands
;; should have its arguments ignored or not. Take the "reference" class,
;; for example, which includes most cross-referencing commands, whose
;; arguments are labels, but also "\footnote", "\footnotetext" and
;; "\marginpar". Besides, some keyword classes share the same face.
;;
;; 4. Given that (currently) the face predicate is called first by Jinx, at
;; the time the regexp predicate is called, point is already inside the
;; argument, meaning a regexp for the whole macro won't match, and which
;; macro that argument comes from can only be found out by expensive
;; backwards searching.
;;
;; To some extend, number 3 can be dealt with by tweaking font-lock with
;; regular tools ('font-latex-add-keywords' etc.), possibly requiring
;; personal style files in many cases. How much this is a meaningful way to
;; deal with the problem, I'm not sure.
;;
;; A reordering of the ignore predicates may help with 4, but it doesn't
;; fully solve the problem either. Because it is easy to use a regexp to
;; ignore a command which the faces predicate would not, but not the other
;; way around. So, if you have a command using a face which is part of
;; 'jinx-exclude-faces' but should be spell checked, the only way to deal
;; with it would be to remove the face altogether from 'jinx-exclude-faces',
;; and then handling all the other commands which use that face be ignored
;; by regexps. Not an appealing route.
;;
;; But numbers 1 and 2 remain impossible to tackle with current machinery
;; since there's no way to distinguish the parts of a regexp what should be
;; spell checked from those that shouldn't.
Also, imho, even if you have deemed my attempts to handle these either misguided or too complicated, it doesn't make the problem go away.
You may say that:
This particular issue is maybe a bit overly specific.
I beg to disagree. The problem is deep seated in the font-latex.el
machinery. Unless, of course, you meant "LaTeX" by that. But even if Jinx sees LaTeX as a "corner case", which I'm not sure it is, that still depends on how important that case is for individual users.
Also, imho, even if you have deemed my attempts to handle these either misguided or too complicated, it doesn't make the problem go away.
Of course the problem goes away for me by deeming something out of scope of Jinx. :)
I think handling these TeX arguments is an edge case and we cannot solve this here if the existing TeX machinery is lacking. At least the scenario here doesn't introduce false positives.
Of course the problem goes away for me by deeming something out of scope of Jinx. :)
I think handling these TeX arguments is an edge case and we cannot solve this here if the existing TeX machinery is lacking.
Ok, understood.
At least the scenario here doesn't introduce false positives.
It does though. This issue was opened demonstrating the existence of a whole class of them.
It does though. This issue was opened demonstrating the existence of a whole class of them.
You mean false negatives or did I misunderstand?
You mean false negatives or did I misunderstand?
Hard to know what is type I and type II in this context. I meant words that should be flagged as misspellings and which are not.
Yes, these are false negatives. This issue is about false negatives in the default configuration. If you remove the face in your configuration you will get a lot of false positives.
Yes, these are false negatives.
Assuming the test is "is the word a misspelling?", yes.
I found at least one case where the
font-lock-variable-name-face
is used and where arbitrary text is to be expected and, hence, should be spell checked: the optional arguments of citations.Consider the following document:
I'm using AUCTeX. Tested with
emacs -Q
and the following setup:And the result: