Exclude link hyperlinks with '#' anchors from ispell spellchecking

jrblevin / markdown-mode

Emacs Markdown Mode

http://jblevins.org/projects/markdown-mode/

GNU General Public License v3.0

Exclude link hyperlinks with '#' anchors from ispell spellchecking #420

Open gwern opened 5 years ago

gwern commented 5 years ago

On my system (Ubuntu 18.04.3 LTS / Emacs 25.2.2 / elpa-markdown-mode 2.3+154-1), something has been annoying me for a long time: M-x ispell frequently attempts to spellcheck hyperlinks (particularly internal links to sections, or links to mega.nz). While spellchecking my GPT-2 page just now, which uses both of those kinds of links heavily, I finally realized that the problem is that each of these URLs has a section/fragment introduced by #: eg https://mega.nz/#!HXhRwS7R!yl4qZM-gMWdn4Qc3scavOBKqdLNAcZ_WYd2gVPqabPg or an internal link like [345M](#gpt-2-345m).

Looking at a simple sample like

[Screenshot: xwd-157342101989170]

ispell correctly ignores most of the URL, but then flags everything after the #. I assume a regexp is going wrong somewhere?

syohex commented 4 years ago

I suppose this is not a markdown-mode issue. You need to set the ispell-skip-region-alist variable as below:

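;; Make ispell skip text matching "#[a-zA-Z]+"; `forward-word' is called
;; to move point to the end of the skipped region.
(defun my/markdown-mode-hook ()
  (add-to-list 'ispell-skip-region-alist '("#[a-zA-Z]+" forward-word)))

;; Install the skip rule whenever markdown-mode is enabled.
(add-hook 'markdown-mode-hook #'my/markdown-mode-hook)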
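
A possible variant of the same workaround, offered as an untested sketch rather than a confirmed fix: `forward-word` may stop at the first hyphen of a fragment such as `#gpt-2-345m`, and `add-to-list` changes the skip list for every buffer. The version below instead uses single-element `(KEY)` entries, which ispell documents as "just skip the key", and keeps the change local to Markdown buffers. The function name `my/markdown-ispell-skip-fragments` and both regexps are assumptions made for illustration.

```elisp
;; Untested sketch; `my/markdown-ispell-skip-fragments' is a hypothetical name
;; and the two regexps are assumptions, not taken from markdown-mode.
(defun my/markdown-ispell-skip-fragments ()
  "Make ispell skip link fragments in the current Markdown buffer."
  (setq-local ispell-skip-region-alist
              (append
               '(;; Inline link target that is only a fragment: ](#section-id)
                 ("](#[^) \t\n]*)")
                 ;; Whole http(s) URL, including any #fragment
                 ("https?://[^] \t\n)>]+"))
               ispell-skip-region-alist)))

(add-hook 'markdown-mode-hook #'my/markdown-ispell-skip-fragments)
```

Because the value is set with `setq-local`, other modes keep ispell's default skip list.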

gwern commented 4 years ago

That does seem to help.

But it is a markdown-mode issue rather than an ispell issue, because the # fragments are valid Markdown links. Spellcheckers like ispell or Flyspell cannot be expected to know what is valid syntax for arbitrary text types and what is a spelling error; that knowledge must be provided by the modes, which is why markdown-mode already encodes it for flyspell, eg https://github.com/jrblevin/markdown-mode/blob/master/markdown-mode.el#L2350 . My point is that this overriding appears to be incomplete, since the fragment text after the # is still being fed to spellcheckers.

(I don't know Emacs major modes or markdown-mode well enough to really venture any suggestions about how to fix this beyond adding a hook to special-case the # fragment situation, but I notice that markdown-flyspell-check-word-p doesn't seem to handle URLs or use link-related predicates like markdown-link-at-pos so I dunno what's going on there.)
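
As a rough illustration of the link-predicate idea in the parenthetical above, the sketch below wraps `markdown-flyspell-check-word-p` and additionally rejects words that fall inside the URL part of a link reported by `markdown-link-at-pos`. It is untested and rests on assumptions: that `markdown-link-at-pos` returns a list whose elements 0, 1 and 3 are the link's start, end and URL, and that markdown-mode installs its predicate through `flyspell-generic-check-word-predicate`. The `my/...` names are hypothetical. It only affects Flyspell, not `M-x ispell`.

```elisp
;; Untested sketch; all `my/...' names are hypothetical.
(require 'markdown-mode)

(defun my/markdown-pos-in-link-url-p (pos)
  "Return non-nil if POS lies inside the URL part of a Markdown link."
  (let* ((link (markdown-link-at-pos pos))
         (beg (nth 0 link))
         (end (nth 1 link))
         (url (nth 3 link)))
    (when (and beg end url)
      (save-excursion
        ;; Search backward from the end of the link so that link text which
        ;; happens to repeat the URL is not mistaken for the URL itself.
        (goto-char end)
        (and (search-backward url beg t)
             (<= (match-beginning 0) pos)
             (< pos (match-end 0)))))))

(defun my/markdown-flyspell-check-word-p ()
  "Like `markdown-flyspell-check-word-p', but also skip words in link URLs."
  (and (markdown-flyspell-check-word-p)
       ;; Flyspell calls the predicate with point just after the word.
       (not (my/markdown-pos-in-link-url-p (max (point-min) (1- (point)))))))

(add-hook 'markdown-mode-hook
          (lambda ()
            (setq-local flyspell-generic-check-word-predicate
                        #'my/markdown-flyspell-check-word-p)))
```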

syohex commented 4 years ago

IMHO markdown-mode should not define flyspell/ispell configuration like markdown-flyspell-check-word-p, because we may typo in URL, code block, comment. Almost all other major modes do not set any flyspell/ispell configuration. If someone uses words which are not listed in the dictionary and wants to avoid spellchecker warnings, I think they should set up their own ispell/flyspell configuration or create their own ignore list.

gwern commented 4 years ago

"because we may typo in URL, code block, comment."

??? Those should be excluded, as they are, because it is impossible even in principle to 'spellcheck' them. How exactly would any mode, ever, detect a typo in a URL or code block? Code can be arbitrarily complex and refer to libraries or code elsewhere, or even define new functions and operators at runtime; URLs can be anything, and don't even have to refer to a valid domain name (not that anyone would expect their spellchecker to try to ping URLs to check whether they resolve, or with what error code...). And the functionality seems widely used: that's how things like flyspell-prog-mode work.