rojopolis / spellcheck-github-actions

Spell check action
MIT License

Ignore spell check in hyperlinks #24

Closed MarekLani closed 4 years ago

MarekLani commented 4 years ago

Is there a way to skip spell checking of hyperlinks, for example: link text

jonasbn commented 4 years ago

Meaning you want to check the spelling of the "link text", but not the actual URL?

jonasbn commented 4 years ago

Hi @MarekLani

I ran a test, running pyspelling manually on an HTML file. It does not report any spelling errors for the HTML part, only for the text part.

$ clear; pyspelling --config spellcheck.yaml
Misspelled words:
<htmlcontent> index.html: html>body
--------------------------------------------------------------------------------
baader
evin
speeling
--------------------------------------------------------------------------------

!!!Spelling check failed!!!

HTML file:

<a href="/baad_speling/">evin baader speeling</a>

Using the configuration outlined here:

matrix:
- name: HTML
  aspell:
    lang: en
  dictionary:
    encoding: utf-8
  pipeline:
  - pyspelling.filters.html:
      comments: false
  sources:
  - '**/*.html'
  default_encoding: utf-8

Could you please provide me with a copy of your configuration file?

MarekLani commented 4 years ago

@jonasbn thank you for the response, and sorry, I should have stated that this occurs in the Markdown check. I meant the following: [Link text to be checked](link_address_should_not_be_checked)

This is my config:

matrix:
- name: Markdown
  aspell:
    lang: en
  dictionary:
    wordlists:
    - wordlist.txt
    encoding: utf-8
  pipeline:
    - pyspelling.filters.markdown:
        markdown_extensions:
        - markdown.extensions.extra:
    - pyspelling.filters.html:
        comments: true
        attributes:
        - title
        - alt
        ignores:
        - ':matches(code, pre)'
        - 'code'
        - 'pre'
  sources:
  - '**/*.md'
  default_encoding: utf-8

jonasbn commented 4 years ago

Hi @MarekLani

I have tested against this example file using your provided config:

[evin baader speeling](/baad_speling/)

$ pyspelling --config spellcheck2.yaml
Misspelled words:
<htmlcontent> index.md: html>body>p
--------------------------------------------------------------------------------
baader
evin
speeling
--------------------------------------------------------------------------------

!!!Spelling check failed!!!

The link text is checked, not the URL part.
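
To illustrate why only the link text shows up in the report, here is a minimal Python sketch of the same two-step idea: convert the Markdown link to HTML, then collect only the text nodes. It uses the standard library only and is not pyspelling's implementation; the regular expression and the names `markdown_links_to_html`, `TextCollector`, and `words_to_check` are hypothetical stand-ins for the Markdown and HTML filters.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for the Markdown filter: handles inline links only.
INLINE_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def markdown_links_to_html(text: str) -> str:
    """Turn [text](url) into <a href="url">text</a> inside a paragraph."""
    return "<p>" + INLINE_LINK.sub(r'<a href="\2">\1</a>', text) + "</p>"


class TextCollector(HTMLParser):
    """Collect text nodes only; tag names and attributes are never recorded."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def handle_data(self, data: str) -> None:
        # Called for text between tags, never for attribute values such as href.
        if data.strip():
            self.chunks.append(data.strip())


def words_to_check(markdown_text: str) -> list:
    """Return the words a spell checker would see after both steps."""
    collector = TextCollector()
    collector.feed(markdown_links_to_html(markdown_text))
    collector.close()
    return " ".join(collector.chunks).split()


print(words_to_check("[evin baader speeling](/baad_speling/)"))
# prints ['evin', 'baader', 'speeling']
```

The URL ends up in the `href` attribute, which the collector never reads, so `baad_speling` is not among the words handed on for checking.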

Could you provide more details on what you observe? I cannot reproduce the behaviour you describe.

Do note that pyspelling converts Markdown to HTML before doing the check, hence the HTML output pointing to the DOM.

From the example above:

index.md: html>body>p

jonasbn commented 4 years ago

Hi @MarekLani

I have responded to your question and demonstrated the use of the software and its expected behaviour, so I am closing this issue.

rishitc commented 3 years ago

I'm not sure whether the issue still exists for the original author, but I ran into this very issue some time ago. The configuration file posted by MarekLani earlier in this thread, in which the HTML filter (with some options) runs after the Markdown filter, completely solved it for me :+1:

facelessuser commented 3 years ago

Yes, to avoid <a> tags, you would use the HTML filter. The Markdown filter is only used to convert Markdown to HTML. If you need to avoid URLs in plain text, the URL filter can help with that.
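
As a rough illustration of what filtering URLs out of plain text means, the Python sketch below removes URL-like tokens before the remaining words would be handed to a spell checker. It is not pyspelling's URL filter (documented as `pyspelling.filters.url`); the pattern `URL_PATTERN` and the helper `strip_urls` are hypothetical and far simpler than a real filter would be.

```python
import re

# Hypothetical pattern for illustration only; a real URL filter applies its own rules.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def strip_urls(text: str) -> str:
    """Replace each URL with a space so its parts are not treated as words."""
    return URL_PATTERN.sub(" ", text)


sample = "See https://example.com/baad_speling/ for the speeling details."
print(strip_urls(sample).split())
# prints ['See', 'for', 'the', 'speeling', 'details.']
```

The misspelling inside the URL is dropped, while `speeling` in the surrounding sentence is still left for the checker to flag.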