rojopolis / spellcheck-github-actions

Spell check action
MIT License

Ignore markdown regions with code hints #53

Closed supernovae closed 3 years ago

supernovae commented 3 years ago

Is it possible to tell spellcheck to ignore the markdown hinted areas for language?

For example, if I have

```bash
kubectl get nodes
```

Can it just not spellcheck that bash statement? I have lots of awscli or kubectl or curl commands and right now that means 800+ misspelled words

facelessuser commented 3 years ago

Are you spell checking raw Markdown or are you converting it to HTML and then checking the HTML?

Often, I convert Markdown to HTML and then use CSS selectors to ignore code blocks and such. It is often easier for me to filter things in HTML. That is my general recommendation, but there are potentially other ways. PySpelling, which is used to filter the content, has a Markdown filter which basically converts the Markdown content to HTML. The HTML filter allows you to filter out tags with selectors. Granted, you must enable an extension to handle fenced code properly though as fenced code is not part of the spec (old school spec, I'm not sure about CommonMark).
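
As a rough illustration of the HTML-side filtering described above, here is a minimal sketch that uses only Python's standard-library `html.parser`. It is not PySpelling's implementation, and the names `ProseExtractor` and `prose_only` are made up for this example. It only shows the idea of dropping text inside `code` and `pre` elements before anything is spell checked.

```python
from html.parser import HTMLParser


class ProseExtractor(HTMLParser):
    """Collect text from HTML while skipping the contents of ignored tags."""

    def __init__(self, ignored=("code", "pre")):
        super().__init__()
        self.ignored = set(ignored)
        self.depth = 0    # greater than zero while inside an ignored element
        self.chunks = []  # text that would be handed to the spell checker

    def handle_starttag(self, tag, attrs):
        if tag in self.ignored:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.ignored and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth == 0 and text:
            self.chunks.append(text)


def prose_only(html, ignored=("code", "pre")):
    """Return the text of `html` with ignored elements left out."""
    parser = ProseExtractor(ignored)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample: inline code plus a block, as a Markdown converter might emit.
sample = (
    "<p>List the nodes with <code>kubectl</code>:</p>"
    "<pre><code>kubectl get nodes</code></pre>"
    "<p>Then check the output.</p>"
)
print(prose_only(sample))
```

Both the inline `<code>` and the `<pre>` block are skipped here, so command names never reach the checker.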

The shipped extension that handles Markdown uses Python Markdown, which is an old school Markdown parser (not a CommonMark parser). For me, that is more than sufficient for my needs as the documentation I am often parsing also uses Python Markdown when I publish the documentation. If CommonMark is a requirement, a 3rd party extension can surely be created.

I can't really answer further without knowing how you are attempting to spell check your Markdown.

supernovae commented 3 years ago

Thanks for the quick reply!

I'm just diving into this extension and trying to get it to work - my repo is here: https://github.com/supernovae/documentation and the config is here: https://github.com/supernovae/documentation/blob/main/.github/config/.spellcheck.yml

I noticed when the action is run, all of the code looks like it's in HTML. I'm poking around to see what I can do... basically I just want to ignore the bash code in Markdown, but obviously it's no longer Markdown when it checks the HTML.

    facelessuser commented 3 years ago

    @supernovae Okay, so a couple of things. I checked out the repo you pointed me at and attempted to run PySpelling. Just as an FYI, while I do follow this repo because it relies on PySpelling, I do not actually use this action or the Docker image wrapper it relies on. As I am very familiar with Python, I set up my own action directly using Python in my CI environments. I am also the author of PySpelling and am comfortable running it directly on my local machine to test and debug, which is what you are going to see below.

    1. The config doesn't quite seem right. You can see the error below. My immediate thought is that if the action is running this and not throwing an error, maybe something is occurring in the action that suppresses this? Maybe the action is somehow preprocessing the config, losing the malformed content, and passing it to pyspelling. In this case, the malformed content is the ignore options and such.

      ➜  documentation git:(main) python3 -m pyspelling --config .github/config/.spellcheck.yml
      Traceback (most recent call last):
        File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
          return _run_code(code, main_globals, None,
        File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
          exec(code, run_globals)
        File "/usr/local/lib/python3.9/site-packages/pyspelling/__main__.py", line 83, in <module>
          sys.exit(main())
        File "/usr/local/lib/python3.9/site-packages/pyspelling/__main__.py", line 26, in main
          return run(
        File "/usr/local/lib/python3.9/site-packages/pyspelling/__main__.py", line 51, in run
          for results in spellcheck(
        File "/usr/local/lib/python3.9/site-packages/pyspelling/__init__.py", line 673, in spellcheck
          for result in spellchecker.run_task(task, source_patterns=sources):
        File "/usr/local/lib/python3.9/site-packages/pyspelling/__init__.py", line 311, in run_task
          self._build_pipeline(task)
        File "/usr/local/lib/python3.9/site-packages/pyspelling/__init__.py", line 255, in _build_pipeline
          raise ValueError(STEP_ERROR.format(str(step)))
      ValueError: Pipline step in unexpected format: {'pyspelling.filters.html': None, 'comments': False, 'ignores': ['code', 'pre']}

      So I fixed it by adding the appropriate indentation to the HTML filter's options:

      matrix:
      - name: Markdown
        aspell:
          ignore-case: true
          lang: en
        dictionary:
          wordlists:
          - .github/config/.wordlist.txt
          encoding: utf-8
        pipeline:
        - pyspelling.filters.markdown:
        - pyspelling.filters.html:
            comments: false
            ignores:
            - code
            - pre
        sources:
        - '**/*.md'
        default_encoding: utf-8
    2. Now I got a proper run, and I see no misspellings in code blocks:

      ➜  documentation git:(main) ✗ python3 -m pyspelling --config .github/config/.spellcheck.yml
      Misspelled words:
      <htmlcontent> docs/misc/references.md: html>body>ul>li
      --------------------------------------------------------------------------------
      Responsiblity
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/private-cluster.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      AzureFirewallSubnet
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/private-cluster.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      VirtualAppliance
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/private-cluster.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      apiserverProfile
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/private-cluster.md: html>body>h2
      --------------------------------------------------------------------------------
      Adendum
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/egress-ipam-operator.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      kubeadmin
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/disaster-recovery/README.md: html>body>h1
      --------------------------------------------------------------------------------
      Diasaster
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/disaster-recovery/README.md: html>body>ol>li
      --------------------------------------------------------------------------------
      failback
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/disaster-recovery/README.md: html>body>ol>li
      --------------------------------------------------------------------------------
      Avoidence
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p
      --------------------------------------------------------------------------------
      Availabily
      focussed
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p
      --------------------------------------------------------------------------------
      resiliance
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p
      --------------------------------------------------------------------------------
      excercise
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p
      --------------------------------------------------------------------------------
      eachother's
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/federated-metrics/README.md: html>body>h2
      --------------------------------------------------------------------------------
      Prequsites
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/federated-metrics/README.md: html>body>h2
      --------------------------------------------------------------------------------
      Preperation
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/federated-metrics/README.md: html>body>blockquote>p
      --------------------------------------------------------------------------------
      indepth
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/federated-metrics/README.md: html>body>h2
      --------------------------------------------------------------------------------
      Queryier
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/federated-metrics/README.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      datasource
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/federated-metrics/user-defined.md: html>body>blockquote>p
      --------------------------------------------------------------------------------
      indepth
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/federated-metrics/user-defined.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      datasource
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/astronomer/README-public.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      HNuZ
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/astronomer/README-public.md: html>body>h2
      --------------------------------------------------------------------------------
      SCCs
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/astronomer/README-public.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      securityContext
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/astronomer/README.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      dropdown
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/astronomer/README.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      HNuZ
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/astronomer/README.md: html>body>h2
      --------------------------------------------------------------------------------
      SCCs
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/astronomer/README.md: html>body>p
      --------------------------------------------------------------------------------
      securityContext
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aro/astronomer/README.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      hacky
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/quickstart-rosa.md: html>body>h2
      --------------------------------------------------------------------------------
      Walkthrough
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/o11y/az-log-analytics.md: html>body>h1
      --------------------------------------------------------------------------------
      Analytics
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/o11y/az-log-analytics.md: html>body>p
      --------------------------------------------------------------------------------
      analytics
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/o11y/openshift-logging.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      recieving
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/quickstart-aro.md: html>body>h2
      --------------------------------------------------------------------------------
      Walkthrough
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/quickstart-aro.md: html>body>h2
      --------------------------------------------------------------------------------
      Adendum
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      CNAME's
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      enviroment
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      homev
      wafv
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      importwizard
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>ul>li
      --------------------------------------------------------------------------------
      referer
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aws/waf/alb.md: html>body>blockquote>p
      --------------------------------------------------------------------------------
      premiumsupport
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aws/waf/alb.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      sigs
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aws/waf/alb.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      homev
      wafv
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aws/waf/alb.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      uncomment
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aws/waf/README-complex.md: html>body>p
      --------------------------------------------------------------------------------
      CTONET
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/aws/waf/README-complex.md: html>body>blockquote>p
      --------------------------------------------------------------------------------
      premiumsupport
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/demos/gitops/README.md: html>body>h2
      --------------------------------------------------------------------------------
      Walkthrough
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/demos/gitops/README.md: html>body>ul>li
      --------------------------------------------------------------------------------
      Kustomize
      kam
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/rosa/federated-metrics/README.md: html>body>h2
      --------------------------------------------------------------------------------
      Prequsites
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/rosa/federated-metrics/README.md: html>body>h2
      --------------------------------------------------------------------------------
      Preperation
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/rosa/federated-metrics/README.md: html>body>blockquote>p
      --------------------------------------------------------------------------------
      indepth
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/rosa/federated-metrics/README.md: html>body>h2
      --------------------------------------------------------------------------------
      Queryier
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/rosa/federated-metrics/README.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      datasource
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/rosa/federated-metrics/user-defined.md: html>body>blockquote>p
      --------------------------------------------------------------------------------
      indepth
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/rosa/federated-metrics/user-defined.md: html>body>ol>li>p
      --------------------------------------------------------------------------------
      datasource
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/rosa/sts-with-private-link/README.md: html>body>p
      --------------------------------------------------------------------------------
      plublic
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/rosa/sts-with-private-link/README.md: html>body>blockquote>p
      --------------------------------------------------------------------------------
      ARNs
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/rosa/sts/README.md: html>body>blockquote>p
      --------------------------------------------------------------------------------
      ARNs
      --------------------------------------------------------------------------------
      
      Misspelled words:
      <htmlcontent> docs/ocp/common-images-namespace/README.md: html>body>p
      --------------------------------------------------------------------------------
      combersome
      --------------------------------------------------------------------------------
      
      !!!Spelling check failed!!!
    facelessuser commented 3 years ago

    So, that is how to fix your issue with PySpelling. Why is the action not straight-up failing with the malformed YAML? That I do not know, and it may be something that this action's repo needs to look into.

    I assume that fixing the config will fix your issue, but I have not run your repo through the action in this repo.
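
To make the difference concrete, here is a small sketch of the two shapes involved. The `malformed` dictionary is copied from the traceback above; the `fixed` one is what the corrected YAML loads as, with the options nested under the filter name. `parse_step` is a hypothetical helper written for this illustration and only mirrors what the error message implies, not PySpelling's actual code.

```python
def parse_step(step):
    """Return (name, options) for a pipeline step, or raise ValueError.

    Hypothetical helper: a step is either a plain string or a mapping
    with exactly one key (the filter name) whose value holds the options.
    """
    if isinstance(step, str):
        return step, {}
    if isinstance(step, dict) and len(step) == 1:
        name, options = next(iter(step.items()))
        if options is None or isinstance(options, dict):
            return name, options or {}
    raise ValueError(f"Pipeline step in unexpected format: {step!r}")


# Mis-indented YAML: the options become siblings of the filter name.
malformed = {
    "pyspelling.filters.html": None,
    "comments": False,
    "ignores": ["code", "pre"],
}

# Corrected YAML: the options are nested one level under the filter name.
fixed = {
    "pyspelling.filters.html": {"comments": False, "ignores": ["code", "pre"]},
}

print(parse_step(fixed))
try:
    parse_step(malformed)
except ValueError as exc:
    print(exc)
```

A step with no options at all, such as `pyspelling.filters.markdown:` in the config above, loads as a single key with a `None` value and is still accepted by this sketch.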

    facelessuser commented 3 years ago

    Let me rephrase, I'm not seeing any misspellings in HTML code blocks. I see that you have not enabled any fenced code extensions, so give me a minute.

    facelessuser commented 3 years ago

    It must be converting the fenced code block into inline code blocks as you have no fenced extension enabled. But since inline code is ignored as well, it works just fine, so 🤷🏻 .

    supernovae commented 3 years ago

    I fixed the YAML formatting and 0'd out the wordlist and now I'm back up to 6000+ lines of output on spell-check. I had a fairly large wordlist in my branch I shared, if you 0 out the .github/config/.wordlist.txt you may replicate the output.

    image
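
With thousands of lines of output, a per-file tally can make it easier to see where the noise comes from. The following is a minimal sketch, assuming the report has the same layout as the PySpelling output quoted earlier in this thread. `tally` is a hypothetical helper name, not part of PySpelling or this action.

```python
from collections import Counter


def tally(report):
    """Count reported words per source file in PySpelling-style output."""
    per_file = Counter()
    lines = iter(report.splitlines())
    for line in lines:
        if line.strip() != "Misspelled words:":
            continue
        # Header looks like: "<htmlcontent> path/to/file.md: html>body>p"
        header = next(lines, "").strip()
        path = header.split("> ", 1)[-1].rsplit(": ", 1)[0]
        next(lines, None)  # skip the opening dashed rule
        for word in lines:
            word = word.strip()
            if word.startswith("-----"):  # closing dashed rule ends the block
                break
            if word:
                per_file[path] += 1
    return per_file


RULE = "-" * 80
# Sample assembled from entries in the output shown earlier in the thread.
sample = "\n".join([
    "Misspelled words:",
    "<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p",
    RULE, "Availabily", "focussed", RULE, "",
    "Misspelled words:",
    "<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p",
    RULE, "resiliance", RULE, "",
    "Misspelled words:",
    "<htmlcontent> docs/quickstart-aro.md: html>body>h2",
    RULE, "Adendum", RULE, "",
])

for path, count in tally(sample).most_common():
    print(count, path)
```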

    supernovae commented 3 years ago

    The majority of the words failing are in a markdown ```code block

    facelessuser commented 3 years ago

    @supernovae Okay, there are a couple of things here. I zeroed out the wordlist, just as you said you were doing. I found no actual pre block getting spell checked.

    Now, I do not think I am running what you are running as your results mentioned docs/aro/clf-to-zure/README.md which is not found on the master branch of the repo you pointed me at, so I cannot test exactly what you are. This also means, I'm not confident the config I'm debugging is even the same as yours. So I cannot debug further without knowing I'm testing exactly what you are.

    I got no misspellings in pre blocks until I enabled pymdownx.superfences. This makes sense as I imagine Python Markdown was parsing all of your fenced code blocks as inline code instead of fenced blocks. Once I did, I did see some, but the content was not actually in the code blocks even though the context reported it as such.

    <li>
    <p>Use <code>kubectl</code> to apply the <code>bgd-app.yaml</code> file
        <div class="highlight"><pre><span></span><code>kubectl apply -f documentation/modules/ROOT/examples/bgd-app/bgd-app.yaml
    </code></pre></div>
        &gt;The bgd-app.yaml file defines several things, including the repo location for the <code>gitops-bgd-app</code> application<br>
        <img alt="screenshot of bgd-app-yaml" src="./bgd-app-yaml.png" /></p>
    </li>

    results

    b'>The bgd-app.yaml file defines several things, including the repo location for the application '
    Context: docs/demos/gitops/README.md: html>body>ol>li>div>pre
    Misspelled words:
    <htmlcontent> docs/demos/gitops/README.md: html>body>ol>li>div>pre
    --------------------------------------------------------------------------------
    bgd
    repo
    yaml
    --------------------------------------------------------------------------------

    So the context reporting had a bug, which I fixed locally.

    b'>The bgd-app.yaml file defines several things, including the repo location for the application '
    Context: docs/demos/gitops/README.md: html>body>ol>li
    Misspelled words:
    <htmlcontent> docs/demos/gitops/README.md: html>body>ol>li
    --------------------------------------------------------------------------------
    bgd
    repo
    yaml
    --------------------------------------------------------------------------------
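
To show what the corrected context means, here is a minimal sketch that records, for each piece of text, the chain of elements that encloses it. It uses only Python's standard-library `html.parser`, does no tree repair, and is not how PySpelling computes its context. `ContextTracker` and `contexts` are names invented for this example, and the sample HTML is a simplified version of the snippet above.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class ContextTracker(HTMLParser):
    """Record each text chunk together with its enclosing element path."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, outermost first
        self.found = []  # (context, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.found.append((">".join(self.stack), text))


def contexts(html):
    """Return a list of (context, text) pairs for the given HTML."""
    tracker = ContextTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.found


# Simplified sample: text that follows a closed code block inside a list item.
sample = (
    "<html><body><ol><li>"
    "<div><pre>kubectl apply -f bgd-app.yaml</pre></div>"
    "The bgd-app.yaml file defines several things"
    "</li></ol></body></html>"
)
for context, text in contexts(sample):
    print(f"{context}: {text}")
```

Text that comes after the closing `</pre></div>` belongs to the list item, not to the code block, which is why the corrected report shows `html>body>ol>li`.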

    In short, I am not seeing it incorrectly parsing content within code blocks, though I did see some false reporting, which I will have fixed in the next PySpelling release. If yours is truly showing words in code blocks after you specify that PySpelling should ignore them, there is something off in your config, but I cannot verify this as I don't think I'm even testing the same branch you are.

    Additionally, while I think Python Markdown may parse your Markdown "good enough" to spell check (after enabling a fenced code extension), I do see that it doesn't quite parse everything exactly right, as it has some expectations regarding formatting that some parsers (like CommonMark parsers) do not. I won't comment further on this as it appears to do well enough to avoid code blocks and such, which appears to be the main concern.

    Based on your results vs mine, I still think you have a config issue. I see <!--comment--> being targeted, which makes no sense if you turned off comments. And comments appearing has nothing to do with the earlier mentioned bug. I think your config is still not right, but I cannot confirm as I don't even know what branch you are actually testing.

    facelessuser commented 3 years ago

    Reporting bug has been fixed and deployed

    facelessuser commented 3 years ago

    So, it's been about a week. I figure you've solved this as I haven't seen any new info to help debug this further. I know this action is still using an older PySpelling, so once updated, it shouldn't see the wrong context for HTML elements.

    jonasbn commented 3 years ago

    This is planned to be included in the upcoming 0.15.0 release.

    byronmiller commented 3 years ago

    Thanks for the updates! Sorry, been busy with work. I look forward to trying this out and again, I really appreciate your help here!

    facelessuser commented 3 years ago

    @byronmiller No worries. I was just following up to make sure there are no known bugs in PySpelling. These issues stay fresh in my mind for maybe a week (especially when they are filed on 3rd party repositories), then I'll forget about them unless I am pinged again 🙃. So, take your time.

    Hopefully, when you get back to this, the issue turns out to just be a config use issue. I've fixed the only issue I could find in PySpelling, and it was mainly cosmetic.

    jonasbn commented 3 years ago

    Version 0.15.0 has just been uploaded to the marketplace

    jonasbn commented 3 years ago

    Hi @supernovae

    I experienced a weird issue with release 0.15.0, so I made a hotfix to patch it. You should have a look at release 0.16.0 instead.