Closed supernovae closed 3 years ago
Are you spell checking raw Markdown or are you converting it to HTML and then checking the HTML?
Often, I convert Markdown to HTML and then use CSS selectors to ignore code blocks and such. It is often easier for me to filter things in HTML. That is my general recommendation, but there are potentially other ways. PySpelling, which is used to filter the content, has a Markdown filter which basically converts the Markdown content to HTML. The HTML filter allows you to filter out tags with selectors. Granted, you must enable an extension to handle fenced code properly though as fenced code is not part of the spec (old school spec, I'm not sure about CommonMark).
The shipped extension that handles Markdown uses Python Markdown, which is an old school Markdown parser (not a CommonMark parser). For me, that is more than sufficient, as the documentation I am often parsing is also published with Python Markdown. If CommonMark is a requirement, a 3rd party extension can surely be created.
I can't really answer further without knowing how you are attempting to spell check your Markdown.
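To make the "convert to HTML, then ignore code with selectors" idea concrete, here is a minimal sketch using only the Python standard library. It is not PySpelling's HTML filter; the class name `TextOutsideCode` and the helper `checkable_text` are made up for illustration, and it assumes the Markdown has already been converted to HTML. It simply drops any text nested inside `code` or `pre` elements and keeps the rest for spell checking.

```python
from html.parser import HTMLParser


class TextOutsideCode(HTMLParser):
    """Collect text that is not nested inside an ignored tag (hypothetical helper)."""

    IGNORED = {"code", "pre"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many ignored tags we are currently inside
        self.chunks = []  # text fragments that would be spell checked

    def handle_starttag(self, tag, attrs):
        if tag in self.IGNORED:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.IGNORED and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside code/pre elements.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def checkable_text(html):
    """Return the text fragments outside <code> and <pre> elements."""
    parser = TextOutsideCode()
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = "<p>Use <code>kubectl</code> to apply</p><pre><code>kubectl get nodes</code></pre>"
print(checkable_text(sample))
```

With the sample above, only the prose around the inline code survives; the command inside the `pre` block never reaches the spell checker.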
Thanks for the quick reply!
I'm just diving into this extension and trying to get it to work - my repo is here: https://github.com/supernovae/documentation and the config is here: https://github.com/supernovae/documentation/blob/main/.github/config/.spellcheck.yml
I noticed when the action is run, all of the code looks like its in
@supernovae Okay, so a couple of things. I checked out the repo you pointed me at and attempted to run PySpelling. Now, just as an FYI, while I do follow this repo because it relies on PySpelling, I do not actually use this action or the Docker image wrapper it relies on. As I am very familiar with Python, I set up my own action directly using Python in my CI environments. I am also the author of PySpelling and am comfortable running it directly on my local machine to test and debug, which is what you are going to see below.
The config doesn't quite seem right. You can see the error below. My immediate thought is that if the action is running this and not throwing an error, maybe something is occurring in the action that suppresses this? Maybe the action is somehow preprocessing the config, losing the malformed content, and passing it to pyspelling. In this case, the malformed content is the ignore options and such.
➜ documentation git:(main) python3 -m pyspelling --config .github/config/.spellcheck.yml
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__main__.py", line 83, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__main__.py", line 26, in main
    return run(
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__main__.py", line 51, in run
    for results in spellcheck(
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__init__.py", line 673, in spellcheck
    for result in spellchecker.run_task(task, source_patterns=sources):
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__init__.py", line 311, in run_task
    self._build_pipeline(task)
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__init__.py", line 255, in _build_pipeline
    raise ValueError(STEP_ERROR.format(str(step)))
ValueError: Pipline step in unexpected format: {'pyspelling.filters.html': None, 'comments': False, 'ignores': ['code', 'pre']}
So I fixed it by adding the appropriate indentation to the HTML filter's options:
matrix:
- name: Markdown
  aspell:
    ignore-case: true
    lang: en
  dictionary:
    wordlists:
    - .github/config/.wordlist.txt
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
  default_encoding: utf-8
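For anyone wondering why indentation alone triggers the error above: when the options sit at the same level as the filter name, YAML loads the step as one mapping with three keys, and the filter name maps to nothing. The sketch below contrasts the two shapes in plain Python. The `classify_step` function is hypothetical and only approximates the kind of check that could reject such a step; it is not copied from PySpelling, and the `malformed` dictionary is taken from the error message.

```python
def classify_step(step):
    """Return (name, options) for a well-formed pipeline step (illustrative only)."""
    if isinstance(step, str):
        # A bare filter name with no options.
        return step, {}
    if isinstance(step, dict) and len(step) == 1:
        # Exactly one key: the filter name mapped to its options (or to nothing).
        name, options = next(iter(step.items()))
        return name, options or {}
    raise ValueError(f"Pipeline step in unexpected format: {step}")


# What the mis-indented YAML produced: options are siblings of the filter name.
malformed = {'pyspelling.filters.html': None, 'comments': False, 'ignores': ['code', 'pre']}

# What the corrected indentation produces: options are nested under the filter name.
fixed = {'pyspelling.filters.html': {'comments': False, 'ignores': ['code', 'pre']}}

print(classify_step(fixed))
try:
    classify_step(malformed)
except ValueError as exc:
    print(exc)
```

The only difference between the two dictionaries is nesting, which is exactly what the extra indentation in the YAML expresses.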
Now I got a proper run, and I see no misspellings in code blocks:
➜ documentation git:(main) ✗ python3 -m pyspelling --config .github/config/.spellcheck.yml
Misspelled words:
<htmlcontent> docs/misc/references.md: html>body>ul>li
--------------------------------------------------------------------------------
Responsiblity
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/private-cluster.md: html>body>ol>li>p
--------------------------------------------------------------------------------
AzureFirewallSubnet
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/private-cluster.md: html>body>ol>li>p
--------------------------------------------------------------------------------
VirtualAppliance
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/private-cluster.md: html>body>ol>li>p
--------------------------------------------------------------------------------
apiserverProfile
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/private-cluster.md: html>body>h2
--------------------------------------------------------------------------------
Adendum
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/egress-ipam-operator.md: html>body>ol>li>p
--------------------------------------------------------------------------------
kubeadmin
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>h1
--------------------------------------------------------------------------------
Diasaster
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>ol>li
--------------------------------------------------------------------------------
failback
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>ol>li
--------------------------------------------------------------------------------
Avoidence
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p
--------------------------------------------------------------------------------
Availabily
focussed
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p
--------------------------------------------------------------------------------
resiliance
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p
--------------------------------------------------------------------------------
excercise
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p
--------------------------------------------------------------------------------
eachother's
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/federated-metrics/README.md: html>body>h2
--------------------------------------------------------------------------------
Prequsites
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/federated-metrics/README.md: html>body>h2
--------------------------------------------------------------------------------
Preperation
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/federated-metrics/README.md: html>body>blockquote>p
--------------------------------------------------------------------------------
indepth
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/federated-metrics/README.md: html>body>h2
--------------------------------------------------------------------------------
Queryier
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/federated-metrics/README.md: html>body>ol>li>p
--------------------------------------------------------------------------------
datasource
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/federated-metrics/user-defined.md: html>body>blockquote>p
--------------------------------------------------------------------------------
indepth
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/federated-metrics/user-defined.md: html>body>ol>li>p
--------------------------------------------------------------------------------
datasource
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/astronomer/README-public.md: html>body>ol>li>p
--------------------------------------------------------------------------------
HNuZ
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/astronomer/README-public.md: html>body>h2
--------------------------------------------------------------------------------
SCCs
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/astronomer/README-public.md: html>body>ol>li>p
--------------------------------------------------------------------------------
securityContext
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/astronomer/README.md: html>body>ol>li>p
--------------------------------------------------------------------------------
dropdown
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/astronomer/README.md: html>body>ol>li>p
--------------------------------------------------------------------------------
HNuZ
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/astronomer/README.md: html>body>h2
--------------------------------------------------------------------------------
SCCs
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/astronomer/README.md: html>body>p
--------------------------------------------------------------------------------
securityContext
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aro/astronomer/README.md: html>body>ol>li>p
--------------------------------------------------------------------------------
hacky
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/quickstart-rosa.md: html>body>h2
--------------------------------------------------------------------------------
Walkthrough
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/o11y/az-log-analytics.md: html>body>h1
--------------------------------------------------------------------------------
Analytics
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/o11y/az-log-analytics.md: html>body>p
--------------------------------------------------------------------------------
analytics
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/o11y/openshift-logging.md: html>body>ol>li>p
--------------------------------------------------------------------------------
recieving
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/quickstart-aro.md: html>body>h2
--------------------------------------------------------------------------------
Walkthrough
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/quickstart-aro.md: html>body>h2
--------------------------------------------------------------------------------
Adendum
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>p
--------------------------------------------------------------------------------
CNAME's
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>p
--------------------------------------------------------------------------------
enviroment
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>p
--------------------------------------------------------------------------------
homev
wafv
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>p
--------------------------------------------------------------------------------
importwizard
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>ul>li
--------------------------------------------------------------------------------
referer
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aws/waf/alb.md: html>body>blockquote>p
--------------------------------------------------------------------------------
premiumsupport
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aws/waf/alb.md: html>body>ol>li>p
--------------------------------------------------------------------------------
sigs
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aws/waf/alb.md: html>body>ol>li>p
--------------------------------------------------------------------------------
homev
wafv
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aws/waf/alb.md: html>body>ol>li>p
--------------------------------------------------------------------------------
uncomment
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aws/waf/README-complex.md: html>body>p
--------------------------------------------------------------------------------
CTONET
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/aws/waf/README-complex.md: html>body>blockquote>p
--------------------------------------------------------------------------------
premiumsupport
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/demos/gitops/README.md: html>body>h2
--------------------------------------------------------------------------------
Walkthrough
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/demos/gitops/README.md: html>body>ul>li
--------------------------------------------------------------------------------
Kustomize
kam
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/README.md: html>body>h2
--------------------------------------------------------------------------------
Prequsites
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/README.md: html>body>h2
--------------------------------------------------------------------------------
Preperation
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/README.md: html>body>blockquote>p
--------------------------------------------------------------------------------
indepth
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/README.md: html>body>h2
--------------------------------------------------------------------------------
Queryier
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/README.md: html>body>ol>li>p
--------------------------------------------------------------------------------
datasource
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/user-defined.md: html>body>blockquote>p
--------------------------------------------------------------------------------
indepth
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/user-defined.md: html>body>ol>li>p
--------------------------------------------------------------------------------
datasource
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/rosa/sts-with-private-link/README.md: html>body>p
--------------------------------------------------------------------------------
plublic
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/rosa/sts-with-private-link/README.md: html>body>blockquote>p
--------------------------------------------------------------------------------
ARNs
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/rosa/sts/README.md: html>body>blockquote>p
--------------------------------------------------------------------------------
ARNs
--------------------------------------------------------------------------------
Misspelled words:
<htmlcontent> docs/ocp/common-images-namespace/README.md: html>body>p
--------------------------------------------------------------------------------
combersome
--------------------------------------------------------------------------------
!!!Spelling check failed!!!
So, that is how to fix your issue with PySpelling. Why is the action not straight-up failing with the malformed YAML? That I do not know, and it may be something that this action's repo needs to look into.
I assume that fixing the config will fix your issue, but I have not run your repo through the action in this repo.
Let me rephrase, I'm not seeing any misspellings in HTML code blocks. I see that you have not enabled any fenced code extensions, so give me a minute.
It must be converting the fenced code block into inline code blocks as you have no fenced extension enabled. But since inline code is ignored as well, it works just fine, so 🤷🏻 .
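A rough illustration of why that can happen: in old school Markdown, an inline code span opens with a run of backticks and closes with a run of the same length, so a triple-backtick block inside a single paragraph can be read as one inline span when no fenced code extension is enabled. The regular expression below is a simplified approximation written for this example (it ignores escapes and block-level parsing) and is not Python Markdown's actual pattern; `INLINE_CODE` and `inline_code_spans` are made-up names.

```python
import re

# Simplified inline code rule: N backticks, some content, then exactly N backticks.
INLINE_CODE = re.compile(r"(`+)(.+?)(?<!`)\1(?!`)", re.DOTALL)


def inline_code_spans(paragraph):
    """Return the contents of every inline code span found in one paragraph."""
    return [match.group(2) for match in INLINE_CODE.finditer(paragraph)]


fenced = "```bash\nkubectl get nodes\n```"
print(inline_code_spans(fenced))
print(inline_code_spans("Use `kubectl` to apply the file"))
```

Under this simplified rule the whole fenced block, language hint included, becomes the content of a single inline span, which would still end up inside a `code` element and be ignored.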
I fixed the YAML formatting and 0'd out the wordlist and now I'm back up to 6000+ lines of output on spell-check. I had a fairly large wordlist in my branch I shared, if you 0 out the .github/config/.wordlist.txt you may replicate the output.
The majority of the words failing are in a markdown ```code block
@supernovae Okay, there are a couple of things here. I zeroed out the wordlist, just as you said you were doing. I found no actual pre block getting spell checked.
Now, I do not think I am running what you are running, as your results mentioned docs/aro/clf-to-zure/README.md, which is not found on the master branch of the repo you pointed me at, so I cannot test exactly what you are. This also means I'm not confident the config I'm debugging is even the same as yours. So I cannot debug further without knowing I'm testing exactly what you are.
I got no misspellings in pre blocks until I enabled pymdownx.superfences. This makes sense, as I imagine Python Markdown was parsing all of your fenced code blocks as inline code instead of fenced blocks. Once I enabled it, I did see some, but the content was not actually in the code blocks even though the context reported it as such.
<li>
<p>Use <code>kubectl</code> to apply the <code>bgd-app.yaml</code> file
<div class="highlight"><pre><span></span><code>kubectl apply -f documentation/modules/ROOT/examples/bgd-app/bgd-app.yaml
</code></pre></div>
>The bgd-app.yaml file defines several things, including the repo location for the <code>gitops-bgd-app</code> application<br>
<img alt="screenshot of bgd-app-yaml" src="./bgd-app-yaml.png" /></p>
</li>
results
b'>The bgd-app.yaml file defines several things, including the repo location for the application '
Context: docs/demos/gitops/README.md: html>body>ol>li>div>pre
Misspelled words:
<htmlcontent> docs/demos/gitops/README.md: html>body>ol>li>div>pre
--------------------------------------------------------------------------------
bgd
repo
yaml
--------------------------------------------------------------------------------
So the context reporting had a bug, which I fixed locally.
b'>The bgd-app.yaml file defines several things, including the repo location for the application '
Context: docs/demos/gitops/README.md: html>body>ol>li
Misspelled words:
<htmlcontent> docs/demos/gitops/README.md: html>body>ol>li
--------------------------------------------------------------------------------
bgd
repo
yaml
--------------------------------------------------------------------------------
In short, I am not seeing it incorrectly parsing content within code blocks, though I did see some false reporting, which I will have fixed in the next PySpelling release. If yours is truly showing words in code blocks after you specify that PySpelling should ignore them, there is something off in your config, but I cannot verify this as I don't think I'm even testing the same branch you are.
Additionally, while I think Python Markdown may parse your Markdown "good enough" to spell check (after enabling a fenced code extension), I do see that it doesn't quite parse everything exactly right, as it has some expectations regarding formatting that some parsers (like CommonMark parsers) do not. I won't comment further on this as it appears to do well enough to avoid code blocks and such, which appears to be the main concern.
Based on your results vs mine, I still think you have a config issue. I see <!--comment--> being targeted, which makes no sense if you turned off comments. And comments appearing has nothing to do with the earlier mentioned bug. I think your config is still not right, but I cannot confirm as I don't even know what branch you are actually testing.
Reporting bug has been fixed and deployed
So, it's been about a week. I figure you've solved this as I haven't seen any new info to help debug this further. I know this action is still using an older PySpelling, so once updated, it shouldn't see the wrong context for HTML elements.
This is planned to be included in the upcoming 0.15.0 release.
Thanks for the updates! Sorry, been busy with work. I look forward to trying this out and again, I really appreciate your help here!
@byronmiller No worries. I was just following up to make sure there are no known bugs in PySpelling. These issues stay fresh in my mind for maybe a week (especially when they are filed on 3rd party repositories), then I'll forget about them unless I am pinged again 🙃. So, take your time.
Hopefully, when you get back to this, the issue turns out to just be a config use issue. I've fixed the only issue I could find in PySpelling, and it was mainly cosmetic.
Version 0.15.0 has just been uploaded to the marketplace
Hi @supernovae
I experienced a weird issue with release 0.15.0, so I made a hotfix to patch it. You should have a look at release 0.16.0 instead.
Is it possible to tell spellcheck to ignore the markdown hinted areas for language?
For example, if I have
```bash
kubectl get nodes
```
Can it just not spellcheck that bash statement? I have lots of awscli or kubectl or curl commands and right now that means 800+ misspelled words