Simple tools to try to support spell checking of Jupyter notebooks.
The nb_spellchecker package explores various spell checking strategies for Jupyter notebooks.
pip install --upgrade git+https://github.com/innovationOUtside/nb_spellchecker.git
Also requires: pip install --upgrade git+https://github.com/ouseful-PR/pyspelling.git@th-ipynb
Install nb_spellchecker and my extended version of pyspelling.
Download (or copy) the ipyspell.yml file from this repo, then change the two sources paths used for the markdown and code cell checks. They are currently set to content/*/*.ipynb, which is to say, .ipynb files in subdirectories of the content directory, relative to where pyspelling is run.
Download or create a copy of .wordlist.txt, a file containing whitelisted words (this can be an empty file: touch .wordlist.txt).
Run pyspelling with the command pyspelling -c ipyspell.yml > typos.txt to generate a report in the typos.txt file.
Generate a report of typos by notebook: nb_spellchecker reporter -r summary_report.txt (the report will be in summary_report.txt).
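Before running pyspelling, it can help to confirm which notebooks the sources pattern will match and that a wordlist file exists. The following is a minimal Python sketch, not part of nb_spellchecker: the helper name prepare_spellcheck is hypothetical, the content/*/*.ipynb pattern and .wordlist.txt filename are taken from the steps above, and pathlib's glob matching may differ slightly from pyspelling's own.

```python
from pathlib import Path


def prepare_spellcheck(root=".", pattern="content/*/*.ipynb", wordlist=".wordlist.txt"):
    """Hypothetical helper: ensure a wordlist exists and list matching notebooks."""
    root = Path(root)
    # Equivalent of `touch .wordlist.txt`: creates the file if it is missing,
    # otherwise only updates its modification time.
    (root / wordlist).touch(exist_ok=True)
    # Notebooks matched by the pattern, relative to the directory
    # where pyspelling would be run.
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


if __name__ == "__main__":
    for notebook in prepare_spellcheck():
        print(notebook)
```

An empty list here suggests the sources paths in ipyspell.yml need changing before the spell check will find anything.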
pyspelling
Using an extended version of pyspelling, installable as pip install --upgrade git+https://github.com/ouseful-PR/pyspelling.git@th-ipynb, we can generate reports over notebook Markdown and code cells, either together or separately.
Original pyspelling docs here: https://facelessuser.github.io/pyspelling/
The extended version adds notebook parsing as described in Spellchecking Jupyter Notebooks with pyspelling.
pyspelling spellchecker
Using the ipyspell.yml file in this repo, as well as a (possibly empty) .wordlist.txt file, we can run spell checks over content/*/*.ipynb with a command of the form:
pyspelling -c ipyspell.yml > typos.txt
You can override the specified source path with the -S switch, and select a particular task (e.g. checking Markdown cells or Python code cells) with the -n switch:
pyspelling -c ipyspell.yml -S "quicktest/Part*/*.ipynb" -n Markdown > typos_md.txt
pyspelling -c ipyspell.yml -S "quicktest/Part*/*.ipynb" -n Python > typos_py.txt
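If the per-task runs need scripting, the two commands above can be assembled programmatically. This is a minimal sketch under stated assumptions: the helper name pyspelling_command is hypothetical, and only the -c, -S and -n switches shown above are used.

```python
import shlex


def pyspelling_command(config="ipyspell.yml", source=None, task=None):
    """Hypothetical helper: build the argument list for a pyspelling run."""
    cmd = ["pyspelling", "-c", config]
    if source is not None:
        # -S overrides the sources path given in the config file.
        cmd += ["-S", source]
    if task is not None:
        # -n selects a named task, e.g. "Markdown" or "Python".
        cmd += ["-n", task]
    return cmd


if __name__ == "__main__":
    for task, report in [("Markdown", "typos_md.txt"), ("Python", "typos_py.txt")]:
        cmd = pyspelling_command(source="quicktest/Part*/*.ipynb", task=task)
        # Print the equivalent shell command; shlex.join quotes the glob.
        print(shlex.join(cmd), ">", report)
```

To run the commands rather than print them, the returned list can be passed to subprocess.run with stdout set to an open report file.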
To generate further reports from the pyspelling report:
# Convert report to CSV
nb_spellchecker reporter
# Specify CSV filename
nb_spellchecker reporter -c csv_typos.csv
# Generate a tabular typo summary over each notebook
nb_spellchecker reporter -r summary_report.txt
# Generate a report summarising typo counts over all notebooks
nb_spellchecker reporter -t summary_report.txt
# Generate an ordered typo wordlist over all notebooks
nb_spellchecker reporter -w typoswordlist.txt
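To illustrate what an ordered typo wordlist and per-notebook typo counts involve, here is a minimal sketch using collections.Counter. It is not the nb_spellchecker implementation: the input is assumed to be a list of (notebook, word) pairs, and the function names and sample records are made up for illustration.

```python
from collections import Counter


def ordered_wordlist(records):
    """Return (word, count) pairs, most frequent first, ties broken alphabetically."""
    counts = Counter(word for _, word in records)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def typos_per_notebook(records):
    """Return a dict mapping each notebook to its number of reported typos."""
    return dict(Counter(notebook for notebook, _ in records))


if __name__ == "__main__":
    # Made-up records standing in for a parsed pyspelling report.
    sample = [
        ("content/part1/a.ipynb", "teh"),
        ("content/part1/a.ipynb", "recieve"),
        ("content/part2/b.ipynb", "teh"),
    ]
    for word, count in ordered_wordlist(sample):
        print(count, word)
    print(typos_per_notebook(sample))
```

Frequently reported words near the top of such a list are often legitimate terms, and so good candidates for adding to .wordlist.txt.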
A quick way to find repeated words is the following simple egrep command:
egrep -o "\b(\w+)\s+\1\b" */.md/*.md
TO DO: add this in somehow...
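One possible way to fold the repeated-word check into a Python tool is to reuse the same pattern with the re module. This is only a sketch: the function name find_repeated_words is hypothetical, and unlike the line-by-line egrep command it also catches a word repeated across a line break, because \s matches newlines.

```python
import re
import sys
from pathlib import Path

# Same pattern as the egrep command: a word, whitespace, then the same word again.
REPEATED = re.compile(r"\b(\w+)\s+\1\b")


def find_repeated_words(text):
    """Return each word that is immediately repeated, in order of appearance."""
    return [m.group(1) for m in REPEATED.finditer(text)]


if __name__ == "__main__":
    # Usage: python repeated_words.py file1.md file2.md ...
    for name in sys.argv[1:]:
        for word in find_repeated_words(Path(name).read_text(encoding="utf-8")):
            print(f"{name}: {word} {word}")
```

The match is case-sensitive, as with the egrep command, so "The the" would not be reported.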
codespell-project/codespell: spelling error detection and correction, in part through manually maintained lookup/correction dictionaries.
# Pre-commit hook
- repo: https://github.com/codespell-project/codespell/
  rev: v2.1.0
  hooks:
    - id: codespell
      name: codespell
      description: Checks for common misspellings in text files.
      entry: codespell
      language: python
      types: [text]