hkarl / mw2pdf

Convert Mediawiki to PDF via pandoc and latex, including UML conversion
BSD 3-Clause "New" or "Revised" License

Spell checker integration #6

Closed mpeuster closed 8 years ago

mpeuster commented 8 years ago

During the review meeting an issue with one of the deliverables came up: It has lots of typos!

To prevent such problems in the future, we might integrate a spell checker in the PDF generator.

Pedro already provided a link to this nice library: https://github.com/blatinier/pyhunspell

The idea would be:

  • check spelling during PDF generation
  • mark misspelled words (in red / with a red line)
  • add a field to the document options (spell_check: On / Off)

Optional ideas:

  • add a watermark to documents that were generated with spell checking activated (to prevent accidental submission of these drafts)
  • add a wiki page "Dictionary" to which everybody can add words that should be in the spell checker dictionary

paaguti commented 8 years ago

Maybe also include a selection of the base dictionary there (e.g. en-GB vs. en-US).

(Email reply to Manuel Peuster's comment of 22 Sept. 2016, 11:15.)
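The base dictionary selection and the shared "Dictionary" wiki page could be sketched roughly as follows. This is only an illustration, not code from mw2pdf: the directory `/usr/share/hunspell`, the helper names `dictionary_paths` and `parse_custom_words`, and the assumption that the wiki page lists one word per `*` bullet are all hypothetical.

```python
from pathlib import Path

# Assumed default location of hunspell dictionaries; this differs between systems.
DICT_DIR = Path("/usr/share/hunspell")


def dictionary_paths(lang="en_US", dict_dir=DICT_DIR):
    """Return the (.dic, .aff) paths for a language tag such as 'en-GB' or 'en_US'."""
    name = lang.replace("-", "_")
    return dict_dir / f"{name}.dic", dict_dir / f"{name}.aff"


def parse_custom_words(wiki_text):
    """Collect the words listed as '*' bullets on a (hypothetical) Dictionary wiki page."""
    words = set()
    for line in wiki_text.splitlines():
        line = line.strip()
        if line.startswith("*"):
            word = line.lstrip("*").strip()
            if word:
                words.add(word)
    return words


# With pyhunspell installed, the results could be used along these lines:
#   import hunspell
#   dic, aff = dictionary_paths("en-GB")
#   checker = hunspell.HunSpell(str(dic), str(aff))
#   for word in parse_custom_words(page_text):
#       checker.add(word)
```

Keeping the language tag as a plain string means it could come straight from a document option next to spell_check.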

mpeuster commented 8 years ago

Played around with spell checking. Learned that it's not as trivial as I thought ;-)

  1. Working on the LaTeX files isn't possible. They are too complex, and you destroy them if you let the spell checker apply changes to highlight mistakes.
  2. Directly manipulating PDFs is no fun.
  3. Marking mistakes in the *.md files is still tricky. Using format changes like <strike> breaks LaTeX because spelling mistakes might occur anywhere, e.g., in section headings. Using ??word?? to mark mistakes seems to be reasonably safe!

Commit b4e9a49d5b25d804843db1cfad9b99001bc507d5 provides a first working version without many features.

Use: run build.py with --spellcheck to enable it.
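For illustration only, an opt-in switch like this is commonly wired up with argparse as shown below. This is a generic sketch and not the actual code in build.py; the function name `parse_args` and the help text are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; spell checking stays off unless requested."""
    parser = argparse.ArgumentParser(description="Build PDFs from wiki pages (sketch).")
    parser.add_argument(
        "--spellcheck",
        action="store_true",
        help="mark misspelled words in the generated PDF",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("spell checking enabled:", args.spellcheck)
```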

All mistakes are marked like: ??bad-word?? in the output PDF.
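A minimal sketch of this kind of marking on the Markdown text is shown below. It is not the code from the commit above: the function `mark_misspellings`, the word pattern, and the `is_correct` callback are assumptions made for illustration. With pyhunspell, the `spell` method of a `hunspell.HunSpell` object could be passed as `is_correct`.

```python
import re

# Words are runs of ASCII letters, optionally joined by apostrophes or hyphens.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def mark_misspellings(text, is_correct):
    """Wrap every word rejected by is_correct(word) in ??...?? markers."""

    def replace(match):
        word = match.group(0)
        return word if is_correct(word) else f"??{word}??"

    return WORD_RE.sub(replace, text)


if __name__ == "__main__":
    known = {"this", "is", "a", "heading"}
    print(mark_misspellings("# This is a haeding", lambda w: w.lower() in known))
```

Because the markers are plain text, they survive in headings as well. The sketch does not skip code blocks, URLs, or wiki markup, which a real implementation would likely need to do.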

Note: It's not activated on the production system yet.