Closed simonw closed 4 months ago
Experiment using https://pypi.org/project/pyspellchecker/
>>> from spellchecker import SpellChecker
>>>
>>> spell = SpellChecker()
>>> spell.unknown("hello there".split())
set()
>>> spell.unknown("hello thereu".split())
{'thereu'}
>>> spell.correction("thereu")
'there'
>>>
>>>
>>> spell.word_frequency
<spellchecker.spellchecker.WordFrequency object at 0x10154bca0>
>>> spell.word_frequency.load_words
<bound method WordFrequency.load_words of <spellchecker.spellchecker.WordFrequency object at 0x10154bca0>>
>>>
>>>
>>> spell.unknown(["willison"])
{'willison'}
>>> spell.word_frequency.load_words(["willison"])
>>> spell.unknown(["willison"])
set()
>>> spell.unknown(["willisn"])
{'willisn'}
>>> spell.correction("willison")
'willison'
>>> spell.correction("willisn")
'willison'
Then I could find all tags in my database, split them on hyphens, and add the resulting words to the custom word frequency dictionary. If a search returns 0 results I could run the spell checker and suggest the corrected query along with how many results it gets.
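A rough sketch of the tag-splitting step, assuming the tags arrive as a plain list of slug strings. The helper name `words_from_tags` and the example tags are made up for illustration; the only real API referenced is `spell.word_frequency.load_words`, as used in the session above.

```python
def words_from_tags(tags):
    """Split hyphenated tag slugs into a sorted list of unique lowercase words."""
    words = set()
    for tag in tags:
        for part in tag.lower().split("-"):
            if part:  # skip empty pieces from leading, trailing or doubled hyphens
                words.add(part)
    return sorted(words)


# Example tags are invented for illustration, not read from the real database.
tag_words = words_from_tags(["prompt-injection", "simon-willison", "python"])
print(tag_words)

# With pyspellchecker installed, the words could then be added like this:
#   spell.word_frequency.load_words(tag_words)
```

Deduplicating first keeps the list passed to `load_words` small when many tags share a word.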
I could populate an in-memory spell checker with tags from the DB and keep it around until the server restarts, or maybe until an hour has passed.
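One way the "until an hour has passed" idea might look, as a sketch rather than what the site actually does. `ExpiringValue` is a hypothetical helper, and `build` stands in for whatever function creates the `SpellChecker` and loads the tag words into it.

```python
import time


class ExpiringValue:
    """Cache one value in memory and rebuild it once it is max_age seconds old."""

    def __init__(self, build, max_age=3600, clock=time.monotonic):
        self._build = build      # zero-argument callable that creates the value
        self._max_age = max_age  # lifetime in seconds; 3600 is one hour
        self._clock = clock      # injectable so expiry can be tested without waiting
        self._value = None
        self._built_at = None

    def get(self):
        now = self._clock()
        # Build on first use, then again whenever the cached value is too old.
        if self._built_at is None or now - self._built_at >= self._max_age:
            self._value = self._build()
            self._built_at = now
        return self._value
```

Because the value lives in process memory, a server restart discards it without any extra code; the timer only matters for long-running processes that should pick up new tags.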
OK, it works. It's not brilliant but...
https://simonwillison.net/search/?q=promptt+injection
You have to get lucky with regard to which correction the spell checker chooses, though.
I'm not going to bother showing the suggestion if it has zero results.
If a search returns 0 results, try spell-correcting the search terms and see if that provides results; if so, suggest the fix.
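The overall flow could be sketched roughly like this. `suggest_query` is a hypothetical function, and both `correct` and `count_results` are passed in as callables so the logic does not depend on any particular search backend. In practice `correct` might be `spell.correction`; the `or word` fallback is there on the assumption that a corrector may return None when it has no candidate.

```python
def suggest_query(query, correct, count_results):
    """Return (corrected_query, result_count), or None if there is nothing worth suggesting.

    correct: callable mapping a word to its correction (or None if it has none)
    count_results: callable mapping a query string to a number of search results
    """
    if count_results(query) > 0:
        return None  # the original search found something; no suggestion needed
    words = query.split()
    corrected_words = [correct(word) or word for word in words]
    if corrected_words == words:
        return None  # the spell checker changed nothing
    corrected = " ".join(corrected_words)
    count = count_results(corrected)
    if count == 0:
        return None  # do not suggest a correction that also finds nothing
    return corrected, count


# Stub corrector and result counts, invented purely for illustration.
fake_corrections = {"promptt": "prompt"}
fake_counts = {"prompt injection": 12}

suggestion = suggest_query(
    "promptt injection",
    correct=fake_corrections.get,
    count_results=lambda q: fake_counts.get(q, 0),
)
print(suggestion)
```

Correcting word by word is what makes the outcome depend on luck: each word gets a single correction, so a different but equally plausible candidate is never tried.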