Create live and draft rules for dictionary words on ingest

rhystmills commented 1 year ago

What does this change?

This PR converts dictionary words to live and draft rules on ingest of the dictionary file.

It also provides an interface for listing dictionary rules in the rule manager, this has been designed to work well with a debounced 'search-as-you-type' search similar to the one in the existing rules table:

/api/dictionary - > List all rules, showing the first 1000 /api/dictionary?page=1 -> List all rules, showing the second 1000 (pages are zero-indexed) /api/dictionary?word=bel -> show words starting with 'bel', ordered by word similarity (based on shared trigrams between search term and result word) /api/dictionary?word=a&page=2 -> show the third batch of 1000 words starting with 'a'

Dictionary rules are not included in the standard list endpoint so that they don't clutter the frontend rules table.

The endpoints are paginated for performance reasons - serialising to JSON all the rows for e.g. the search 'a' would be noticeably slow.

In follow-up PRs:

Dictionary rules will be included in the published rules artefact and ingested by the Checker service, where they will be used in a spellchecker Matcher Pool (they're not included in this PR because they're not 'active' live rules and the there's some additional work required to include them in the artefact)
We will introduce a frontend interface for listing, unpublishing and editing tags of dictionary rules

How to test

Make sure you've run the 'setup' script to ensure the dictionary XML file is available locally.
Run the rule manager locally according to the instructions in the readme
Send a request to https://manager.typerighter.local.dev-gutools.co.uk/api/refreshDictionary in order to add the dictionary rules to your local database
Make some requests to /api/dictionary, e.g. /api/dictionary?word=bel. Do they behave as expected?

rhystmills commented 1 year ago

Tested locally, and it works well! A few comments / questions.

Generally, my feeling is that we should aim for a single way of querying our rules, and if it's clear that we're not going to be able to store everything in memory (which I think is where we're at with the dictionary work?), we should alter our approach to server-side search and pagination, as you do here – this PR is a neat stepping stone if we do make that decision.

Yeah, we won't be able to store everything in memory I think. The client side search has been pleasant, but I can see the argument for consistency. The LIKE 'suffix%' SELECT I'm using for dictionary rules might not be as suitable for the other rules, so we might want to rethink that if we're doing server-side search for standard rules too.

jonathonherbert commented 1 year ago

Oh – and let's make sure to either backport the fetchSize config from the following PR (https://github.com/guardian/typerighter/commit/154e0a97d245a8653eb0488de501b62f4543795d I think), or merge the two PRs together, to avoid OoMs!

rhystmills commented 1 year ago

Wise, I've add them here too

guardian / typerighter

Create live and draft rules for dictionary words on ingest #400

What does this change?

How to test