guardian / typerighter

Even if you’re the right typer, couldn’t hurt to use Typerighter!
Apache License 2.0
276 stars 12 forks source link

Create live and draft rules for dictionary words on ingest #400

Closed rhystmills closed 1 year ago

rhystmills commented 1 year ago

What does this change?

This PR converts dictionary words to live and draft rules on ingest of the dictionary file.

It also provides an interface for listing dictionary rules in the rule manager, this has been designed to work well with a debounced 'search-as-you-type' search similar to the one in the existing rules table:

/api/dictionary - > List all rules, showing the first 1000 /api/dictionary?page=1 -> List all rules, showing the second 1000 (pages are zero-indexed) /api/dictionary?word=bel -> show words starting with 'bel', ordered by word similarity (based on shared trigrams between search term and result word) /api/dictionary?word=a&page=2 -> show the third batch of 1000 words starting with 'a'

image

Dictionary rules are not included in the standard list endpoint so that they don't clutter the frontend rules table.

The endpoints are paginated for performance reasons - serialising to JSON all the rows for e.g. the search 'a' would be noticeably slow.

In follow-up PRs:

How to test

  1. Make sure you've run the 'setup' script to ensure the dictionary XML file is available locally.
  2. Run the rule manager locally according to the instructions in the readme
  3. Send a request to https://manager.typerighter.local.dev-gutools.co.uk/api/refreshDictionary in order to add the dictionary rules to your local database
  4. Make some requests to /api/dictionary, e.g. /api/dictionary?word=bel. Do they behave as expected?
rhystmills commented 1 year ago

Tested locally, and it works well! A few comments / questions.

Generally, my feeling is that we should aim for a single way of querying our rules, and if it's clear that we're not going to be able to store everything in memory (which I think is where we're at with the dictionary work?), we should alter our approach to server-side search and pagination, as you do here – this PR is a neat stepping stone if we do make that decision.

Yeah, we won't be able to store everything in memory I think. The client side search has been pleasant, but I can see the argument for consistency. The LIKE 'suffix%' SELECT I'm using for dictionary rules might not be as suitable for the other rules, so we might want to rethink that if we're doing server-side search for standard rules too.

jonathonherbert commented 1 year ago

Oh – and let's make sure to either backport the fetchSize config from the following PR (https://github.com/guardian/typerighter/commit/154e0a97d245a8653eb0488de501b62f4543795d I think), or merge the two PRs together, to avoid OoMs!

rhystmills commented 1 year ago

Wise, I've add them here too