Include dictionary rules in json artefact used by Checker

What does this change?

This PR includes dictionary rules in the published artefact ingested by the Checker service.

It also includes some fixes to problems caused by the much larger number of rules now included in the Rule Manager.

It adds a .fetchSize(1000) call to some database methods that select a very large number of rows. According to the ScalikeJDBC docs:

the PostgreSQL JDBC driver does infinite(!) caching for result sets if fetchSize is set to 0 (the default) and this causes memory problems.

Setting an explicit .fetchSize resolved the out-of-memory errors we encountered in these cases. Testing fetch sizes of 100, 1,000, 10,000 and 100,000 rows showed no significant difference in performance.
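
As a rough illustration of the fetchSize change described above, here is a minimal ScalikeJDBC sketch. The RuleRow type, the rules table, and its column names are hypothetical and are not taken from the Rule Manager code; only the .fetchSize call mirrors what this PR adds.

```scala
import scalikejdbc._

// Hypothetical row type, for illustration only.
final case class RuleRow(id: Long, pattern: String)

object RuleQueries {
  // Asks the JDBC driver to fetch rows in batches of `batchSize`
  // rather than relying on the default fetch size of 0.
  // The table and column names here are assumptions.
  def findAll(batchSize: Int = 1000)(implicit session: DBSession): List[RuleRow] =
    sql"SELECT id, pattern FROM rules"
      .fetchSize(batchSize)
      .map(rs => RuleRow(rs.long("id"), rs.string("pattern")))
      .list
      .apply()
}
```

The result is still collected into a List, so the fetch size only bounds how much the driver buffers at a time. The PostgreSQL JDBC documentation lists further conditions for cursor-based fetching (for example, the connection must not be in autocommit mode), so it is worth checking those against the session type in use.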

We also saw an out-of-memory error for ruleJson.toString.getBytes(java.nio.charset.StandardCharsets.UTF_8.name). Avoiding the intermediate toString step by using Json.toBytes(ruleJson) resolved the error. This shows that inefficiencies which were previously acceptable are more likely to cause problems now that we are handling a great deal more rules.
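
The difference between the two serialisation paths can be sketched with Play JSON as follows. The sample document and its field names are invented for the example; the real artefact structure is not shown in this PR description.

```scala
import java.nio.charset.StandardCharsets
import play.api.libs.json.{JsValue, Json}

object RuleSerialisation {
  // Stand-in for the real rule artefact; field names are assumptions.
  val ruleJson: JsValue = Json.obj(
    "rules" -> Json.arr(
      Json.obj("type" -> "DictionaryRule", "word" -> "artefact")
    )
  )

  // Before: builds the whole document as a String, then copies it into bytes,
  // so two full representations are held in memory at once.
  def viaString(json: JsValue): Array[Byte] =
    json.toString.getBytes(StandardCharsets.UTF_8)

  // After: serialises straight to UTF-8 bytes with no intermediate String.
  def direct(json: JsValue): Array[Byte] =
    Json.toBytes(json)
}
```

Both methods should yield bytes that parse back to the same JSON value; the second simply skips the large temporary String.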

Separately, we encountered an odd issue where a duplicated word in the dictionary ended up with an empty string as its pattern in our live table. We didn't get to the bottom of the mechanism behind this, but added a words.distinct.filterNot(_ == "") filter to our word list to resolve the problem.
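
For reference, the effect of that filter on a small word list can be sketched like this. The DictionaryWords object and the sample words are made up for the example; only the words.distinct.filterNot(_ == "") expression comes from the change itself.

```scala
object DictionaryWords {
  // Removes repeated entries (keeping the first occurrence, in order)
  // and then drops any empty strings.
  def clean(words: List[String]): List[String] =
    words.distinct.filterNot(_ == "")

  def main(args: Array[String]): Unit = {
    val sample = List("colour", "artefact", "colour", "", "typo")
    println(clean(sample)) // prints List(colour, artefact, typo)
  }
}
```

Because distinct preserves the order of first occurrences, the cleaned list keeps the original word order.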

How to test

Run the application locally according to the instructions in the README. Make sure you run the setup script to pull the dictionary XML files locally.
Hit the /api/refreshDictionary endpoint with a POST request (e.g. in Postman, with cookies from a valid browser request).

Check the artefact in your localstack instance, e.g. with these commands, run in the Docker localstack_main container CLI, to pull the file, pretty-print it, and find rows containing 'DictionaryRule':

awslocal s3 cp s3://typerighter-app-local/local/rules/typerighter-rules.json /etc
sed 's/},{/},\n{/g' /etc/typerighter-rules.json > /etc/parsed.json
grep -hnr "DictionaryRule" /etc/parsed.json

Do dictionary rules appear in the artefact?

guardian / typerighter

Include dictionary rules in json artefact used by Checker #405
