A reminder before merging: we need to ensure these rules are added to the PROD sheet as this PR is merged – but not before, as the new rules will cause parsing errors for older builds.
What does this change?
Add LanguageTool default rules to persisted rules. They can be added to the Google sheet as with other rules. The only two cells needed are 'Rule type', which should be lt_core, and ID.
Dev notes
There are a couple of decisions in this PR:
I've used core rather than default to describe the internally defined LanguageTool rules, as I think it better describes what these rules are. Looking back, maybe this isn't such a big deal!
These rules are modelled in the database (rather than, for example, hardcoded in VCS). My reasoning here is that editorial should be able to search for these rules, see them described, and turn them on and off at will – as with any other rule.
They're modelled as the other two rule types we currently support are modelled – using the 'one table per inheritance hierarchy' method. There's a good introduction to the alternatives here. These rules only need to be described by a ruleType and a googleSheetId at present, so they leave behind a sparse record. I'm keen to avoid the complexity of the other two modelling approaches if possible, as I don't think the additional safety is worth the implementation cost, but this is something other devs might reasonably disagree with – let me know if you think another approach might be better.
How to test
The tests should pass. They include a new instance of this rule, and an additional check that we're removing existing rules from the DB (this was not previously the case – 2615773eee33cca83a9c413b527db378d90a3e78)
Running locally or in CODE, add the following data to the sheet, and reingest the rules. The following copy should trigger each rule, with the exception of the paragraph end rule, which I think we can probably safely remove (the LanguageTool matcher cannot reason across paragraphs just yet.)
Rules
|Rule type| ID |
|--|--|
| lt_core | COMMA_PARENTHESIS_WHITESPACE |
| lt_core | DOUBLE_PUNCTUATION |
| lt_core | UPPERCASE_SENTENCE_START |
| lt_core | WHITESPACE_RULE |
| lt_core | SENTENCE_WHITESPACE |
| lt_core | PUNCTUATION_PARAGRAPH_END |
| lt_core | NO_SPACE_CLOSING_QUOTE |
| lt_core | PUBIC_X |
| lt_core | IN_PRINCIPAL |
| lt_core | CURRENCY_SPACE |
Copy
COMMA_PARENTHESIS_WHITESPACE
Use of whitespace before comma and before/after parentheses ,
DOUBLE_PUNCTUATION
Use of two consecutive dots or commas .. ,,
UPPERCASE_SENTENCE_START
checks that a sentence starts with an uppercase letter
WHITESPACE_RULE
Whitespace repetition (bad formatting)
SENTENCE_WHITESPACE
Missing space.Between sentences.
PUNCTUATION_PARAGRAPH_END
No punctuation mark at the end of paragraph. It seems that another para is fine.
NO_SPACE_CLOSING_QUOTE
“An example of this”and another thing
PUBIC_X
Commonly Confused Words. Pubic education etc.
IN_PRINCIPAL
In principal, the principle was in his office.
CURRENCY_SPACE
Whitespace after currency symbols: ‘$ 100’ ($100)
A reminder before merging: we need to ensure these rules are added to the PROD sheet as this PR is merged – but not before, as the new rules will cause parsing errors for older builds.
What does this change?
Add LanguageTool default rules to persisted rules. They can be added to the Google sheet as with other rules. The only two cells needed are 'Rule type', which should be
lt_core
, and ID.Dev notes
There are a couple of decisions in this PR:
core
rather thandefault
to describe the internally defined LanguageTool rules, as I think it better describes what these rules are. Looking back, maybe this isn't such a big deal!ruleType
and agoogleSheetId
at present, so they leave behind a sparse record. I'm keen to avoid the complexity of the other two modelling approaches if possible, as I don't think the additional safety is worth the implementation cost, but this is something other devs might reasonably disagree with – let me know if you think another approach might be better.How to test
Rules
|Rule type| ID | |--|--| | lt_core | COMMA_PARENTHESIS_WHITESPACE | | lt_core | DOUBLE_PUNCTUATION | | lt_core | UPPERCASE_SENTENCE_START | | lt_core | WHITESPACE_RULE | | lt_core | SENTENCE_WHITESPACE | | lt_core | PUNCTUATION_PARAGRAPH_END | | lt_core | NO_SPACE_CLOSING_QUOTE | | lt_core | PUBIC_X | | lt_core | IN_PRINCIPAL | | lt_core | CURRENCY_SPACE |Copy
COMMA_PARENTHESIS_WHITESPACE Use of whitespace before comma and before/after parentheses , DOUBLE_PUNCTUATION Use of two consecutive dots or commas .. ,, UPPERCASE_SENTENCE_START checks that a sentence starts with an uppercase letter WHITESPACE_RULE Whitespace repetition (bad formatting) SENTENCE_WHITESPACE Missing space.Between sentences. PUNCTUATION_PARAGRAPH_END No punctuation mark at the end of paragraph. It seems that another para is fine. NO_SPACE_CLOSING_QUOTE “An example of this”and another thing PUBIC_X Commonly Confused Words. Pubic education etc. IN_PRINCIPAL In principal, the principle was in his office. CURRENCY_SPACE Whitespace after currency symbols: ‘$ 100’ ($100)Test article in CODE.
The result should look something like: