Closed rhystmills closed 1 year ago
I have run this locally and tested the phrase "bridgitte is a grownup”. "bridgitte" gets a red "Capitalization" category underline and "grownup" gets an orange "review" underline. However “is” also has a red ‘Collins Dictionary’ category underline, with suggested replacements e.g. “BS”, “CIS”, etc. I have found “is” in my local database, and have tested this phrase in CODE, where it doesn’t underline “is”, so I’m not sure what’s going on here! It may be a specific issue with my local db, if you’re not seeing this issue?
I also tested “Brigitte Trogneux”, expecting the full name to have a “Check this” underline, showing a ‘Review’ style guide rule. However, I’m getting a red ‘Collins Dictionary’ underline for “Bridgitte” and an orange Collins underline for “Trogneux” (compared to two red underlined in CODE), so it doesn’t seem to be picking up the style guide rule (I confirmed that this rule is in my local db). Are you seeing something different locally?
I have run this locally and tested the phrase "bridgitte is a grownup”. "bridgitte" gets a red "Capitalization" category underline and "grownup" gets an orange "review" underline. However “is” also has a red ‘Collins Dictionary’ category underline, with suggested replacements e.g. “BS”, “CIS”, etc. I have found “is” in my local database, and have tested this phrase in CODE, where it doesn’t underline “is”, so I’m not sure what’s going on here! It may be a specific issue with my local db, if you’re not seeing this issue?
I also tested “Brigitte Trogneux”, expecting the full name to have a “Check this” underline, showing a ‘Review’ style guide rule. However, I’m getting a red ‘Collins Dictionary’ underline for “Bridgitte” and an orange Collins underline for “Trogneux” (compared to two red underlined in CODE), so it doesn’t seem to be picking up the style guide rule (I confirmed that this rule is in my local db). Are you seeing something different locally?
I had this at one point today (on a different branch) but re-running the services resolved everything, think it might be to do with the checker service getting in an odd state, but haven't been able to replicate since. Might be worth testing again as I've made some more changes.
Just tested again - bridgitte is a grownup
is now giving expected results (capitalization for 'bridgitte' and check this for 'grownup'). However I’m now getting a red ‘Collins Dictionary’ underline for both 'Bridgitte' and for 'Trogneux'. I've double checked I'm running your branch and have pulled your latest changes.
I think that's expected behaviour for now, though we don't want it to be once NER (named entity recognition) is sharpened a bit. Bridgitte and Trogneux are not currently in the dictionary so we'd expect a red underline unless the NER catches them. Lower case capitalisation rule has higher priority than dictionary so you don't see that 'typo' highlighted until it's capitalised - that's actually an indication that priority is working as we'd like it to.
(edit: Samantha reminded me that 'Brigitte Trogneux' was a styleguide rule and should have shown up - I had inserted a d into 'Bridgitte' above so it wasn't being triggered)
To capture the conversation on chat, 'Brigitte Trogneux' is being caught by the style guide rule, however I'd been testing it with a 'd', i.e. Bridgitte, hence it was being caught by the dictionary. All is working as expected! 🎉
Currently, style guide (regex) rules are removed in favour of dictionary rules when they overlap.
For example - let's say the text "Brigitte Trogneux" appears in an article, and the following are true:
In this case, we would want "Bridgitte Trogneux" to appear with an orange 'Review' underline because of the style guide rule, but it would currently get a red 'Amend' from the dictionary matcher. This is because we are effectively prioritising by category id in reverse alphabetical order, which hasn't been a big issue until now because clashes between LanguageTool rules and style guide rules were not very common.
We want to make sure that Style Guide rules trump Dictionary rules.
Note: I've removed a test checking that rules are ordered by category, as it's no longer the case, and added a test elsewhere for overlapping rule removal where priority is specified.
What does this change?
This PR adds an explicit priority to
CheckerRule
instances which can be used to deal with ranges of text that match more than one matcher type. Priority is associated with the rule type - specifically:LanguageTool rules are prioritised above Style Guide (regex) rules, which are prioritised above Dictionary rules.
A bit of an oddity - Dictionary rules appear as
LtRule
when they are used after being extracted from the LanguageTool instance that contains them (part of the Dictionary Matcher) so we identify them there via their unique ID.How to test
~/.gu/flexible-composerbackend.properties
in order to uselocal
Typerighter Checker rather thanCODE
:typerighter.url=https://checker.typerighter.local.dev-gutools.co.uk