pelias / schema

elasticsearch schema files and tooling
MIT License

add synonyms linter #447

Closed: missinglink closed this 4 years ago

missinglink commented 4 years ago

This PR (extracted from https://github.com/pelias/schema/pull/446) adds a linter designed to help users avoid common mistakes when adding synonyms.

/**
 * The synonyms linter attempts to warn the user when making
 * common mistakes with synonyms.
 *
 * Warnings:
 *  - Punctuation: Synonyms should not contain characters in the punctuation blacklist
 *  - Letter Casing: Synonyms should be lowercase
 *  - Sanity Checks: At least one synonym should exist, duplicates should be removed
 *  - Multi Word: Multi-word synonyms can generate unexpected token positions
 */
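The punctuation, casing and sanity checks listed above could be sketched roughly as follows. This is a hypothetical illustration, not the code from the PR: the contents of `PUNCTUATION_BLACKLIST`, the helper names `termsOf` and `lintLine`, and the assumption that each rule is one Solr-style line (`a, b, c` or `a, b => c`) are all assumptions. The multi-word check is left out here, mirroring the PR.

```javascript
// Hypothetical sketch of a synonyms linter; NOT the pelias/schema implementation.
// ASSUMPTION: illustrative blacklist only; the real list may differ.
// (Commas are excluded because they separate terms in a rule.)
const PUNCTUATION_BLACKLIST = ['.', '"', "'", '(', ')', '/', '&', '#'];

// Split a Solr-style rule ("a, b, c" or "a, b => c") into trimmed terms.
function termsOf(line) {
  return line
    .split(/=>|,/)
    .map((term) => term.trim())
    .filter((term) => term.length > 0);
}

// Return an array of human-readable warnings for a single synonym rule.
function lintLine(line) {
  const warnings = [];
  const terms = termsOf(line);

  // Sanity check: at least one synonym should exist.
  if (terms.length === 0) {
    warnings.push('no synonyms found');
    return warnings;
  }

  for (const term of terms) {
    // Punctuation: report every blacklisted character found in the term.
    const bad = PUNCTUATION_BLACKLIST.filter((ch) => term.includes(ch));
    if (bad.length > 0) {
      warnings.push(`"${term}" contains punctuation: ${bad.join(' ')}`);
    }
    // Letter casing: terms should already be lowercase.
    if (term !== term.toLowerCase()) {
      warnings.push(`"${term}" should be lowercase`);
    }
  }

  // Sanity check: exact duplicate terms should be removed.
  const seen = new Set();
  for (const term of terms) {
    if (seen.has(term)) {
      warnings.push(`duplicate synonym "${term}"`);
    }
    seen.add(term);
  }

  return warnings;
}

// Usage: 'St.' is flagged twice (the blacklisted '.' and the uppercase 'S').
console.log(lintLine('St., street'));
// A clean rule produces no warnings.
console.log(lintLine('road, rd'));
```

Returning warnings rather than throwing keeps the linter advisory, which matches the PR's framing of these as warnings.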

While doing this, I noticed that some of the existing synonyms contained characters from the punctuation blacklist; since these synonyms would never match, they can be removed.
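A cleanup along those lines might look like this sketch, which drops every term containing a blacklisted character and then discards any rule left with fewer than two terms. The blacklist, the `removePunctuatedTerms` name and the two-term threshold are hypothetical, and it only handles comma-separated equivalence rules, not `=>` mappings.

```javascript
// Hypothetical cleanup helper; NOT part of pelias/schema.
// ASSUMPTION: illustrative blacklist only; the real list may differ.
const PUNCTUATION_BLACKLIST = ['.', '"', "'", '(', ')', '/', '&', '#'];

// True when a term contains any blacklisted character.
function hasBlacklistedPunctuation(term) {
  return PUNCTUATION_BLACKLIST.some((ch) => term.includes(ch));
}

// Remove punctuated terms from comma-separated equivalence rules, then drop
// rules with fewer than two remaining terms (one term is not a synonym pair).
function removePunctuatedTerms(lines) {
  return lines
    .map((line) =>
      line
        .split(',')
        .map((term) => term.trim())
        .filter((term) => term.length > 0 && !hasBlacklistedPunctuation(term))
    )
    .filter((terms) => terms.length >= 2)
    .map((terms) => terms.join(', '));
}

// 'st.' is removed everywhere, so 'saint, st.' no longer forms a pair and is dropped.
const cleaned = removePunctuatedTerms(['street, st, st.', 'saint, st.', 'road, rd']);
console.log(cleaned);
```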

I've also added a multiWordCheck linter, but it's currently commented out because I didn't want to cross that bridge yet.
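For context, a multi-word check could be as simple as flagging any term that contains internal whitespace, since (as the comment block notes) multi-word synonyms can generate unexpected token positions. This is a hypothetical sketch; `findMultiWordTerms` and `multiWordWarnings` are illustrative names, and the PR's own `multiWordCheck` may behave differently.

```javascript
// Hypothetical multi-word check; the real multiWordCheck in the PR may differ.
// Returns every term in a Solr-style rule that contains internal whitespace.
function findMultiWordTerms(line) {
  return line
    .split(/=>|,/)
    .map((term) => term.trim())
    .filter((term) => /\s/.test(term));
}

// Produce one warning per multi-word term, tagged with its 1-based line number.
function multiWordWarnings(lines) {
  const warnings = [];
  lines.forEach((line, index) => {
    for (const term of findMultiWordTerms(line)) {
      warnings.push(`line ${index + 1}: multi-word synonym "${term}"`);
    }
  });
  return warnings;
}

// Only the third rule contains a multi-word term.
console.log(multiWordWarnings(['avenue, ave', 'saint, st', 'new york, ny']));
```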

orangejulius commented 4 years ago

Nice, this is really awesome!

missinglink commented 4 years ago

Cool, I think I'll merge this today.

@orangejulius, one thing I would consider changing before merging is the log messages. I'd like to make sure that, for users who specify custom synonyms, the warnings make sense and are actionable without them having to open issues.

What do you think? Do you think the warnings might be misinterpreted as errors?
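One way to keep such messages actionable and clearly non-fatal is to state the severity explicitly, point at the exact file and line, and suggest a fix. The sketch below is hypothetical: `formatWarning`, its fields and the example file path are illustrative assumptions, not the log format the PR uses.

```javascript
// Hypothetical warning formatter; NOT the log format used by pelias/schema.
// Each message says it is a warning (not an error), where the problem is,
// and what the user can do about it.
function formatWarning({ file, line, term, problem, suggestion }) {
  return `[synonyms] WARNING (not an error): ${file}:${line} "${term}" ${problem}. ${suggestion}`;
}

// ASSUMPTION: example path and line number are made up for illustration.
const message = formatWarning({
  file: 'synonyms/custom_street.txt',
  line: 12,
  term: 'St.',
  problem: 'contains punctuation that will never match',
  suggestion: 'Remove the "." or delete this synonym.',
});
console.log(message);
```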