pelias / parser

natural language classification engine for geocoding
https://parser.demo.geocode.earth
MIT License

chore: clean up wof dictionary generation code #132

Open missinglink opened 3 years ago

missinglink commented 3 years ago

I attempted to update the WOF resources today but lost enthusiasm halfway through because a bunch of unrelated changes kept popping up in the diff.

This PR makes subsequent attempts at updating WOF considerably easier:

find resources/whosonfirst/dictionaries -type f -name '*.txt' \
  | node -e 'const fs=require(`fs`); fs.readFileSync(0, `utf-8`).trim().split(`\n`).forEach(file => fs.writeFileSync(file, fs.readFileSync(file, `utf-8`).trim().split(`\n`).sort().join(`\n`)))'

This should be a no-op: it only sorts the existing dictionaries and does not add or remove any entries.
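For readability, the one-liner above can be unpacked into a small Node.js module. This is only a sketch of the same steps (trim, split on newlines, default `sort()`, join); the function names `sortLines` and `sortDictionaryFile` are made up for illustration and are not part of the repository.

```javascript
const fs = require('fs');

// Sort the lines of a dictionary's text the same way the one-liner does:
// trim surrounding whitespace, split on newlines, apply the default
// Array.prototype.sort() (UTF-16 code unit order), and join again.
// Like the one-liner, the result has no trailing newline.
function sortLines(text) {
  return text.trim().split('\n').sort().join('\n');
}

// Hypothetical helper: rewrite a single dictionary file in sorted order.
function sortDictionaryFile(file) {
  const sorted = sortLines(fs.readFileSync(file, 'utf-8'));
  fs.writeFileSync(file, sorted);
}

module.exports = { sortLines, sortDictionaryFile };
```

Calling `sortDictionaryFile` once per path emitted by the `find` command would reproduce the effect of the one-liner. Because the default sort compares code units rather than using a locale, capitalized and accented entries do not sort alongside their lowercase ASCII neighbours.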

missinglink commented 3 years ago
[Screenshot 2021-02-23 at 20 48 25]

😆

Joxit commented 3 years ago

Should we also normalize names here? I found some names with uppercase letters and accents, like Épinay, even though LOWER is used in the SQLite statement (thank you, French localities :sweat_smile:).

grep 'Épinay' resources/whosonfirst/dictionaries/locality/name\:fra_x_preferred.txt

On second thought: adding normalization would not make the diff any easier to read, so I will merge as-is.
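For reference, a likely explanation for the uppercase `É` surviving is that SQLite's built-in `lower()` only folds ASCII characters unless the ICU extension is loaded. If normalization were added in a later change, it could look something like the sketch below. `normalizeName` and its `stripAccents` option are hypothetical names, not existing parser code, and whether accents should be stripped at all is an open question in this thread.

```javascript
// Hypothetical helper: normalize a dictionary entry.
// String.prototype.toLowerCase() is Unicode-aware, so it lowercases
// 'É' to 'é', unlike an ASCII-only lower().
function normalizeName(name, { stripAccents = false } = {}) {
  const lowered = name.toLowerCase();
  if (!stripAccents) return lowered;
  // Decompose to base letter + combining marks, drop the marks,
  // then recompose whatever remains.
  return lowered.normalize('NFD').replace(/\p{M}/gu, '').normalize('NFC');
}

module.exports = { normalizeName };
```

Applying this to existing dictionaries would change their contents, so it would no longer be a no-op and would belong in a separate PR from the sorting change.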