pelias / model

Pelias data models

add alphanumeric postcodes post-processing script #158

Closed missinglink closed 3 months ago

missinglink commented 3 months ago
/**
 * Alphanumeric postcodes post-processing script ensures that both the expanded
 * and contracted version of alphanumeric postcodes are indexed.
 *
 * Without this script a postcode such as '1383GN' would not be matched to the
 * query '1383'.
 * 
 * The script is intended to detect these alphanumeric postcodes and index both
 * variants, i.e. '1383GN' = ['1383GN', '1383 GN'].
 *
 * The inverse case should also be covered, i.e. '1383 GN' = ['1383 GN', '1383GN'].
 * 
 * Note: the regex is currently restrictive by design. The UK, for instance, uses
 * alphanumeric postcodes in the format 'E81DN', which could cause errors when
 * split with this method, so they are currently ignored. Future work should
 * consider global postcode formats.
 * 
 * Note: this script is intended to run *before* the 'deduplication' post-processing
 * script so that prior aliases don't generate duplicate terms.
 */
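
For illustration, the expansion described in the comment could be sketched roughly as below. This is a standalone sketch on plain strings, not the code in this PR: the function name `postcodeVariants` and the pattern (four digits, an optional single space, two letters, inferred from the '1383GN' example) are assumptions, and wiring the result into the document's aliases is left out.

```javascript
// Hypothetical pattern inferred from the '1383GN' example in the comment:
// four digits, an optional single space, then two letters.
const DIGITS_THEN_LETTERS = /^(\d{4}) ?([A-Za-z]{2})$/;

/**
 * Return the postcode followed by its alternate form (contracted or expanded).
 * Postcodes that do not match the restrictive pattern are returned unchanged,
 * so formats such as the UK's 'E81DN' are never split.
 */
function postcodeVariants(postcode) {
  if (typeof postcode !== 'string') { return []; }

  const trimmed = postcode.trim();
  if (trimmed.length === 0) { return []; }

  const match = DIGITS_THEN_LETTERS.exec(trimmed);
  if (!match) { return [trimmed]; }

  const contracted = match[1] + match[2];       // '1383GN'
  const expanded = match[1] + ' ' + match[2];   // '1383 GN'

  // Keep the original form first and add the other form as the alias.
  return trimmed === contracted
    ? [contracted, expanded]
    : [expanded, contracted];
}

console.log(postcodeVariants('1383GN'));  // [ '1383GN', '1383 GN' ]
console.log(postcodeVariants('1383 GN')); // [ '1383 GN', '1383GN' ]
console.log(postcodeVariants('E81DN'));   // [ 'E81DN' ]
```

The sketch does no deduplication of its own; as noted above, that is left to the 'deduplication' script that runs afterwards.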