pelias / schema

elasticsearch schema files and tooling
MIT License

can we remove parent.postalcode? #462

Closed: missinglink closed this issue 4 years ago

missinglink commented 4 years ago

While reviewing https://github.com/pelias/schema/pull/459 I noticed we have parent.postalcode as an admin field.

AFAIK we don't use this anywhere, since we use address_parts.zip (terrible name, I know) to store postalcode info.

I think when we originally designed the schema we considered postalcodes as a parent, but now we consider them more of an attribute of an address.

Maybe they are only used when importing postcodes from WOF, as a self-reference?

cc/ @orangejulius @Joxit do we use this field?

missinglink commented 4 years ago
[Two screenshots attached, dated 2020-08-19; images not preserved]
missinglink commented 4 years ago

It could also be worth leaving this in for later use, for instance if we use ZCTA boundaries or something similar to programmatically assign postcodes via point-in-polygon (PIP) lookup?
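To make the PIP idea concrete, here is a minimal sketch of assigning a postcode to a point from boundary polygons. It is not how Pelias does PIP lookups; the function names, the `boundaries` mapping, and the toy square polygons are all hypothetical, and real ZCTA data would need proper geometry handling (holes, multipolygons, a spatial index).

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the polygon ring?

    `ring` is a list of (lon, lat) vertices; the ring is treated as closed.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def assign_postalcode(lon, lat, boundaries):
    """Return the first postcode whose boundary contains the point, else None.

    `boundaries` is a hypothetical mapping of postcode -> polygon ring.
    """
    for code, ring in boundaries.items():
        if point_in_polygon(lon, lat, ring):
            return code
    return None


# Toy boundaries (two adjacent squares), purely illustrative.
boundaries = {
    "00001": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "00002": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
```

A value assigned this way could then be stored in a field such as parent.postalcode, which is the "later use" being suggested.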

Joxit commented 4 years ago

Yes, this field is used when you search for a postalcode (from WOF or a custom source, I guess).

For example, with the input 75011 France, the ES query matches on parent.postalcode:

{
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "minimum_should_match": 1,
                    "should": [
                        {
                            "bool": {
                                "_name": "fallback.postalcode",
                                "must": [
                                    {
                                        "multi_match": {
                                            "query": "75011",
                                            "type": "phrase",
                                            "fields": [
                                                "parent.postalcode"
                                            ]
                                        }
                                    },
                                    {
                                        "multi_match": {
                                            "query": "france",
                                            "type": "phrase",
                                            "fields": [
                                                "parent.country",
                                                "parent.country_a",
                                                "parent.dependency",
                                                "parent.dependency_a"
                                            ]
                                        }
                                    }
                                ],
                                "filter": {
                                    "term": {
                                        "layer": "postalcode"
                                    }
                                }
                            }
                        },
                        {
                            "bool": {
                                "_name": "fallback.dependency",
                                "must": [
                                    {
                                        "multi_match": {
                                            "query": "france",
                                            "type": "phrase",
                                            "fields": [
                                                "parent.dependency",
                                                "parent.dependency_a"
                                            ]
                                        }
                                    }
                                ],
                                "filter": {
                                    "term": {
                                        "layer": "dependency"
                                    }
                                }
                            }
                        },
                        {
                            "bool": {
                                "_name": "fallback.country",
                                "must": [
                                    {
                                        "multi_match": {
                                            "query": "france",
                                            "type": "phrase",
                                            "fields": [
                                                "parent.country",
                                                "parent.country_a"
                                            ]
                                        }
                                    }
                                ],
                                "filter": {
                                    "term": {
                                        "layer": "country"
                                    }
                                }
                            }
                        }
                    ]
                }
            },
            "max_boost": 20,
            "functions": [
                {
                    "field_value_factor": {
                        "modifier": "log1p",
                        "field": "popularity",
                        "missing": 1
                    },
                    "weight": 1
                },
                {
                    "field_value_factor": {
                        "modifier": "log1p",
                        "field": "population",
                        "missing": 1
                    },
                    "weight": 2
                }
            ],
            "score_mode": "avg",
            "boost_mode": "multiply"
        }
    },
    "sort": [
        "_score"
    ],
    "size": 20,
    "track_scores": true
}
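As a rough illustration of why removing the field would break this query, the sketch below rebuilds just the fallback.postalcode clause from the query above and lists the fields its multi_match clauses reference. The helper names are hypothetical and not part of any Pelias package; only the clause structure is taken from the query shown.

```python
def postalcode_fallback_clause(postalcode, country):
    """Rebuild the 'fallback.postalcode' clause of the query above as a dict."""
    return {
        "bool": {
            "_name": "fallback.postalcode",
            "must": [
                {
                    "multi_match": {
                        "query": postalcode,
                        "type": "phrase",
                        "fields": ["parent.postalcode"],
                    }
                },
                {
                    "multi_match": {
                        "query": country,
                        "type": "phrase",
                        "fields": [
                            "parent.country",
                            "parent.country_a",
                            "parent.dependency",
                            "parent.dependency_a",
                        ],
                    }
                },
            ],
            # Only documents in the postalcode layer can match this clause.
            "filter": {"term": {"layer": "postalcode"}},
        }
    }


def referenced_fields(node):
    """Collect every name listed under a 'fields' key anywhere in the query."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "fields" and isinstance(value, list):
                found.update(value)
            else:
                found |= referenced_fields(value)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_fields(item)
    return found


clause = postalcode_fallback_clause("75011", "france")
print(sorted(referenced_fields(clause)))
```

Running the same walk over a full generated query is one quick way to check whether a schema field such as parent.postalcode is still referenced before dropping it.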
missinglink commented 4 years ago

agh ok, fair enough ;)