bartek5186 opened this issue 5 years ago.
Hi @bartek5186, can you confirm this is the case globally or is it only better in Poland?
I have worked with PL, CZ and DE data, and I haven't noticed the problem in other countries yet.
There are also postal-code records with a wrong postal code and a wrong position, for example:
EDIT: You can obtain the lat/lng of the parent location of a specific postal code via parser/findbyid?ids=101841989&lang=pol
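A minimal sketch of that lookup, assuming a locally running Pelias Placeholder service at a hypothetical base URL (PLACEHOLDER_URL) and assuming the findbyid response is a JSON object keyed by id, where each place carries a "geom" object with "lat" and "lon". The helper names are illustrative; check the response shape against your own Placeholder instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL: point this at your own Placeholder instance.
PLACEHOLDER_URL = "http://localhost:4100"


def findbyid_url(ids, lang="pol", base=PLACEHOLDER_URL):
    """Build a /parser/findbyid URL for one or more WOF ids."""
    query = urlencode({"ids": ",".join(str(i) for i in ids), "lang": lang})
    return f"{base}/parser/findbyid?{query}"


def extract_latlon(response, wof_id):
    """Return (lat, lon) for wof_id.

    Assumes the response is keyed by id (as a string) and that each place
    has a "geom" object with "lat" and "lon" fields.
    """
    geom = response[str(wof_id)]["geom"]
    return float(geom["lat"]), float(geom["lon"])


def fetch_parent_latlon(wof_id, lang="pol", base=PLACEHOLDER_URL):
    """Query a running Placeholder service (network call) for one id."""
    with urlopen(findbyid_url([wof_id], lang, base)) as resp:
        return extract_latlon(json.load(resp), wof_id)
```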
I have found some entries in the WOF postal-code database that have incomplete data, which produces inconsistent search results. Here is an example from ES:
{"id":554829649,"type":"Feature","properties":{"edtf:cessation":"uuuu","edtf:inception":"uuuu","geom:area":0,"geom:bbox":"0.0,0.0,0.0,0.0","geom:latitude":0,"geom:longitude":0,"gp:parent_id":"12602116","iso:country":"ES","mz:hierarchy_label":1,"src:geom":"geoplanet","wof:belongsto":[],"wof:breaches":[],"wof:concordances":{"gp:id":"22664266"},"wof:country":"ES","wof:geomhash":"fc4d4085e55d16b479f231dbf54d3cfb","wof:hierarchy":[],"wof:id":554829649,"wof:lastmodified":1474569770,"wof:name":"09151","wof:parent_id":-1,"wof:placetype":"postalcode","wof:repo":"whosonfirst-data-postalcode-es","wof:superseded_by":[],"wof:supersedes":[],"wof:tags":[]},"bbox":[0,0,0,0],"geometry":{"coordinates":[0,0],"type":"Point"}}
It gets even harder when the postal code you search for also exists in another country: you then get the result from the other country instead of Spain.
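One way to avoid cross-country collisions is to restrict the search to one country and to the postalcode layer. The sketch below only builds the request URL; it assumes a Pelias API at a hypothetical local address (PELIAS_API) and uses the search parameters text, layers and boundary.country. The function name is illustrative.

```python
from urllib.parse import urlencode

# Hypothetical Pelias API host; replace with your own instance.
PELIAS_API = "http://localhost:4000"


def postcode_search_url(postcode, country, base=PELIAS_API):
    """Build a /v1/search URL restricted to postal codes in one country."""
    params = {
        "text": postcode,
        "layers": "postalcode",       # only return postal-code records
        "boundary.country": country,  # ISO 3166 country code, e.g. "ES"
    }
    return f"{base}/v1/search?{urlencode(params)}"
```

For example, postcode_search_url("09151", "ES") asks only for Spanish postal codes matching 09151.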
The WOF dataset contains a lot of those 0,0 postcodes; I believe the WOF team leave them as placeholders for when the correct coordinates become available.
Pelias should not import null island places, so those 0,0 records you pasted will not enter the search index. If you see results with a location of 0,0 in the index, then it's a bug.
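The null island exclusion described above can be illustrated with a small pre-import filter. This is not Pelias code, only a sketch assuming WOF features with Point geometry in standard GeoJSON [lon, lat] order; the function names are hypothetical.

```python
def is_null_island(feature, tol=1e-9):
    """True if a Point feature's coordinates are (essentially) 0,0.

    Assumes GeoJSON Point geometry, whose coordinates are [lon, lat].
    Features without geometry are treated as null island too.
    """
    geometry = feature.get("geometry") or {}
    coords = geometry.get("coordinates") or [0, 0]
    lon, lat = coords[0], coords[1]
    return abs(lat) < tol and abs(lon) < tol


def drop_null_island(features):
    """Keep only features that have a real location."""
    return [f for f in features if not is_null_island(f)]
```

Applied to the ES record pasted above (coordinates [0,0]), is_null_island returns True, so that record would be skipped.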
I had a quick look at this today and opened up https://github.com/whosonfirst-data/whosonfirst-data-postalcode-pl/issues/1 to discuss with the WOF team.
@bartek5186 I pulled down http://www.geonames.org/export/zip/PL.zip to have a look and I'm not sure the data is of very good quality. The coordinates appear to be duplicated and rounded to two decimal places in many cases.
Could you please confirm that the data is actually correct for Poland before we continue?
head PL.txt
PL 00-001 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 00-002 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 00-003 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 00-004 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 00-005 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 00-006 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 00-007 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 00-008 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 00-009 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 00-010 Warszawa Mazowieckie Warszawa 52.25 21 4
head -n1000 PL.txt | tail
PL 01-193 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 01-194 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 01-195 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 01-196 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 01-197 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 01-198 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 01-199 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 01-201 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 01-202 Warszawa Mazowieckie Warszawa 52.25 21 4
PL 01-203 Warszawa Mazowieckie Warszawa 52.25 21 4
head -n5000 PL.txt | tail
PL 10-537 Olsztyn Warmińsko-Mazurskie Olsztyn 53.7833 20.4833 4
PL 10-538 Olsztyn Warmińsko-Mazurskie Olsztyn 53.7833 20.4833 4
PL 10-539 Olsztyn Warmińsko-Mazurskie Olsztyn 53.7833 20.4833 4
PL 10-540 Olsztyn Warmińsko-Mazurskie Olsztyn 53.7833 20.4833 4
PL 10-541 Olsztyn Warmińsko-Mazurskie Olsztyn 53.7833 20.4833 4
PL 10-542 Olsztyn Warmińsko-Mazurskie Olsztyn 53.7833 20.4833 4
PL 10-543 Olsztyn Warmińsko-Mazurskie Olsztyn 53.7833 20.4833 4
PL 10-544 Olsztyn Warmińsko-Mazurskie Olsztyn 53.7833 20.4833 4
PL 10-545 Olsztyn Warmińsko-Mazurskie Olsztyn 53.7833 20.4833 4
PL 10-546 Olsztyn Warmińsko-Mazurskie Olsztyn 53.7833 20.4833 4
head -n10000 PL.txt | tail
PL 40-094 Katowice Śląskie Katowice 50.2667 19.0167 4
PL 40-095 Katowice Śląskie Katowice 50.2667 19.0167 4
PL 40-096 Katowice Śląskie Katowice 50.2667 19.0167 4
PL 40-097 Katowice Śląskie Katowice 50.2667 19.0167 4
PL 40-098 Katowice Śląskie Katowice 50.2667 19.0167 4
PL 40-100 Katowice Śląskie Katowice 50.2667 19.0167 4
PL 40-101 Katowice Śląskie Katowice 50.2667 19.0167 4
PL 40-102 Katowice Śląskie Katowice 50.2667 19.0167 4
PL 40-103 Katowice Śląskie Katowice 50.2667 19.0167 4
PL 40-104 Katowice Śląskie Katowice 50.2667 19.0167 4
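The duplication and rounding visible in these excerpts can be quantified over the whole file. A minimal sketch, assuming the usual GeoNames postal-code dump layout (tab-separated, no header, latitude and longitude in the 10th and 11th columns); the function name is hypothetical.

```python
import csv
from collections import Counter

# Assumed GeoNames postal-code layout (tab-separated, no header):
# country, postal code, place name, admin1 name, admin1 code, admin2 name,
# admin2 code, admin3 name, admin3 code, latitude, longitude, accuracy
LAT, LON = 9, 10


def decimals(value):
    """Number of digits after the decimal point in a coordinate string."""
    return len(value.split(".", 1)[1]) if "." in value else 0


def coordinate_stats(lines):
    """Return (rows, rows sharing a coordinate with another row,
    rows whose lat and lon both have at most 2 decimal places)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [r for r in reader if len(r) > LON]
    counts = Counter((r[LAT], r[LON]) for r in rows)
    shared = sum(n for n in counts.values() if n > 1)
    coarse = sum(1 for r in rows if decimals(r[LAT]) <= 2 and decimals(r[LON]) <= 2)
    return len(rows), shared, coarse
```

To run it on the full file, pass an open handle, e.g. coordinate_stats(open("PL.txt", encoding="utf-8")).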
You are right about the 0,0 coordinates: at first I couldn't find these postal codes, but I have since updated the geometry by querying GeoNames.
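A sketch of that kind of geometry patch, assuming you already have a replacement (lat, lon) for the postcode from GeoNames. It only touches records whose geom:latitude and geom:longitude are both 0, like the ES example above; the function name is hypothetical and the field names are taken from that record.

```python
def patch_null_geometry(feature, lat, lon):
    """Replace a 0,0 placeholder point in a WOF postalcode feature.

    Records that already have real coordinates are returned unchanged.
    """
    props = feature["properties"]
    if props.get("geom:latitude") != 0 or props.get("geom:longitude") != 0:
        return feature
    props["geom:latitude"] = lat
    props["geom:longitude"] = lon
    # bbox is "minlon,minlat,maxlon,maxlat"; for a single point it is degenerate.
    props["geom:bbox"] = f"{lon},{lat},{lon},{lat}"
    feature["bbox"] = [lon, lat, lon, lat]
    # GeoJSON coordinate order is [lon, lat].
    feature["geometry"] = {"type": "Point", "coordinates": [lon, lat]}
    return feature
```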
I don't really have many problems with coordinates. I am working with ES postal codes, not PL. For now, I update the coordinates in the WOF postalcodes-es data with the coordinates from GeoNames (I really need to be able to find these postal codes). The worst thing about this data is that too many postal codes have no hierarchy in the GeoJSON: the hierarchy field is empty, and belongsto has the same issue. I updated this data manually by looking up the hierarchy in admin-es in the cases where the postal code has a parent_id; again, I can complete it with the help of the GeoJSON.
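The manual fix described above (copying the hierarchy from the parent admin record) could be scripted roughly like this. A sketch only: admin_by_id is a hypothetical dict mapping WOF id to an admin feature loaded from admin-es, and adding a postalcode_id entry for the record itself is an assumption about the desired WOF hierarchy shape.

```python
def fill_hierarchy(postcode, admin_by_id):
    """Copy wof:hierarchy from the parent admin record when it is empty.

    admin_by_id: hypothetical mapping of WOF id -> admin feature (admin-es).
    """
    props = postcode["properties"]
    parent_id = props.get("wof:parent_id", -1)
    if props.get("wof:hierarchy") or parent_id in (None, -1):
        return postcode  # already populated, or no usable parent
    parent = admin_by_id.get(parent_id)
    if parent is None:
        return postcode
    own_id = props.get("wof:id")
    hierarchy = []
    for entry in parent["properties"].get("wof:hierarchy", []):
        entry = dict(entry)  # copy so the admin record is not mutated
        entry["postalcode_id"] = own_id  # assumption: record lists itself
        hierarchy.append(entry)
    props["wof:hierarchy"] = hierarchy
    # belongsto: every ancestor id in the hierarchy, excluding the record itself
    props["wof:belongsto"] = sorted({
        v for entry in hierarchy for k, v in entry.items()
        if k != "postalcode_id" and isinstance(v, int) and v > 0
    })
    return postcode
```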
My team and I also have problems with postal codes that don't exist in Who's On First but are registered and in use in Spain; some of them are in GeoNames. I have now built my index from the WOF Spain data (updated by me) and GeoNames. The postal codes whose hierarchy I fixed now appear in searches with the correct locality, localadmin, region, etc.; with the original WOF data this doesn't happen. The bad thing is that I can't find the GeoNames postal codes that don't exist in WOF, and we need them for our work.
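To see which GeoNames postal codes are missing from WOF, a set difference on the postal code string is enough. A sketch, assuming GeoNames rows already split into lists (country in column 0, postal code in column 1) and WOF features whose wof:name holds the postal code, as in the ES record above; the function name is hypothetical.

```python
def missing_postcodes(geonames_rows, wof_features, country="ES"):
    """Return GeoNames rows whose postal code has no WOF postalcode record."""
    known = {
        f["properties"]["wof:name"].strip()
        for f in wof_features
        if f["properties"].get("wof:country") == country
    }
    return [
        row for row in geonames_rows
        if row[0] == country and row[1].strip() not in known
    ]
```

The resulting rows are candidates for a separate import, for example as CSV.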
Is there any way we can update this data, and also fix the hierarchy of the postal codes that I currently have to update manually?
Could you please confirm that the data is actually correct for Poland before we continue? I'm not sure the data is very good quality?
In Poland, some of the bigger cities have multiple postal codes (based on, for example, streets, zones, offices or districts), so this dataset is of poor quality: it has no detailed lat/lon positions.
In Poland, postal codes are tied to, for example, streets, so it is possible to build a high-quality database.
The Polish PNA (postal code) dataset is here: https://www.poczta-polska.pl/hermes/uploads/2013/11/spispna.pdf It has no lat/lng positions, but it does have address names. For example, one such address is located at 52°14'00.5"N 20°58'37.9"E (52.233480, 20.977189), not at 52.21, 21.
That simple lat/lng looks like a high-level container point for a bigger city such as "Warszawa".
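To put a number on the gap, the sketch below converts the DMS position to decimal degrees and computes the great-circle (haversine) distance to the Warszawa point 52.25, 21 from the GeoNames excerpt earlier in the thread. The 6371 km mean Earth radius is the usual spherical approximation.

```python
import math


def dms_to_decimal(deg, minutes, seconds, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = deg + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


street_lat = dms_to_decimal(52, 14, 0.5, "N")
street_lon = dms_to_decimal(20, 58, 37.9, "E")
# Distance from the street-level point to the city-wide GeoNames point.
offset_km = haversine_km(street_lat, street_lon, 52.25, 21.0)
```

The offset comes out at roughly 2.4 km, which is a large error for a postal code that covers only part of a city.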
For NL, GeoNames is also much better; the WOF data is 4 years out of date and incomplete.
GeoNames is updated daily from official government sources; unfortunately, it can't be imported into Pelias.
Which is the official source that geonames uses? You might be better off just using the csv-importer to import those files directly.
We've found the GeoNames postcode files to be a mixed bag, generally not very good; NL might be an exception.
For the Dutch data it uses https://www.cbs.nl (Statistics Netherlands) and www.kadaster.nl (The Netherlands' Cadastre, Land Registry and Mapping Agency), which are both officially related (fully or partially) to our government.
We successfully used the csv-importer for that dataset. Thanks for the heads-up, I didn't know there was a csv-importer :)
I also used the csv-importer for this, and it works great. The built-in postal codes are useless in this case (Europe). I imported my whole custom-prepared Europe region via the csv-importer; the postal code data was prepared from official sources and manually reviewed. I noticed a small bug in the importer: the imported records are named csv:postalcode, but should be named bdp:postalcode. I set the source name to "bdp" in the importer config file, but that was ignored during the CSV import and the output shows csv.
Because I need autocomplete to work with postal codes too, I put multiple codes into name_json.
The import file looks like this:
source,popularity,layer,id,lat,lon,name,postalcode,country,name_json
bdp,100,postalcode,71ff447b-972b-4f7d-a8c1-e0c8c02a1a19,53.468958363988,18.760770296251,Grudziądz,86-300,PL,"[""86-300"", "" 86-301"", "" 86-302"", "" 86-303"", "" 86-304"", "" 86-305"", "" 86-306"", "" 86-307"", "" 86-308"", "" 86-309"", "" 86-310"", "" 86-311""]"
Searching works great with multiple codes in name_json, and the output contains the postal code from the postalcode column.
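A row like the one above can be generated with the standard csv and json modules, which also avoids the stray leading spaces inside the JSON array. A sketch only: it assumes the alternate-names column is called name_json (check the exact column name against the csv-importer README for your version), and the helper names are hypothetical.

```python
import csv
import io
import json
import uuid

FIELDS = ["source", "popularity", "layer", "id", "lat", "lon",
          "name", "postalcode", "country", "name_json"]


def postcode_row(name, lat, lon, codes, country, source="bdp", popularity=100):
    """One import row: the first code is the primary postal code and all
    codes go into name_json so autocomplete can match any of them."""
    codes = [c.strip() for c in codes]  # drop stray whitespace around codes
    return {
        "source": source,
        "popularity": popularity,
        "layer": "postalcode",
        "id": str(uuid.uuid4()),
        "lat": lat,
        "lon": lon,
        "name": name,
        "postalcode": codes[0],
        "country": country,
        "name_json": json.dumps(codes),  # csv module adds the quoting
    }


def to_csv(rows):
    """Serialize rows with a header line, quoting fields as needed."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```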
Hi @bartek5186, I had a quick look at the issue you reported and I wasn't able to reproduce the error where the source you provide is not the same as the source of the document.
We actually have a test case here which ensures that functionality works as expected.
If you're able to reproduce this, could you please open a ticket?
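A quick way to check for the reported behaviour is to inspect a search response: Pelias gids have the form source:layer:id, and each feature also carries a source property. The sketch below works on an already parsed response dict; the function name is hypothetical.

```python
def wrong_source_features(response, expected_source):
    """Return gids of features whose source differs from expected_source.

    Checks both properties.source and the source prefix of properties.gid
    (gids look like "source:layer:id").
    """
    bad = []
    for feature in response.get("features", []):
        props = feature.get("properties", {})
        gid = props.get("gid", "")
        gid_source = gid.split(":", 1)[0]
        if props.get("source") != expected_source or gid_source != expected_source:
            bad.append(gid)
    return bad
```

An empty list for expected_source="bdp" means the custom source name was kept.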
I had already done that: https://github.com/pelias/csv-importer/issues/89
Why doesn't Pelias use the better-quality postal codes from GeoNames? http://www.geonames.org/export/zip/
WOF has a weak source for its postal code database: https://github.com/whosonfirst-data/whosonfirst-data/issues/1584