openpotato / openplzapi.data

Raw data for the OpenPLZ API project
https://www.openplzapi.org
Open Data Commons Open Database License v1.0

mistakes and problems in database #2

Closed BugFix closed 1 month ago

BugFix commented 12 months ago

Hello, I've tried to parse this CSV to insert it into an SQLite database. This fails every time. Some of the reasons are wrong data (your source parsing did not get the full data):

#208 's Obere Wegle;72820;Sonnenbühl;08415091 --> Fußweg 's Obere Wegle;72820;Sonnenbühl;08415091
#209 's Untere Wegle;72820;Sonnenbühl;08415091 --> Fußweg 's Untere Wegle;72820;Sonnenbühl;08415091
#210 's-Heerenberger Str.;46446;Emmerich am Rhein;05154008 --> Alte 's-Heerenberger Str.;46446;Emmerich am Rhein;05154008

This also seems to be wrong:

#214 -4;76187;Karlsruhe;08212000

Another problem is entries like this:

#364 24 Stunden 2009;78183;Hüfingen;08326027

This is not the name of a street. It is tagged as footway, name=24 Stunden 2009. As far as I can see on OSM, it is part of a way (was it walked in 24 hours?).

Some other examples:

#370 29 - 38;06184;Kabelsketal;15088150
#371 2;04758;Cavertitz;14730050
#372 2;92536;Pfreimd;09376153

These have no relationship to any street.

For some years I have been trying to find good free sources for postal codes and regional keys. I know it is hard to parse the sources, especially with nearly 2 million street names. So: thumbs up for your work.

Some hints. I think it causes additional problems to use ';' inside values and also as the delimiter. You have therefore enclosed such values in '"' to avoid mixing them up. That's OK. But when processing the data I have to search for the last '"', cut everything up to it as the first field, and can then split the rest by ';' to get the other three fields. Replacing the ';' in the name value with ',' while parsing your sources would avoid this extra work.

Best Regards BugFix (autoit@bug-fix.info)

fstueber commented 12 months ago

Hi, thanks a lot for the feedback. Regarding the first three examples: that is how the names are stored in OSM.

To solve this, OSM should be updated.

fstueber commented 12 months ago

About #214. This is the entry in OSM:

<way id="428097109">
  <tag k="access" v="private"/>
  <tag k="highway" v="unclassified"/>
  <tag k="name" v="-4"/>
</way>

This clearly should not be in the result set. I will filter out access=private for the next extraction.

fstueber commented 12 months ago

About #364.

<way id="1053159284">
  <tag k="highway" v="footway"/>
  <tag k="name" v="24 Stunden 2009"/>
  <tag k="surface" v="paved"/>
</way>

This should not be in the result set, yes. I will have to look into how to deal with this.

fstueber commented 12 months ago

The last 3:

<way id="398541742">
  <tag k="foot" v="yes"/>
  <tag k="highway" v="footway"/>
  <tag k="horse" v="no"/>
  <tag k="name" v="29 - 38"/>
</way>
<way id="34520686">
  <tag k="highway" v="secondary"/>
  <tag k="maxspeed" v="50"/>
  <tag k="name" v="2"/>
  <tag k="ref" v="S 27"/>
</way>
<way id="303202319">
  <tag k="highway" v="footway"/>
  <tag k="name" v="2"/>
  <tag k="surface" v="grass"/>
</way>

Some better filtering is needed.
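
A minimal sketch of what such filtering could look like, written in Python purely for illustration. The function `keep_way` and both rules are assumptions, not the project's actual extraction logic: the first rule follows the access=private filter mentioned above, the second is a hypothetical heuristic that drops names containing no letters at all.

```python
def keep_way(tags):
    """Return True if an OSM way's tags look like a usable street entry.

    Hypothetical rules for illustration only.
    """
    # Rule 1: skip ways that are not publicly accessible.
    if tags.get("access") == "private":
        return False
    # Rule 2 (assumed heuristic): a street name should contain at least one letter.
    name = tags.get("name", "")
    if not any(ch.isalpha() for ch in name):
        return False
    return True


# Tag sets taken from the OSM entries quoted in this thread.
ways = [
    {"access": "private", "highway": "unclassified", "name": "-4"},
    {"foot": "yes", "highway": "footway", "horse": "no", "name": "29 - 38"},
    {"highway": "secondary", "maxspeed": "50", "name": "2", "ref": "S 27"},
    {"highway": "footway", "name": "24 Stunden 2009", "surface": "paved"},
]
kept = [w["name"] for w in ways if keep_way(w)]
```

Note that "24 Stunden 2009" passes both rules, so a case like that would need some other criterion.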

fstueber commented 12 months ago

About your hint: That's how CSV (in this case with ; as delimiter) is formatted. Or did I miss something?
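
For illustration, here is a minimal sketch of how a standard CSV reader handles this format, using Python's `csv` and `sqlite3` modules. The column order (name, postal code, locality, regional key) is inferred from the lines quoted in this thread; the table name `streets`, its column names, and the first sample row are assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Sample data in the layout discussed above. The first row is an invented
# example of a quoted name that contains the delimiter.
SAMPLE = (
    '"Beispielweg; Nord";12345;Musterstadt;01234567\n'
    "'s Obere Wegle;72820;Sonnenbühl;08415091\n"
)


def load_streets(text, conn):
    """Parse ';'-delimited CSV text and insert the rows into a 'streets' table."""
    # The reader takes care of quoted fields, so no manual search for '"' is needed.
    reader = csv.reader(io.StringIO(text), delimiter=";", quotechar='"')
    rows = [tuple(row) for row in reader if len(row) == 4]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS streets "
        "(name TEXT, postal_code TEXT, locality TEXT, regional_key TEXT)"
    )
    conn.executemany("INSERT INTO streets VALUES (?, ?, ?, ?)", rows)
    return rows


conn = sqlite3.connect(":memory:")
rows = load_streets(SAMPLE, conn)
```

Postal codes and regional keys are stored as TEXT here so that leading zeros (as in 06184) are preserved.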

BugFix commented 12 months ago

> That's how CSV (in this case with ; as delimiter) is formatted.

You may be right. I think it depends on the language and how its CSV libraries work. I often use AutoIt to write programs and have my own functions for this. But I can live with my solution too. By the way, I'll try how the results turn out with Lua and Nim.

Thanks for your reply.

fstueber commented 12 months ago

I don't know AutoIt, but it seems to support regular expressions, which are an alternative way to deal with CSV files.
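
A rough sketch of the regular-expression approach, shown in Python because I cannot vouch for AutoIt syntax. It assumes exactly four fields per line and that only the first field (the name) may be quoted; the pattern and the helper `split_line` are hypothetical and would need adapting to the real file.

```python
import re

# First field: either a quoted value (with "" as an escaped quote) or a plain
# value without ';'. The remaining three fields are plain values.
LINE = re.compile(
    r'^(?:"(?P<quoted>(?:[^"]|"")*)"|(?P<plain>[^;]*))'
    r';(?P<plz>[^;]*);(?P<locality>[^;]*);(?P<key>[^;]*)$'
)


def split_line(line):
    """Split one ';'-delimited line into four fields, or return None on mismatch."""
    m = LINE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    if m.group("quoted") is not None:
        # Undo the CSV escaping of doubled quotes inside a quoted field.
        name = m.group("quoted").replace('""', '"')
    else:
        name = m.group("plain")
    return [name, m.group("plz"), m.group("locality"), m.group("key")]


# The first line is an invented example; the second is quoted from this thread.
fields_quoted = split_line('"Beispielweg; Nord";12345;Musterstadt;01234567')
fields_plain = split_line("-4;76187;Karlsruhe;08212000")
```

The same pattern should carry over to other regex engines with minor syntax changes, but that is untested here.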

fstueber commented 1 month ago

Please check the new version of OpenPLZ API: https://www.openplzapi.org/en/change-log/