omniscale / imposm3

Imposm imports OpenStreetMap data into PostGIS
http://imposm.org/docs/imposm3/latest/
Apache License 2.0
719 stars 158 forks source link

add osm_* metadata #59

Closed ImreSamu closed 7 years ago

ImreSamu commented 9 years ago

NOT YET FINAL ... need more work ... so I closed.

olt commented 9 years ago

No need to close. Just write a comment here when it's ready for testing. It's looking good already, but I'll make a few inline comments with some hints.

ImreSamu commented 9 years ago

@olt , Next proposal! PTAL.(Please Take Another Look)

olt commented 9 years ago

It would be great if you can get the system tests running. I've just extended the documentation a bit (https://github.com/omniscale/imposm3/#system-tests). Let me know if you need help with that.

ImreSamu commented 9 years ago

@olt

Let me know if you need help with that.

quick feedback: Not critical, but original repo 'test-unit' failed in my environment ( godep go test ./... ), probably 2 file is missing:

see log: https://gist.github.com/ImreSamu/7d6858cf9c68bb501a59

olt commented 9 years ago

Ah, sorry. I pushed the missing files. Please fetch/merge with master.

ImreSamu commented 9 years ago

@olt: Can you help me ?

One more file missing ( for "Benchmark test" ) : "parser/azores.osm.pbf"

ImreSamu commented 8 years ago

status:

I am proposing the following new parameters :

tags:
  # if the osm object has only one "created_by" tag, than this options will keep this otherwise drop !
  # default: "false" ( drop )
  keep_single_createdby_tag: true

  parsemetadata:
    # Parsing metadata and create new tag from "version" metadata;  
    # the new tag name will be the parameter value ("osm_version")
    # default : empty string  -> not parsing this metadata and not add new tag to the cache.
    create_tag_from_version: osm_version

    create_tag_from_timestamp: osm_timestamp

    create_tag_from_changeset: osm_changeset

    create_tag_from_uid: osm_uid

    create_tag_from_user: osm_user

json version

    "tags": {
        "load_all": true,
        "keep_single_createdby_tag": true,
        "parsemetadata": {
              "create_tag_from_version": "osm_version",
              "create_tag_from_timestamp": "osm_timestamp",
              "create_tag_from_changeset": "osm_changeset",
              "create_tag_from_uid": "osm_uid",
              "create_tag_from_user": "osm_user"
        }
    },
olt commented 8 years ago

Thanks for your work.

I would not make the internal tag names configurable. addCommonKey is really a hack and people will corrupt their caches when they change the mapping. I suggested _version_, etc. in my first comments.

The parser package should not depend on the config package, as this makes it more difficult to split the parser package from Imposm later. Also, globals should be avoided. The checks for each tag makes the code complex. A single parseMetadata variable should be enough. Suggestion: Add a .WithMetadata(true) method to the parser and pass that bool down to the parser functions.

Regarding keep_single_createdby_tag: Do you need nodes with only created_by? You won't be able access the metadata from nodes without any tags with the current version. Is that right?

ImreSamu commented 8 years ago

Thank you for your answer. First, I would like to apologize for miscommunication my requirements,

My typical use case is not rendering, mostly OSM data analysis, and my vision is creating a first class support for this, if possible.

osm_* metadata syntax on the hstore field

I am using : osm2pgsql --hstore-all --extra-attributes , and loading all osm tags with osm metadata to the hstore field, like this:

select tags from planet_osm_point limit 1;

                                                tags                                                     
-------------------------------------------------------------------------------------------------------------
 "power"=>"tower", "osm_uid"=>"244754", "osm_user"=>"Janjko", "osm_version"=>"2", "osm_changeset"=>"7083233"
(1 row)

I have lot of sql scripts, hard coded with "osm_user".."osm_timestamp" variable names, like others : https://github.com/search?q=osm_timestamp&ref=searchresults&type=Code So I would like to re-create this osm2pgsql hstore functionality with imposm3.

My problem is now : I can not rename easily the user ... timestamp meta variable names at the hstore_tags field. :(

Storing only the selected metadata fields.

Sometimes I only need some selected metadata variables and not all. And as I know this is a typical problem, for example: only "uid" + "timestamp" information needs for old tiger import visualisation (ref.) And for minimal cache storage I would like storing only the relevant fields, and not any more.

My current ( and not perfect ) implemetation now support selecting+renaming metadata variables :

{
    "tags": {
        "load_all": true,
        "parsemetadata": {
              "create_tag_from_version": null,
              "create_tag_from_timestamp": "osm_timestamp",
              "create_tag_from_changeset": null,
              "create_tag_from_uid": "osm_uid",
              "create_tag_from_user": null
        }
    },  
    "tables": {
        "parsemetadata": {
            "fields": [
                {
                    "type": "id",
                    "name": "osm_id",
                    "key": null
                },               
                {
                    "type": "geometry",
                    "name": "geometry",
                    "key": null
                },
                {
                    "type": "hstore_tags",
                    "name": "tags",
                    "key": null
                }
            ],
            "type": "geometry",
            "type_mappings": {
                "linestrings": {
                     "highway": ["__any__"]
                }
            }
        }
    }
}
root@f082e809be9f:/go/src/github.com/omniscale# psql  -A -F","   -c "select tags from  osm_parsemetadata limit 5;" > _tags.txt
root@f082e809be9f:/go/src/github.com/omniscale# cat _tags.txt
tags
"name"=>"Boulevard d'Italie", "highway"=>"primary", "osm_uid"=>"69102", "osm_timestamp"=>"2015-03-26T16:32:39Z"
"name"=>"Avenue des Pins", "oneway"=>"no", "highway"=>"residential", "osm_uid"=>"331938", "osm_timestamp"=>"2014-03-23T20:20:01Z"
"name"=>"Avenue de Monte-Carlo", "oneway"=>"yes", "highway"=>"residential", "osm_uid"=>"378737", "osm_timestamp"=>"2012-05-01T15:06:17Z"
"name"=>"Boulevard Louis II", "highway"=>"primary", "osm_uid"=>"805403", "osm_timestamp"=>"2012-08-23T22:26:47Z"
"name"=>"Avenue de l'Annonciade", "oneway"=>"no", "highway"=>"secondary", "osm_uid"=>"852996", "maxspeed"=>"30", "osm_timestamp"=>"2013-01-10T21:14:02Z"
(5 rows)

or only a single version

    "tags": {
        "load_all": true,
        "parsemetadata": {
              "create_tag_from_version": "_version_",
        }
    },  
+ psql -A -F, -c 'select tags from  osm_parsemetadata limit 5;'
+ cat _tags.txt
tags
"name"=>"Boulevard d'Italie", "highway"=>"primary", "_version_"=>"11"
"name"=>"Avenue des Pins", "oneway"=>"no", "highway"=>"residential", "_version_"=>"7"
"name"=>"Avenue de Monte-Carlo", "oneway"=>"yes", "highway"=>"residential", "_version_"=>"11"
"name"=>"Boulevard Louis II", "highway"=>"primary", "_version_"=>"6"
"name"=>"Avenue de l'Annonciade", "oneway"=>"no", "highway"=>"secondary", "maxspeed"=>"30", "_version_"=>"7"
(5 rows)

Special mapping for Hstore fields to selecting or dropping tags

in the future i would like to create an experimental "hstore_tags_drop", and "hstore_tags_keep" mappings, for example loading only the name:* tags.

    "tags": {
        "load_all": true,
        "keep_single_createdby_tag": true,
        "parsemetadata": {
              "create_tag_from_version": "osm_version",
              "create_tag_from_timestamp": "osm_timestamp",
              "create_tag_from_changeset": null,
              "create_tag_from_uid": null,
              "create_tag_from_user": "osm_user"
        }
    },
    ....
                {
                    "args": {
                        "drop_keys": [
                            "osm_timestamp",
                            "osm_user",
                            "osm_version"
                        ]
                    },
                    "type": "hstore_tags_drop",
                    "name": "tags",
                    "key": null
                },
                {
                    "args": {
                        "keep_keys": [
                            "osm_timestamp",
                            "osm_user",
                            "osm_version"
                        ]
                    },
                    "type": "hstore_tags_keep",
                    "name": "meta_tags",
                    "key": null
                },
                {
                    "args": {
                        "keep_keys": [
                            "name:*",
                        ]
                    },
                    "type": "hstore_tags_keep",
                    "name": "name_tags",
                    "key": null
                }                

keep_single_createdby_tag

Regarding keep_single_createdby_tag: Do you need nodes with only created_by?

imho:

At the master branch implementation:

metadata from nodes without any tags

You won't be able access the metadata from nodes without any tags with the current version. Is that right?

yes, This functionality is never needed for me.

cache corruption

I would not make the internal tag names configurable. addCommonKey is really a hack and people will corrupt their caches when they change the mapping. I suggested version, etc. in my first comments.

I agree that compatibility and easy support is very important, so I will rethink my design and requirements ...

Some ideas ( quick & dirty hacks) :

parser package

The parser package should not depend on the config package, as this makes it more difficult to split >the parser package from Imposm later. Also, globals should be avoided.

ok, I will try to refactor.

The checks for each tag makes the code complex.

My reason was to storing only the selected metadata fields, and this functionality add extra complexity.

Summary

I will rethink my requirements ....

Gustry commented 8 years ago

These ideas seem very good ! I'm waiting for the merge, as I'm the author of #58. Thanks a lot for your work. Unfortunetaly, I can't help for now with Go.

About the created_by field, this key is dispearing slowly because this key is deprecated on OSM object. Are you talking about created_by on the changeset ?

ImreSamu commented 8 years ago

About the created_by field, this key is dispearing slowly because this key is deprecated on OSM object.

very slowly ... see taginfo key : http://taginfo.openstreetmap.org/keys/created_by#overview

Gustry commented 8 years ago

Yes, but you won't find new editors in this list : Potlatch2 (which has been released a long time ago), ID, Mapzen ... So it's less accurate every day.

ImreSamu commented 8 years ago

Yes, but you won't find new editors in this list ... So it's less accurate every day.

I am doing data quality analytics and I have a limited knowledge about OSM editors. For example yesterday (timestamp="2015-12-07T17:38:22Z" ) one of the user created this new (v1) node web: http://www.openstreetmap.org/node/3882156807 xml: http://www.openstreetmap.org/api/0.6/node/3882156807 with only one "created_by"="JOSM" tag

<osm version="0.6" generator="CGImap 0.4.0 (8709 thorn-02.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
<node id="3882156807" visible="true" version="1" changeset="35811285" timestamp="2015-12-07T17:38:22Z" user="coloredstone" uid="1815671" lat="46.1396752" lon="14.3995777">
<tag k="created_by" v="JOSM"/>
</node>
</osm>

if I import this data to Postgis, then as I know :

so when I compare

[1] I have started working on an experimental integration test script - for comparing imposm3 import with imposm3 diff, but I removed, because this pull request is already to big and I have some problems with comparing geometries. ( https://github.com/ImreSamu/imposm3/commit/6b51919527bdabc614f963fadd660d974b58bbdb )

Gustry commented 8 years ago

For example yesterday (timestamp="2015-12-07T17:38:22Z" ) one of the user created this new (v1) node web: http://www.openstreetmap.org/node/3882156807 xml: http://www.openstreetmap.org/api/0.6/node/3882156807 with only one "created_by"="JOSM" tag

I don't understand how it happened. The node has been updated using Mapzen and not JOSM according to the changeset. I guess it's a copy/paste tags in Mapzen.

According to the wiki :

As of November 2012 many nodes, ways or relations still have a created_by tag, but these are slowly disappearing since this key is silently dropped by OSM editors like JOSM, iD, Potlatch, Potlatch2 when objects get modified.

It works, I just moved the previous node for a few centimeters. The tag has been removed by JOSM, I even couldn't see the tag in JOSM when I downloaded the node. http://www.openstreetmap.org/node/3882156807/history

Gustry commented 8 years ago

Is-it possible to have a status about this PR ? Is there a list of tasks that need to be done ? (we may help if we can)

ImreSamu commented 8 years ago

Hi @Gustry !

Is-it possible to have a status about this PR ?

imho: It is working, but the code is not optimal. ( I am learning Golang )

in the future:

Is there a list of tasks that need to be done ? (we may help if we can)

imho: the code ready for testing. If you have time - please check the curent version. :)

Gustry commented 8 years ago

It seems great news. I'm gonna try soon ! Thanks !

Gustry commented 8 years ago

Just for information, it could be nice if you provide a mapping file like the main one https://github.com/omniscale/imposm3/blob/master/example-mapping.yml with the metadata. I'm going to try again, I had some problem the last time I tried.

ImreSamu commented 8 years ago

@Gustry

, it could be nice if you provide a mapping file like the main one

I have created some mapping files for testing. See in the /test/parsemetadata*.json :

hzhangic commented 7 years ago

Is this feature available already?

ImreSamu commented 7 years ago

@vicgarvey

Is this feature available already?

Sorry, not yet ... this is only a working prototype.

Gustry commented 7 years ago

Why this PR has been closed? Even if it's not very active, I think it's still better to have it opens, it has a better visibility

ImreSamu commented 7 years ago

Why this PR has been closed?

imho:

Maybe the refactoring is the first step:

And after that

Even if it's not very active, I think it's still better to have it opens, it has a better visibility

Your issue is still open ( https://github.com/omniscale/imposm3/issues/58 )

And I will keep my branch 'ImreSamu:osm_metadata', so if somebody wants to reuse or examine some part of the code it is possible.