protomaps / OSMExpress

Fast database file format for OpenStreetMap
BSD 2-Clause "Simplified" License

Cannot extract node tags #35

Closed jpolchlo closed 3 years ago

jpolchlo commented 3 years ago

Hi. New to the project, trying to pull some tag information for a node, and hitting a wall. I failed to find any examples to help me along, so I'm going to describe the problem, and hope that there is a simple fix.

This problem arises from the augmented diff construction process. For a node deletion, an Overpass-generated diff shows tag information for the deleted node, but diffs generated by https://github.com/azavea/onramp have an empty tag field. I noticed that this block doesn't attempt to read in the tags for nodes, as it does for ways and relations. (This is relevant to OSMX, I swear.)

I went to add node tags to this module, but ran into problems finding these tags. Given a known node id, the following

import osmx

node_id = ######
env = osmx.Environment(osmx_filename)
with osmx.Transaction(env) as txn:
    nodes = osmx.Nodes(txn)
    nd = nodes.get(node_id)
    print(nd.tags)

fails because nd is None, though the location does exist.

I must be misunderstanding in a basic way, and could use a pointer or two.

Thanks!

bdon commented 3 years ago

Hi,

  1. There are two tables, one called nodes and one called locations. Latitude, longitude, and version are stored in locations, and an ID has a corresponding entry in nodes only if the node has tags. So a node with no tags appears only in locations. This may not be relevant in this situation.
  2. Can you provide the node ID? If your database is replicating the public stream, it would be helpful for me to try to reproduce this. Maybe it's just a matter of the python code missing logic to add that information to the diff.
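
The two-table layout described in point 1 can be sketched as a small lookup helper. This is an illustration only: `classify_node` and `FakeTable` are hypothetical names, and the helper assumes both tables expose a `get(id)` method that returns None on a miss, as `osmx.Nodes` does in the snippet earlier in the thread.

```python
def classify_node(nodes, locations, node_id):
    """Return 'tagged', 'untagged', or 'absent' for a node ID.

    Assumption: `nodes` and `locations` both offer get(id) -> record or None.
    """
    if locations.get(node_id) is None:
        return "absent"    # no location entry at all
    if nodes.get(node_id) is None:
        return "untagged"  # location exists, but no entry in nodes
    return "tagged"


class FakeTable:
    """Dict-backed stand-in for an OSMX table (illustration only)."""

    def __init__(self, rows):
        self._rows = rows

    def get(self, key):
        return self._rows.get(key)


# Made-up rows: node 1 is tagged, node 2 is untagged, node 3 does not exist.
locations = FakeTable({1: (40.0, -75.0, 3), 2: (41.0, -76.0, 1)})
nodes = FakeTable({1: ["building", "greenhouse"]})

for node_id in (1, 2, 3):
    print(node_id, classify_node(nodes, locations, node_id))
```

Checking locations first separates "this node has no tags" from "this node is not in the database", which a None from the nodes lookup alone cannot tell apart.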

bdon commented 3 years ago

@jpolchlo looking over this a bit more, it may just be as simple as https://github.com/azavea/onramp/blob/master/app/onramp/diff.py#L101-L102 checking if o exists first, e.g.:

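o = nodes.get(elem_id)
# nodes.get returns None for untagged nodes, so guard before reading tags.
if o:
    # Tags are read pairwise: each key is followed by its value.
    it = iter(o.tags)
    for t in it:
        tag = ET.SubElement(elem, "tag")
        tag.set("k", t)
        tag.set("v", next(it))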

The majority of nodes (~70%) have no tags, so for them nodes.get returns None. In the context of Onramp, an empty tag field is then the correct behavior, but the code needs to be modified as above to include tags for nodes that are tagged.
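
For experimenting with this logic outside Onramp, here is a self-contained sketch of the same guard. The names `append_tags` and `build_node_element` are hypothetical, the `node` element name and `id` attribute are placeholders, and the flat key/value layout of `tags` is inferred from the pairwise iteration in the snippet above. A `SimpleNamespace` stands in for the record that `nodes.get` would return.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def append_tags(elem, flat_tags):
    """Append <tag k=... v=...> children from a flat [k1, v1, k2, v2, ...] list."""
    flat = list(flat_tags)
    for key, value in zip(flat[0::2], flat[1::2]):
        tag = ET.SubElement(elem, "tag")
        tag.set("k", key)
        tag.set("v", value)
    return elem


def build_node_element(node_id, record):
    """Build a node element; `record` is None when the node has no tags."""
    elem = ET.Element("node", {"id": str(node_id)})
    if record is not None:
        append_tags(elem, record.tags)
    return elem


# Stand-in for a tagged record, using tag values quoted in this thread.
tagged = SimpleNamespace(tags=["building", "greenhouse", "condition", "intact"])
elem_tagged = build_node_element(1234567890, tagged)
elem_untagged = build_node_element(42, None)

print(ET.tostring(elem_tagged, encoding="unicode"))
print(ET.tostring(elem_untagged, encoding="unicode"))
```

Slicing with `zip` silently drops a trailing key that has no value, whereas the `next(it)` form raises `StopIteration`; which behavior is preferable depends on how much the input is trusted.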

jpolchlo commented 3 years ago

Sadly, this is a private deployment, and I cannot share the contents, but I'll give you a redacted form of the diff:

{
    "type":"FeatureCollection",
    "id":"delete",
    "features":[
        {
            "type":"Feature",
            "id":"old",
            "properties":{
                "changeset":"2345678",
                "id":"1234567890",
                "tags":{
                    "building":"greenhouse",
                    "condition":"intact",
                    ...
                },
                "timestamp":"...",
                "type":"node",
                "uid":"###",
                "user":"...",
                "version":"1",
                "visible":true
            },
            "geometry":{
                "type":"Point",
                "coordinates":[
                    ...
                ]
            }
        },
        {
            "type":"Feature",
            "id":"new",
            "properties":{
                "changeset":"3456789",
                "id":"1234567890",
                "tags":{
                    "building":"greenhouse",
                    "condition":"intact",
                    ...
                },
                "timestamp":"...",
                "type":"node",
                "uid":"###",
                "user":"...",
                "version":"2",
                "visible":false,
                "augmentedDiff":#####
            },
            "geometry":{
                "type":"Point",
                "coordinates":[
                    ...
                ]
            }
        }
    ]
}

That's from Overpass. The thing to notice is that the node appears here with non-empty tags. The Onramp-generated augmented diff record has no tags. If I attempt to query for the node ID (1234567890 in the above) using osmx, I can find the matching location, but I get a None response from the Nodes.get query, which is confusing.

Am I thinking about this wrong? Shouldn't the tags in Overpass be represented in the osmx database?
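
For reference, the Overpass record above can be inspected with a few lines of Python. This is a rough sketch over an abbreviated copy of the diff with the elided fields left out. The helper names are hypothetical, and treating `"visible": false` on the "new" side as the deletion marker is an assumption based on this one record.

```python
import json

# Abbreviated copy of the diff above; elided fields are left out.
SAMPLE = """
{
  "type": "FeatureCollection",
  "id": "delete",
  "features": [
    {"type": "Feature", "id": "old",
     "properties": {"id": "1234567890", "type": "node", "version": "1",
                    "visible": true,
                    "tags": {"building": "greenhouse", "condition": "intact"}}},
    {"type": "Feature", "id": "new",
     "properties": {"id": "1234567890", "type": "node", "version": "2",
                    "visible": false,
                    "tags": {"building": "greenhouse", "condition": "intact"}}}
  ]
}
"""


def properties_by_side(diff):
    """Map each feature's id ("old" or "new") to its properties dict."""
    return {f["id"]: f["properties"] for f in diff["features"]}


def is_deletion(diff):
    """True when the "new" side is marked as not visible (assumed deletion marker)."""
    return properties_by_side(diff)["new"].get("visible") is False


def last_known_tags(diff):
    """Tags from the "old" side, i.e. the state before the change."""
    return properties_by_side(diff)["old"].get("tags", {})


diff = json.loads(SAMPLE)
print(is_deletion(diff), last_known_tags(diff))
```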

jpolchlo commented 3 years ago

Oh, rats. It occurred to me that because the entity I'm querying was deleted, the tags would no longer be present in the OSMX DB at the time I'm querying it. The solution you posted should work.

Will re-open if I'm wrong.

bdon commented 3 years ago

That definitely seems odd. In the OSMX database, each entry in the locations table has a third number, which is the version number. Can you compare that to the version Overpass reports?
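
A minimal sketch of that comparison, assuming the location comes back as a `(lat, lon, version)` tuple (or None when there is no entry) and that the Overpass version arrives as a string, as in the diff above. `compare_versions` is a hypothetical helper, not part of OSMX.

```python
def compare_versions(location, overpass_version):
    """Compare the version stored in an OSMX location with Overpass's version.

    Assumptions: `location` is (lat, lon, version) or None;
    `overpass_version` is a string or int such as "2".
    """
    if location is None:
        return "missing from OSMX"
    osmx_version = location[2]  # third number in the locations entry
    overpass_version = int(overpass_version)
    if osmx_version == overpass_version:
        return "in sync"
    if osmx_version < overpass_version:
        return "OSMX behind"
    return "OSMX ahead"


# Made-up coordinates; only the third field matters here.
print(compare_versions((40.0, -75.0, 1), "2"))
print(compare_versions((40.0, -75.0, 2), "2"))
print(compare_versions(None, "2"))
```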

bdon commented 3 years ago

> Oh, rats. It occurred to me that because the entity I'm querying was deleted, the tags would no longer be present in the OSMX DB at the time I'm querying it. The solution you posted should work.
>
> Will re-open if I'm wrong.

I thought of that too, but if that were the case, I believe the location entry should have been deleted as well, because OSMX doesn't store historical data (Overpass does).

jpolchlo commented 3 years ago

Right. Got my memory mixed up. The locations were from a different example. Indeed the locations for the deleted node are deleted as well.

bdon commented 3 years ago

Okay, let me know what you find out; Onramp exercises more code paths than my own production system so I'm keen to get to the bottom of any missing data, etc.