darrell closed this issue 10 years ago
Hey @darrell, I'm curious what the imports lists had to say about this idea. Has this random distribution of points been tried in any other imports?
I'm trying to figure out how to handle the same problem in the LA buildings import: we sometimes have multiple addresses in the same building that all have the same coordinates. They are literally on top of each other. I'm already stripping out unit numbers where I can, but there are still many overlapping addresses that remain.
I was thinking about just leaving them as-is, since they will trigger validation errors in JOSM and then the person doing the manual import will have to deal with them intelligently.
Sorry I didn't get much of a chance to talk to you this week. :(
I don't think anyone commented on this piece specifically, so I assume silence implies consent. :+1:
The main purpose was to avoid things being flagged as duplicate nodes where we have several addresses all at the same point.
Here's the code that does it. Input is a point, output is a slightly different point. :)
CREATE OR REPLACE FUNCTION perturb_point(pt geometry) RETURNS geometry AS $$
DECLARE
    srid integer;
    offset_x double precision;
    offset_y double precision;
BEGIN
    -- random() returns a value in [0, 1), so each offset is in [0, 0.00001)
    offset_y := random() * 0.00001;
    offset_x := random() * 0.00001;
    srid := st_srid(pt);
    -- rebuild the point with the offsets applied, keeping the original SRID
    pt := st_setsrid(st_makepoint(st_x(pt) + offset_x, st_y(pt) + offset_y), srid);
    RETURN pt;
END;
$$ LANGUAGE plpgsql;
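For a sense of scale, here is a rough Python sketch of how far that jitter can move a point. It assumes the geometry is in a degree-based SRID such as EPSG:4326, which the thread doesn't actually state. The `max_jitter_metres` helper and the spherical-earth constant are illustrative, not part of the import scripts.

```python
import math

# Approximate metres per degree of latitude on a spherical earth (assumption).
METRES_PER_DEGREE = 111_320.0


def max_jitter_metres(max_offset_deg, latitude_deg):
    """Return the largest (east, north) shift in metres for an offset in degrees."""
    north = max_offset_deg * METRES_PER_DEGREE
    # A degree of longitude shrinks with the cosine of the latitude.
    east = max_offset_deg * METRES_PER_DEGREE * math.cos(math.radians(latitude_deg))
    return east, north


# Portland, OR is at roughly 45.5 degrees north.
east, north = max_jitter_metres(0.00001, 45.5)
print(f"up to {east:.2f} m east and {north:.2f} m north")
```

Under those assumptions each point moves by about a metre at most, and always toward the north-east, because `random()` never returns a negative value.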
Oh, so they're not evenly distributed within the building polygon, they're just jittered a tiny bit around their original point?
Would you add a FIXME tag to those nodes perhaps? I'm just worried that it would be hard to detect that there was something there that needed human oversight. The only way we'd notice these nearly-overlapping points in the future is if we saw a bunch of overlapping house numbers in the default renderer at high zooms.
It's tough to say. We're specifically avoiding the many-buildings-to-many-addresses issue in our first pass, so the number of addresses we typically have on a building is rarely more than three or four, and usually two. Only a small subset of properties is likely to have multiple addresses. We've been double-checking the address points manually so far anyway, so I just rearrange them a bit when I come across them.
In that sense, it's not much different than what you're talking about with letting the JOSM validator flag them. Having them all slightly different does make them easier to select in JOSM, though. :)
I've actually considered doing something like figuring out which way the numbers are running on that street and then sorting and putting them all in a distributed line along the axis of the building, but that's a lot more work. :)
...yeah distributing parallel to the road is too hard! Especially because several of the examples I've seen in LA are corner buildings that have addresses on two different streets.
I hadn't thought about the difficulty of selecting the nodes in JOSM. So maybe some combination of jittering plus FIXME tags is the best way to go?
I don't think we should be automatically adding fixme tags to each Node for the case of multiple addresses within the same building. The key:fixme page on the OSM wiki also recommends against any type of automated editing generating fixme tags.
I do plan on reviewing and possibly visiting in person every location in the Portland metro area where there are OSM nodes with addr tags to see if these can be merged with a building outline. These Nodes can be easily found using overpass turbo.
I was envisioning the FIXME tags as flags for the person doing the import to catch. Ideally, the manual importer would remove those tags before the data is even uploaded to the main OSM database.
But I realize there's no way we could guarantee that none of those FIXMEs made it into OSM. Perhaps that's something we could focus on during the validation step in the tasking manager?
@dobratzp, I apologize, I didn't read your comment closely enough. And I don't think I explained myself well enough, because I'm thinking about this in the context of the LA Buildings import, not the PDX one, where the situation might be slightly different.
I'm not suggesting we add fixme tags to all situations where there are multiple addresses within the same building. In LA at least, we have many cases where there are several addresses within a single building, and they are well-spaced within the building. So that follows standard OSM practice and doesn't need fixing. I suggest we use fixme tags only when there is more than one address on the same point which we have nudged slightly apart from each other in our processing scripts. In my case, these overlapping situations are much less common than the general case of multiple addresses within a building. But I don't know if PDX is in the same situation. When you have multiple addresses within a building, are they always on the same spot, or rarely on the same spot?
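To make that proposal concrete, here is a minimal Python sketch of "jitter and flag only the addresses that share exactly the same point". The dict layout (`lon`, `lat`, `tags`), the `nudge_overlaps` name and the fixme wording are all hypothetical. Neither import's real scripts are shown in this thread.

```python
import random
from collections import defaultdict

FIXME_TEXT = "overlapping address nudged by import script; reposition and remove this tag before upload"


def nudge_overlaps(addresses, max_offset=0.00001, rng=random):
    """Jitter only addresses that share exact coordinates and tag them for review.

    `addresses` is a list of dicts with hypothetical keys 'lon', 'lat', 'tags'.
    Returns new dicts; the input list is left untouched.
    """
    # Group address indexes by their exact coordinate pair.
    by_coord = defaultdict(list)
    for i, addr in enumerate(addresses):
        by_coord[(addr["lon"], addr["lat"])].append(i)

    result = [
        {"lon": a["lon"], "lat": a["lat"], "tags": dict(a["tags"])}
        for a in addresses
    ]

    for indexes in by_coord.values():
        if len(indexes) < 2:
            continue  # a well-spaced address: leave it alone, no fixme
        for i in indexes:
            result[i]["lon"] += rng.random() * max_offset
            result[i]["lat"] += rng.random() * max_offset
            result[i]["tags"]["fixme"] = FIXME_TEXT
    return result


sample = [
    {"lon": -118.25, "lat": 34.05, "tags": {"addr:housenumber": "100"}},
    {"lon": -118.25, "lat": 34.05, "tags": {"addr:housenumber": "102"}},
    {"lon": -118.26, "lat": 34.06, "tags": {"addr:housenumber": "200"}},
]
nudged = nudge_overlaps(sample, rng=random.Random(0))
```

In this sketch only the first two sample addresses are moved and tagged. The third keeps its coordinates and gets no fixme, which matches the "well-spaced addresses don't need fixing" case.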
Upon further reading of key:fixme on the wiki, it also says that fixme should not be used for things that can be automatically detected by other error detection tools. I think the approach that @darrell proposes (a bunch of almost-overlapping address nodes) would not be detected by other error detection tools. As you say, it would be detected by overpass turbo, but so would all the other address nodes, even if they aren't nearly overlapping another node, and therefore not requiring the same level of oversight.
I think using fixme as just a flag to JOSM to let the importer know that they need to manually review those Nodes and change them before uploading is reasonable. I had assumed that you were suggesting uploading the fixme tags in the hopes of someone coming by and cleaning up the data later. As part of our manual review step, we try to avoid introducing any new JOSM validator warnings before uploading the data to OSM.
As far as I know, the method that @darrell came up with to use a cluster of Nodes will not trigger warnings or errors in any QA tools that I am aware of. However, I do plan on manually reviewing all Nodes with addr tags. Some of these will be from the PDX buildings import. Others are from another import, many of which only have the addr:state tag and none of the other addr tags. And there are a bunch of Nodes that have been added by other mappers. I think it's worth reviewing all of these to see if they can be cleaned up. Where possible, I like to split buildings up and put the addr tags on each section of the building. Two common cases are townhouses and shops within shopping centers.
I'm thinking that randomly distributing points inside the building's polygon would work:
http://gis.stackexchange.com/questions/89954/how-to-create-random-points-in-a-polygon-in-postgis
This still leaves unresolved the issue of many buildings to many addresses, but gets us closer.
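One common way to place random points inside a polygon is rejection sampling: pick a point in the bounding box and keep it only if it falls inside the outline. The sketch below shows that idea in plain Python. It is not the code from the linked answer, it assumes a simple outline with no inner rings, and the function names are made up for illustration.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices, closed implicitly."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(ring, count, rng=random, max_tries=10_000):
    """Return `count` random points inside `ring` using rejection sampling."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    tries = 0
    while len(points) < count:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place points; is the polygon degenerate?")
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


# A hypothetical L-shaped building outline in arbitrary units.
building = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
spots = random_points_in_polygon(building, 3, rng=random.Random(1))
```

Note that this only guarantees the points land inside the outline. Two of them can still end up very close together, so a minimum-distance check might be worth adding.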