Manually match power line data

bwitham commented 6 years ago

Ideally half a dozen datasets, 200 matches each

bwitham commented 6 years ago

Proposed regression test conflate combos (mgcp pending):

bay area

- mgcp/ca state
- mgcp/osm
- ~~ca state/osm~~ - DONE - 226 matches

los angeles

- mgcp/ca state
- mgcp/osm
- ~~ca state/osm~~ - DONE - 200 matches

mozambique

- mgcp/.info
- mgcp/osm
- ~~.info/osm~~ - DONE - 61 matches

namibia

- mgcp/.info
- mgcp/osm
- ~~.info/osm~~ - DONE - 200 matches

If I end up getting all the MGCP, then I may drop the third list item in each set if it hasn't been matched yet. Also, for bay area and LA, I may add in EIA later.

bwitham commented 6 years ago

Made some matches today between the public datasets while I'm waiting on MGCP. As expected, it's fairly challenging to match because power lines are mapped differently in the different datasets. In some cases, several cables get mapped as a single line, with an attribute added to indicate there are multiple cables associated with that line. In other cases, it seems each cable may be drawn individually. Sometimes I can verify the ground truth from imagery and sometimes I can't, as the lines are hard to make out. Seeing the lines connected to towers does help to a degree.

As lines come into power substations, the complexity increases rapidly, and in some cases the different datasets seem to be largely in disagreement with each other.

One thing that's key to making these matches is having the voltage attribution on the data, which prevents mismatching when lines of different voltages are very close to each other. Luckily, most of the data I've seen so far has voltage attribution.
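To illustrate the voltage point, here is a minimal sketch of a voltage compatibility check between two candidate lines. It is not Hootenanny's code. The `voltage` key, the semicolon-separated multi-value form (e.g. `115000;230000`), and the function names are all assumptions made for illustration; real datasets may store voltage under other attribute names or units.

```python
def parse_voltages(value):
    """Parse a voltage attribute such as "230000" or "115000;230000" into a set of ints.

    Assumes values are plain integers separated by semicolons; anything
    that is not a plain integer is ignored.
    """
    if value is None:
        return set()
    voltages = set()
    for part in str(value).split(";"):
        part = part.strip()
        if part.isdigit():
            voltages.add(int(part))
    return voltages


def voltages_compatible(tags_a, tags_b, key="voltage"):
    """Return False only when both lines have voltages and none are shared."""
    a = parse_voltages(tags_a.get(key))
    b = parse_voltages(tags_b.get(key))
    if not a or not b:
        # Missing voltage on either side: the match cannot be ruled out.
        return True
    return bool(a & b)


# Hypothetical tags for two nearby lines from different datasets.
line_a = {"power": "line", "voltage": "230000"}
line_b = {"power": "line", "voltage": "115000;230000"}
print(voltages_compatible(line_a, line_b))
```

Treating a missing voltage as compatible is a deliberate choice in this sketch: it only rejects pairs that the data positively contradicts, and leaves the rest to geometry or manual review.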

https://github.com/ngageoint/hootenanny/wiki/Power-Line-Notes

bwitham commented 6 years ago

Decided to skip matching the EIA data to anything for now. The EIA data is a little more detailed than the other datasets in that it seems to map cables on the same lines separately. I'm not really sure of the best way to handle that data yet. Technically, it has more information available since it breaks details out per cable, but that kind of mapping is a little different from how OSM and the others seem to be handling things, not to mention it's going to be harder to conflate. I may come back to EIA after the other datasets are conflating as well as possible.

bwitham commented 6 years ago

Almost have 200 matches on the first dataset (bay area CA state gov and OSM). Matching has been even more tedious than I initially expected, but I am starting to speed up a little.

bwitham commented 6 years ago

Manual matching between CA state gov and OSM done. Took about 2 full days, not counting the time spent writing code to manipulate the data. On to the next set.

bwitham commented 6 years ago

LA public matching done. Down to the last pair of public datasets to match. Hopefully, will get some MGCP after that.

bwitham commented 6 years ago

Done for now. Will do MGCP when it becomes available.

curranMapper commented 6 years ago

Working this! Will get you an update before I leave today.


bwitham commented 6 years ago

Awesome, thanks.

Manually match power line data #2382