Closed bwitham closed 6 years ago
Proposed regression test conflate combos (mgcp pending):
- bay area
  - mgcp/ca state
  - mgcp/osm
  - ca state/osm - DONE - 226 matches
- los angeles
  - mgcp/ca state
  - mgcp/osm
  - ca state/osm - DONE - 200 matches
- mozambique
  - mgcp/.info
  - mgcp/osm
  - .info/osm - DONE - 61 matches
- namibia
  - mgcp/.info
  - mgcp/osm
  - .info/osm - DONE - 200 matches
If I end up getting all the MGCP, then I may drop the third list item in each set if it hasn't been matched yet. Also, for bay area and LA, I may add in EIA later.
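For tracking purposes, the combo list above could be kept as a small data structure and summarized. This is only a rough sketch: the `combos` layout and the `summarize` helper are made up for illustration and are not part of Hootenanny. The match counts are copied from the list; `None` marks a pair that is still pending.

```python
# Hypothetical bookkeeping for the proposed conflate combos.
# Counts are transcribed from the list above; None = not matched yet.
combos = {
    "bay area": {
        ("mgcp", "ca state"): None,
        ("mgcp", "osm"): None,
        ("ca state", "osm"): 226,
    },
    "los angeles": {
        ("mgcp", "ca state"): None,
        ("mgcp", "osm"): None,
        ("ca state", "osm"): 200,
    },
    "mozambique": {
        ("mgcp", ".info"): None,
        ("mgcp", "osm"): None,
        (".info", "osm"): 61,
    },
    "namibia": {
        ("mgcp", ".info"): None,
        ("mgcp", "osm"): None,
        (".info", "osm"): 200,
    },
}


def summarize(combos):
    """Split the combos into done and pending, and total the matches made so far."""
    done = {}
    pending = []
    for region, pairs in combos.items():
        for pair, count in pairs.items():
            if count is None:
                pending.append((region, pair))
            else:
                done[(region, pair)] = count
    return done, pending, sum(done.values())


done, pending, total_matches = summarize(combos)
print(f"{len(done)} combos done, {len(pending)} pending, {total_matches} matches so far")
```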
Made some matches today between the public datasets while I'm waiting on MGCP. As expected, it's fairly challenging matching due to the different ways power lines are mapped in the different datasets. In some cases, lines get mapped as a single line and an attribute is added to indicate there are multiple cables associated with that single line. In other cases, it seems each cable may be drawn individually. Sometimes I can verify the ground truth from imagery and sometimes I can't, as the lines are hard to make out. Seeing the lines connected to towers does help to a degree.
As lines come into power substations, the complexity increases rapidly, and it seems that in some cases the different datasets largely disagree with each other.
One thing that's very key to making these matches is having the voltage attribution on the data, which prevents mismatching when lines of different voltage types are very close to each other. Luckily, most of the data I've seen so far has voltage attribution.
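To illustrate the voltage point, here is a minimal sketch of a voltage compatibility check that could be applied to two candidate lines before calling them a match. It is not Hootenanny's actual matching code. It assumes OSM-style tagging, where `voltage` is given in volts and multiple values are separated by semicolons; the function names are hypothetical, and treating a missing voltage as "can't rule it out" is an assumption.

```python
def parse_voltages(tag_value):
    """Parse a voltage tag like "115000" or "230000;115000" into a set of ints.

    Non-numeric parts are ignored; a missing or empty tag yields an empty set.
    """
    if not tag_value:
        return set()
    voltages = set()
    for part in tag_value.split(";"):
        part = part.strip()
        if part.isdigit():
            voltages.add(int(part))
    return voltages


def voltages_compatible(tags_a, tags_b):
    """Return False only when both lines carry voltages and none are shared.

    Assumption: if either line has no usable voltage, the pair is not
    ruled out, since the attribute cannot tell us anything.
    """
    volts_a = parse_voltages(tags_a.get("voltage"))
    volts_b = parse_voltages(tags_b.get("voltage"))
    if not volts_a or not volts_b:
        return True
    return bool(volts_a & volts_b)


# Two nearby lines with different voltages should not be matched.
print(voltages_compatible({"voltage": "115000"}, {"voltage": "230000"}))
# A multi-voltage line shares 115 kV with the second line.
print(voltages_compatible({"voltage": "230000;115000"}, {"voltage": "115000"}))
```

A check like this only filters candidates; geometry and the other attributes still have to agree before two lines are treated as the same feature.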
https://github.com/ngageoint/hootenanny/wiki/Power-Line-Notes
Decided to skip matching the EIA data to anything for now. The EIA data is a little more detailed than the other datasets in that it seems to be mapping cables on the same lines separately. I'm not really sure of the best way to handle that data yet. Technically, it has more information available since it breaks details out per cable, but that kind of mapping is a little different from how OSM and the others seem to be handling things, not to mention it's going to be harder to conflate. I may come back to EIA after the other datasets are conflating as well as possible.
Almost have 200 matches on the first dataset (bay area CA state gov and OSM). Matching has been even more tedious than I initially expected, but I am starting to speed up a little.
Manual matching between CA state gov and OSM done. Took about 2 full days, not counting the time writing code to manipulate the data. On to the next set.
LA public matching done. Down to the last pair of public datasets to match. Hopefully, will get some MGCP after that.
Done for now. Will do MGCP when it becomes available.
Working this! Will get you an update before I leave today.
On Wed, Jun 20, 2018 at 10:36 AM, Brandon Witham notifications@github.com wrote:
Done for now. Will do MGCP when it becomes available.
Awesome, thanks.
Ideally half a dozen datasets, 200 matches each