Using shapefile-generated graphs with GOSTNets?

rbanick commented 4 years ago

Hi all,

Is there any guidance, or are there examples, on how to use graph objects created from shapefiles with GOSTNets?

My current problem is that I can't use shapefile-generated graphs with GOSTNets. The back story: after preparing some OSM data the recommended way (Tutorials 1-3), it became apparent that a significant chunk of roads were disconnected in OSM and needed to be manually joined up. Re-creating a networkx graph object from the edited data was straightforward (I got a <networkx.classes.digraph.DiGraph object>), but running GOSTNets functions on the graph returns errors.

The most basic function I need is generating a pickle from the graph to enable further analysis. Running gn.save(my_new_graph,'my_new_graph','../files/') yields the following error:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-8-e413f8aac5fb> in <module>
----> 1 gn.save(my_new_graph,'my_new_graph','../files/')

~/git/GOST_PublicGoods/GOSTNets/GOSTNets/GOSTnet.py in save(G, savename, wpath, pickle, edges, nodes)
   1037     """
   1038     if nodes == True:
-> 1039         new_node_gdf = node_gdf_from_graph(G)
   1040         new_node_gdf.to_csv(os.path.join(wpath, '%s_nodes.csv' % savename))
   1041     if edges == True:

~/git/GOST_PublicGoods/GOSTNets/GOSTNets/GOSTnet.py in node_gdf_from_graph(G, crs, attr_list, geometry_tag, xCol, yCol)
    211                 pass
    212 
--> 213         nodes.append(new_column_info)
    214         z += 1
    215 

UnboundLocalError: local variable 'new_column_info' referenced before assignment

The same new_column_info error occurs when running the CleanNetwork function defined in the Tutorial Step 2 notebook:

UnboundLocalError                         Traceback (most recent call last)
<ipython-input-10-76c9fa3ca80f> in <module>
     19     G = my_new_graph # inserting CXB edited graph object instead
     20 
---> 21     G = CleanNetwork(G, wpath, country, UTM, WGS, 0.5, verbose = False)
     22     print('\nend: %s' % time.ctime())
     23     print('\n--- processing complete for: %s ---' % country)

<ipython-input-2-85460a2dea5d> in CleanNetwork(G, wpath, country, UTM, WGS, junctdist, verbose)
     13 
     14     # Squeezes clusters of nodes down to a single node if they are within the snapping tolerance
---> 15     a = gn.simplify_junctions(G, UTM, WGS, junctdist)
     16 
     17     # ensures all streets are two-way

~/git/GOST_PublicGoods/GOSTNets/GOSTNets/GOSTnet.py in simplify_junctions(G, measure_crs, in_crs, thresh)
   1141     G2 = G.copy()
   1142 
-> 1143     gdfnodes = node_gdf_from_graph(G2)
   1144     gdfnodes_proj_buffer = gdfnodes.to_crs(measure_crs)
   1145     gdfnodes_proj_buffer = gdfnodes_proj_buffer.buffer(thresh)

~/git/GOST_PublicGoods/GOSTNets/GOSTNets/GOSTnet.py in node_gdf_from_graph(G, crs, attr_list, geometry_tag, xCol, yCol)
    211                 pass
    212 
--> 213         nodes.append(new_column_info)
    214         z += 1
    215 

UnboundLocalError: local variable 'new_column_info' referenced before assignment

Digging further, running `edge_gdf = gn.edge_gdf_from_graph(my_new_graph)` yields:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-23-2b59cc5ed218> in <module>
----> 1 edge_gdf = gn.edge_gdf_from_graph(cxbr)

~/git/GOST_PublicGoods/GOSTNets/GOSTNets/GOSTnet.py in edge_gdf_from_graph(G, crs, attr_list, geometry_tag, xCol, yCol)
    265             # if it doesn't have a geometry attribute, the edge is a straight
    266             # line from node to node
--> 267             x1 = G.nodes[u][xCol]
    268             y1 = G.nodes[u][yCol]
    269             x2 = G.nodes[v][xCol]

KeyError: 'x'
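The KeyError: 'x' suggests the nodes carry no 'x'/'y' attributes, which is what edge_gdf_from_graph reads through its xCol/yCol parameters. A quick way to confirm this before calling GOSTNets is sketched below; nodes_missing_coordinates is a hypothetical helper (not part of GOSTNets), and it assumes the default 'x'/'y' attribute names.

```python
import networkx as nx


def nodes_missing_coordinates(G, x_col="x", y_col="y"):
    """Hypothetical helper (not part of GOSTNets).

    Returns the IDs of nodes that lack either coordinate attribute. Such nodes
    would trigger errors like the KeyError: 'x' in edge_gdf_from_graph.
    """
    return [n for n, data in G.nodes(data=True) if x_col not in data or y_col not in data]


# Toy graph: node 1 has coordinates, node 2 was created implicitly by add_edge
G = nx.DiGraph()
G.add_node(1, x=92.0, y=21.4)
G.add_edge(1, 2)
print(nodes_missing_coordinates(G))  # [2]
```

An empty list means every node has coordinates under the expected attribute names.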

I am very new to graph objects, so apologies if this is a simple fix. Any suggestions or help would be welcome.

bpstewar commented 4 years ago

https://github.com/worldbank/GOST_PublicGoods/tree/master/Implementations/RobertBanick_ShapefileTToG

I have put together a sample in this folder, but I haven't tested it much. It works because your shapefile has from and to nodes defined, which won't necessarily be the case for all shapefiles.

rbanick commented 4 years ago

@bpstewar OK I'll give it a go and report back any problems here. Would a quick fix for future shapefiles be to define from/to nodes?
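One way to define from/to nodes for a shapefile that lacks them is to derive them from each line's endpoint coordinates, so that lines ending at the same vertex share a node. A minimal sketch, assuming each road is supplied as a list of (x, y) coordinates (e.g. list(linestring.coords)) plus an attribute dict. graph_from_lines and the 'coords' edge attribute are hypothetical names; the graphs in this thread store a shapely geometry under 'geometry' instead.

```python
import networkx as nx


def graph_from_lines(lines, precision=6):
    """Hypothetical helper: build a DiGraph whose nodes are line endpoints.

    `lines` is a list of (coords, attrs) pairs, where `coords` is a sequence of
    (x, y) tuples and `attrs` is a dict of per-road attributes. Endpoints that
    agree after rounding to `precision` decimals share one node ID, and each
    node gets 'x'/'y' attributes from the first endpoint seen.
    """
    G = nx.DiGraph()
    node_ids = {}  # rounded (x, y) -> integer node ID

    def node_for(pt):
        key = (round(pt[0], precision), round(pt[1], precision))
        if key not in node_ids:
            node_ids[key] = len(node_ids)
            G.add_node(node_ids[key], x=pt[0], y=pt[1])
        return node_ids[key]

    for coords, attrs in lines:
        u = node_for(coords[0])   # from node: first vertex
        v = node_for(coords[-1])  # to node: last vertex
        data = dict(attrs)
        data["coords"] = list(coords)
        G.add_edge(u, v, **data)
    return G


lines = [
    ([(0.0, 0.0), (1.0, 1.0)], {"osm_id": "a"}),
    ([(1.0000001, 1.0), (2.0, 0.0)], {"osm_id": "b"}),  # starts at (almost) the same vertex
]
G = graph_from_lines(lines)
print(G.number_of_nodes(), list(G.edges()))  # 3 [(0, 1), (1, 2)]
```

Rounding is a coarse snapping rule: two endpoints that straddle a rounding boundary stay separate even if they are very close, so a tolerance-based merge is more forgiving.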

rbanick commented 4 years ago

@bpstewar The shared notebook code succeeds in loading the graph object, manipulating it, and saving it as a CSV or SHP. Unfortunately, it fails when I try to generate a pickle from the graph object, so it doesn't solve my core problem of generating a GOSTNets analysis-ready file.

A minimal example, gn.save(G,'CXB_example','./'), using the data shared in the Implementation folder produces the same new_column_info error:

UnboundLocalError                         Traceback (most recent call last)
<ipython-input-9-47af554e8aee> in <module>
----> 1 gn.save(G,'CXB_example','./')

~/git/GOSTNets/GOSTnets/core.py in save(G, savename, wpath, pickle, edges, nodes)
   1068 
   1069     if nodes == True:
-> 1070         new_node_gdf = node_gdf_from_graph(G)
   1071         new_node_gdf.to_csv(os.path.join(wpath, '%s_nodes.csv' % savename))
   1072     if edges == True:

~/git/GOSTNets/GOSTnets/core.py in node_gdf_from_graph(G, crs, attr_list, geometry_tag, xCol, yCol)
    206                 pass
    207 
--> 208         nodes.append(new_column_info)
    209         z += 1
    210 

UnboundLocalError: local variable 'new_column_info' referenced before assignment

bpstewar commented 4 years ago

Yeah, I was a bit worried about that - the generated nodes in the G object don't have any attributes (including coordinates), so they don't work with GOSTNets. I have another idea that I can play with.

What are your origins and destinations for this analysis? Can I test them?
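Since the generated nodes have no coordinates, one option is to copy them from the first and last vertices of each edge's geometry in a single pass over the edges, rather than scanning the whole edge list once per node. A sketch, assuming each edge's 'geometry' is either a shapely LineString digitized from u to v or a plain list of (x, y) tuples; add_node_coords_from_edges is a hypothetical helper name.

```python
import networkx as nx


def add_node_coords_from_edges(G, geom_key="geometry", x_col="x", y_col="y"):
    """Hypothetical helper: set node x/y from incident edge geometries.

    The first vertex of an edge's geometry is taken for u and the last for v.
    Nodes that already have coordinates are left unchanged. Returns the nodes
    that still lack coordinates (e.g. isolated nodes) so they can be inspected.
    """
    for u, v, data in G.edges(data=True):
        if geom_key not in data:
            continue
        geom = data[geom_key]
        # shapely geometries expose .coords; plain lists are used as-is
        coords = list(getattr(geom, "coords", geom))
        for node, pt in ((u, coords[0]), (v, coords[-1])):
            if x_col not in G.nodes[node]:
                G.nodes[node][x_col] = pt[0]
                G.nodes[node][y_col] = pt[1]
    return [n for n, d in G.nodes(data=True) if x_col not in d]


G = nx.DiGraph()
G.add_edge(10, 11, geometry=[(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)])
G.add_node(12)  # isolated node with no edge to take coordinates from
leftover = add_node_coords_from_edges(G)
print(leftover)  # [12]
```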

rbanick commented 4 years ago

Files sent over email

rbanick commented 4 years ago

Thanks for the back-and-forth over email. I started replying there, then figured this follow-up was best filed here for better public awareness.

I went through the full GOSTNets routine last week and was able to generate useful, realistic-looking results using the shapefile input data.

However, I encountered two possibly related problems that affected the results. Could you suggest possible causes and/or fixes?

1. Running the manually modified shapefile object back through the largest graph processing routine does not consistently remove disconnected road lines, as it did before. It seems to remove small extraneous bits of road, like the blue lines below, but not larger disconnected road objects, like the disconnected set of roads in the center.

This means that about 0.5% of my origin nodes were joined to road objects that are disconnected from the larger network. They were therefore erroneously marked as unable to drive and reported inflated, walking-only travel times.

Would you have any idea why this might happen and how I could correct it?

One note on this: because of some environment setup glitches, I seem to be running a slightly newer version of networkx (2.4) than GOSTNets expects. The only place this has caused me problems is the following line of code, which calls nx.strongly_connected_component_subgraphs, a function that no longer exists in networkx 2.4:

list_of_graphs = list(nx.strongly_connected_component_subgraphs(G))

I had to replace it with:

list_of_graphs = list(G.subgraph(c).copy() for c in nx.strongly_connected_components(G)) # old code was deprecated, new suggested fix. Must use copy() for gn.save below to work

I can't think of a reason why this would cause the problem but am disclosing in case you spot something I didn't.
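The removed function can be replaced by a small generator that reproduces its behavior on networkx 2.4 and later. A minimal sketch; strongly_connected_subgraphs is a hypothetical name, and sorting largest-first is an added convenience that the original function did not guarantee.

```python
import networkx as nx


def strongly_connected_subgraphs(G, copy=True):
    """Hypothetical stand-in for nx.strongly_connected_component_subgraphs.

    Yields one subgraph per strongly connected component, largest first.
    With copy=True each subgraph is an independent graph; with copy=False it
    is a read-only view tied to G (G.subgraph returns frozen views).
    """
    components = sorted(nx.strongly_connected_components(G), key=len, reverse=True)
    for c in components:
        sub = G.subgraph(c)
        yield sub.copy() if copy else sub


G = nx.DiGraph([(1, 2), (2, 1), (3, 4)])
subs = list(strongly_connected_subgraphs(G))
print([len(s) for s in subs])  # [2, 1, 1]
```

The copy matters for the same reason noted in the comment above: a subgraph view cannot be modified, while a copy behaves like an ordinary graph.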

2. There are a few origin nodes where the nearest plausible road is attached to the larger network, yet they are still marked as unable to drive. See the 3 blue origin dots on the left-hand side of the above screenshot. Any ideas why this might happen? This affects only a handful of nodes, so if necessary I can manually replace their travel time values with their neighbors' values.
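If origins are snapped to the nearest node anywhere in the graph, those closest to a disconnected fragment will be routed on that fragment. One way to rule this out is to snap only to nodes of the largest strongly connected component. A brute-force sketch, assuming node 'x'/'y' attributes in a projected CRS so planar distances make sense; nearest_node_in_largest_scc is a hypothetical helper, not GOSTNets' own snapping function.

```python
import math

import networkx as nx


def nearest_node_in_largest_scc(G, points, x_col="x", y_col="y"):
    """Hypothetical helper: snap (x, y) points to the largest SCC only.

    Returns a list of (node, distance) pairs. Distances are Euclidean, so the
    coordinates should be projected (e.g. UTM), not lon/lat degrees. The
    search is brute force; a spatial index would be needed for large inputs.
    """
    largest = max(nx.strongly_connected_components(G), key=len)
    candidates = [(n, G.nodes[n][x_col], G.nodes[n][y_col]) for n in largest]
    snapped = []
    for px, py in points:
        node, cx, cy = min(candidates, key=lambda c: math.hypot(c[1] - px, c[2] - py))
        snapped.append((node, math.hypot(cx - px, cy - py)))
    return snapped


G = nx.DiGraph()
for n, x in [(1, 0.0), (2, 10.0), (5, 20.0), (3, 49.0), (4, 51.0)]:
    G.add_node(n, x=x, y=0.0)
G.add_edges_from([(1, 2), (2, 1), (2, 5), (5, 2), (3, 4), (4, 3)])  # 3-4 is a fragment
print(nearest_node_in_largest_scc(G, [(48.0, 0.0)]))  # [(5, 28.0)]
```

The point at x=48 is closest to node 3, but because 3 belongs to the small fragment it is snapped to node 5 on the main network instead.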

Charlesfox1 commented 4 years ago

Hi Robert,

I am The Architect. Some believe I exist only as a ghost in the machine, a collection of loose code snippets and Easter Eggs, buried deep within GOSTnets where even LordBen dares not tread. Others believe I was once at the Bank long ago, in a sort of 'golden age'. Perhaps they are both right.

I can shed some light on 1.) above, but would have to reproduce the error in 2.) locally to see what's going on.

1.) is a design choice. Large subgraphs were deliberately left in, because countries with islands (e.g. the UK with Northern Ireland) might require separate accessibility analyses to be run on each subgraph in turn. As such, it is up to the user to decide whether they want to use a graph with two or more subgraphs or just one mega graph (by removing or artificially joining the networks). I see you have replaced some deprecated code; have you set your graph object to the largest strongly connected component subgraph? You may have generated the list of subgraphs without actually assigning the largest one to G (or whichever variable you are working with). To guarantee that every node can physically reach every other node, the graph must consist of exactly one strongly connected component.

2.) is anyone's guess, but my hunch is that sorting 1.) may also sort 2.). Who knows.
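The assignment step described above can be made concrete: the largest strongly connected component has to be assigned back to the working variable, and nx.is_strongly_connected confirms that every node can reach every other node. A minimal sketch; keep_largest_scc is a hypothetical helper, and it selects the component with the most nodes, whereas the code later in this thread selects by edge count.

```python
import networkx as nx


def keep_largest_scc(G):
    """Hypothetical helper: copy of G restricted to its largest SCC (by node count)."""
    largest = max(nx.strongly_connected_components(G), key=len)
    return G.subgraph(largest).copy()


G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4)])  # node 4 is reachable but cannot return
G = keep_largest_scc(G)  # reassign; otherwise G still holds every component
print(sorted(G.nodes), nx.is_strongly_connected(G))  # [1, 2, 3] True
```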

d3netxer commented 4 years ago

@rbanick Regarding #1, I have processed the largest subgraph and am attaching the output here. Is there a way to tell whether this output is right or wrong? I'm not sure which location your screenshot shows.

Also, are you working off of the GNET_ESAPV_BGD_D_B_Shapefile_to_G notebook in this repo or another notebook?

G_largest_graph_edges.csv.zip

For #2: your use case is a good justification for us to build a more robust way to import shapefiles and convert them to graphs. However, I'm not sure the notebook (GNET_ESAPV_BGD_D_B_Shapefile_to_G) does this perfectly. For example, I see a DiGraph, whereas GOSTNets usually works with MultiDiGraphs and has a clean function that reflects edges in both directions. There may also be other important post-processing routines that are not being applied. Interestingly, I still got results for all of the values in the OD matrix, but I think this shapefile import process should be looked at more closely. If you are facing a deadline, for now I strongly suggest you make your edits on the OSM website, re-download the corrected pbf, and start your analysis from there.
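A graph built directly from digitized lines has each edge only in its digitizing direction, so on a DiGraph nearly every road behaves as one-way and the strongly connected components can fragment badly; this could contribute to the largest-graph behavior discussed above. The sketch below mirrors the idea of reflecting edges in both directions and returns a MultiDiGraph. make_two_way is a hypothetical helper, and it assumes every road is two-way (no one-way data).

```python
import networkx as nx


def make_two_way(G, geom_key="geometry"):
    """Hypothetical helper: MultiDiGraph where every edge u->v also exists as v->u.

    Assumes all roads are two-way. When the geometry is a plain list of (x, y)
    tuples, the reverse edge gets the coordinates in reverse order; other
    geometry types are copied unchanged.
    """
    M = nx.MultiDiGraph()
    M.add_nodes_from(G.nodes(data=True))
    for u, v, data in G.edges(data=True):
        M.add_edge(u, v, **data)
        if not G.has_edge(v, u):
            rev = dict(data)
            if isinstance(rev.get(geom_key), list):
                rev[geom_key] = rev[geom_key][::-1]
            M.add_edge(v, u, **rev)
    return M


D = nx.DiGraph()
D.add_edge(1, 2, geometry=[(0.0, 0.0), (1.0, 0.0)])
D.add_edge(2, 3)
M = make_two_way(D)
print(nx.is_strongly_connected(D), nx.is_strongly_connected(M))  # False True
```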

rbanick commented 4 years ago

Hi all,

After digging around in the code, I found the missing step that exports only the largest graph. Long story short: because I cobbled together my routine from some of the more complicated Implementations (which I'm building up to) and Tom's shapefile import code, the critical routine went missing. My bad.

I'll report back once I'm done processing on whether the snapping behavior seen in problem #2 is now fixed. The "removing extraneous bits of the network" cleaning behavior is still happening, but again it seems to remove only non-critical elements.

@Charlesfox1 Good background and makes sense. Is there guidance or an illustrative implementation somewhere on how to use multiple subgraphs concurrently? This is not a need for my current use case but it might be for my next.

@d3netxer I started off adapting the Sierra Leone implementation because I eventually wanted to employ a process that could include disruptions + IRI. I've pulled elements from the GNET_ESAPV_BGD_D_B_Shapefile_to_G notebooks shared by Ben into what is by now a semi-custom, stepwise series of notebooks. Perhaps not the best approach, but I've been more focused on understanding the process and meeting my deadline than on doing things elegantly. I figured the training will deliver the elegance :-)

@d3netxer As a user, I would definitely recommend investing in a robust shapefile import process. In my experience, OSM has often worked best as a complement to the main national road networks for accessibility modeling, though this is of course context dependent. For instance, I am told this is the case in Bangladesh; I'm only comfortable using standalone OSM in Cox's Bazar because of all the humanitarian mapping.

Manually fixing shapefiles seems to work for now, but I agree that OSM fixes are more sustainable.

rbanick commented 4 years ago

Hi all,

I spoke too soon; I am not, in fact, able to correctly export the largest graph from a processed shapefile.

I am loading the shapefile below into GOSTNets and processing it with the code at the bottom of this message.

cxb_and_neighbors.zip

I think I've found the problem in the shapefile, but my knowledge of how networkx handles topology is not sufficient to fix it. See the description below.

There appears to be a problem with a critical segment at node 19914: the road segment south of it is listed as starting at node 15131, despite visually starting at the same place. Stated plainly, nodes 19914 and 15131 are stacked on top of each other instead of being the same node.

This was the segment I had to manually connect, which I did in QGIS with the Vertex tool and a Merge operation. So I'm wondering: is this a simple geometry fix, and am I misunderstanding something about the topology involved?
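Coincident nodes with different IDs, like 19914 and 15131, can be merged by grouping nodes on their coordinates and contracting each group into one node with nx.contracted_nodes. A sketch assuming nodes already carry 'x'/'y' attributes; merge_coincident_nodes and its tol grid size are hypothetical. GOSTNets' simplify_junctions (called in the CleanNetwork routine above) targets a similar problem with a buffer-based tolerance, once the nodes have coordinates.

```python
import networkx as nx


def merge_coincident_nodes(G, tol=1e-6, x_col="x", y_col="y"):
    """Hypothetical helper: merge nodes whose coordinates fall in the same tol-sized grid cell.

    Every node in a group is contracted into the first one, so edges that
    touched a duplicate now touch the kept node. Returns a new graph; each
    contraction copies the graph, which is fine for a handful of duplicates.
    """
    groups = {}
    for n, d in G.nodes(data=True):
        key = (round(d[x_col] / tol), round(d[y_col] / tol))
        groups.setdefault(key, []).append(n)
    H = G.copy()
    for members in groups.values():
        keep = members[0]
        for dup in members[1:]:
            H = nx.contracted_nodes(H, keep, dup, self_loops=False)
        # contracted_nodes records merged nodes under 'contraction'; drop it
        H.nodes[keep].pop("contraction", None)
    return H


G = nx.DiGraph()
G.add_node(19914, x=1.0, y=2.0)
G.add_node(15131, x=1.0, y=2.0)  # stacked on top of 19914
G.add_node(1, x=0.0, y=2.0)
G.add_node(2, x=1.0, y=3.0)
G.add_edge(1, 19914)
G.add_edge(15131, 2)
H = merge_coincident_nodes(G)
print(nx.has_path(G, 1, 2), nx.has_path(H, 1, 2))  # False True
```

Grid grouping can miss two points that are very close but fall on either side of a cell boundary; a buffer or distance-based clustering, as in simplify_junctions, avoids that.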

Processing code (yes it's messy):

# the nodes in the dataset do not have coordinates, so let's fix that
edges = list(G.edges(data=True))
nodes = G.nodes(data=True)
all_nodes = []
# Loop through all the nodes to extract their coordinates from the edge geometries
for n in nodes:
    # For the current node, loop through the edges until we find an edge with the current node
    # (stopping after the last edge, so a node without edges doesn't raise an IndexError)
    found_node = False
    edge_count = 0
    while not found_node and edge_count < len(edges):
        e = edges[edge_count]
        edge_count = edge_count + 1
        # if the current node is one of the current edge's endpoints, we can extract the coordinate
        if n[0] in e[:2]:
            found_node = True
            # The coordinate for the node is either the first or final coordinate of the current edge
            pt_idx = 0
            if e.index(n[0]) == 1:
                pt_idx = -1
            # Extract the appropriate point and store the new node
            pt = list(e[2]['geometry'].coords)[pt_idx]
            node_vals = {'x': pt[0], 'y': pt[1]}
            all_nodes.append([n[0], node_vals])
            #G.remove_node(n[0])
            #G.add_node(n[0], **node_vals)
G.update(nodes=all_nodes)

then

# inspect the resulting Graph
nodes = list(G.nodes(data=True))
edges = list(G.edges(data=True))
print(len(nodes))
print(nodes[0])
print(len(edges))
print(edges[0])

and then

import networkx as nx

# Identify only the largest graph (by number of edges)
list_of_subgraphs = list(G.subgraph(c).copy() for c in nx.strongly_connected_components(G))
max_graph = None
max_edges = 0
for i in list_of_subgraphs:
    if i.number_of_edges() > max_edges:
        max_edges = i.number_of_edges()
        max_graph = i

largest_G = max_graph
# inspect the resulting Graph
nodes = list(largest_G.nodes(data=True))
edges = list(largest_G.edges(data=True))
print(len(nodes))
print(nodes[0])
print(len(edges))
print(edges[0])

# export largest and total pickle
gn.save(largest_G,"cxb_largestG","../intermediate/")
# gn.save(G,"cxb_all_osm_edits_G","../intermediate/")

worldbank / GOSTnets

Using shapefile-generated graphs with GOSTNets? #8