worldbank / GOST_PublicGoods

MIT License
Why did i write it like this in the first place?!!?!?! CHRIST #19

Open Charlesfox1 opened 4 years ago

Charlesfox1 commented 4 years ago

Ok boys

gn.pandana_snap() is still too slow. The .apply call is totally unnecessary: use geopandas to reproject, it's MUCH faster. Updated function suggestion below:

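# Assumes `import GOSTnets as gn` and `from scipy import spatial` at module level.
def pandana_snap(G,
                 point_gdf,
                 source_crs = 'epsg:4326',
                 target_crs = 'epsg:4326'):
    # NOTE: source_crs is not used here; the input CRS is read from point_gdf itself.

    in_df = point_gdf.copy()
    node_gdf = gn.node_gdf_from_graph(G)

    # Reproject a copy of the geometry column in one vectorised call (no .apply)
    in_df['proj_geometry'] = in_df['geometry']
    in_df = in_df.set_geometry('proj_geometry')
    in_df = in_df.to_crs(target_crs)
    in_df['x'] = in_df.proj_geometry.x
    in_df['y'] = in_df.proj_geometry.y

    # Reproject the graph nodes to the same CRS so distances are comparable
    node_gdf = node_gdf.to_crs(target_crs)
    node_gdf['x'] = node_gdf.geometry.x
    node_gdf['y'] = node_gdf.geometry.y

    # Build a KD tree on the node coordinates and find the nearest node per point
    G_tree = spatial.KDTree(list(zip(node_gdf['x'],node_gdf['y'])))

    distances, indices = G_tree.query(list(zip(in_df['x'],in_df['y'])))

    in_df['NN'] = list(node_gdf['node_ID'].iloc[indices])
    in_df['NN_dist'] = distances

    # Make the original geometry active again before dropping the helper columns
    in_df = in_df.set_geometry('geometry')
    in_df = in_df.drop(['x','y','proj_geometry'], axis = 1)

    return in_df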

d3netxer commented 4 years ago

Nice! Funny, I was looking into the same thing very recently, after I noticed that @rbanick was trying to snap over 100k points to his dense road network and pandana_snap took a long time.

I was working on a new function called pandana_snap_c on the import_shapefile branch: https://github.com/worldbank/GOSTnets/blob/import_shapefile/GOSTnets/core.py#L1743

Maybe you can check it out, @Charlesfox1?

Originally I thought the KD tree was the slow part, so I used the C version in SciPy (cKDTree). After timing it, I found that the re-projection was the real bottleneck, and also that doing the re-projection in Pandas is faster : )
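For reference, a minimal sketch of the nearest-node lookup with SciPy's C-backed tree. The coordinates and node IDs are made-up illustration data, not taken from GOSTnets, and this is not the actual pandana_snap_c code:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical projected node coordinates and their IDs
node_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
node_ids = np.array([101, 102, 103])

# Hypothetical points to snap, in the same CRS as the nodes
points_xy = np.array([[1.0, 1.0], [9.0, 0.5], [0.2, 8.0]])

# Build the tree once, then query all points in a single vectorised call
tree = cKDTree(node_xy)
distances, indices = tree.query(points_xy, k=1)

# Map tree indices back to node IDs
nearest_ids = node_ids[indices]
```

Passing NumPy arrays straight to the tree avoids building Python lists of tuples with zip, which can matter at the 100k-point scale mentioned above.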

Also, I believe there is no need to re-project at all if the source and target projections are the same, so we can save some time there.
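A rough sketch of that shortcut, assuming a recent geopandas with pyproj available. The helper name to_crs_if_needed is hypothetical and does not exist in GOSTnets:

```python
import geopandas as gpd
from pyproj import CRS
from shapely.geometry import Point


def to_crs_if_needed(gdf, target_crs):
    """Hypothetical helper: reproject only when the CRS actually differs."""
    same_crs = (
        gdf.crs is not None
        and CRS.from_user_input(gdf.crs) == CRS.from_user_input(target_crs)
    )
    if same_crs:
        # Nothing to do, so skip the reprojection entirely
        return gdf
    # Raises ValueError if gdf has no CRS set, as to_crs normally does
    return gdf.to_crs(target_crs)


# Made-up sample points in WGS 84
pts = gpd.GeoDataFrame(
    {"id": [1, 2]},
    geometry=[Point(0, 0), Point(1, 1)],
    crs="EPSG:4326",
)

unchanged = to_crs_if_needed(pts, "epsg:4326")  # same CRS: returned as-is
projected = to_crs_if_needed(pts, "EPSG:3857")  # different CRS: reprojected
```

Comparing parsed CRS objects rather than raw strings means 'epsg:4326' and 'EPSG:4326' are treated as the same projection.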

rbanick commented 4 years ago

I can confirm that pandana_snap is by far the slowest part of my GOSTNetting process! Performance improvements would be most welcome :-)