Closed Venka97 closed 4 years ago
Here is the graph: https://github.com/mapbox/robosat/blob/b593a6b12957649767aaeb2dc60951e453a088ca/robosat/graph/core.py
It's a simple hand-rolled undirected graph, with depth-first search and connected component clustering capabilities.
Per vertex s
it is storing all outgoing edges to vertices t_n
in a set. And because we want an undirected graph, we do the same for every vertex t
for all outgoing edges to vertices s_n
. The graph semantic is super simple to implement with a defaultdict(set)
but there's quite some overhead in it.
If your benchmark results show the graph construction or traversal to take the majority of time spent here, you have two options:
If you are actively using this, I highly recommend you looking at the issues and potentially first upgrading dependencies.
I no longer actively maintain this and I am no longer with Mapbox.
Hey @Venka97 - what came out of this? Have you looked into it?
Hey @daniel-j-h, I was occupied this week with some other work. I will do it as soon as I get free and post an update.
Hey, @daniel-j-h , so I tried integrating undirected graph using something pre-made. I tried out both networkx and graph-tool and there wasn't much difference in graph construction. If anything, networkx is slightly slower than your implementation.
I am yet to look into more efficient graph representations.
@daniel-j-h, I think have managed to solve the issue. The graph building was taking a lot of time but it didn't have anything to do with it.
From pyproj documentation, given that we use same transformations, i.e. epsg:4346 -> equal area and back, we can pre-initialize it rather than passing the CRS as args in buffer and unbuffer methods.
Doing this gave me a massive improvement, for a sample geojson before it took me ~23 minutes for graph building loop. After this, it took ~30 seconds.
Cool! Can you open a pull request with this change? 🤗
On August 12, 2020 10:00:18 AM UTC, Venkatesh notifications@github.com wrote:
@daniel-j-h, I have managed to solve the issue. The graph building was taking a lot of time but it didn't have anything to do with it.
From pyproj documentation, given that we use same transformations, i.e. epsg:4346 -> equal area and back, we can pre-initialize it rather than passing the CRS as args in buffer and unbuffer methods.
Doing this gave me a massive improvement, for a sample geojson before it took me ~23 minutes for graph building loop. After this, it took ~30 seconds.
Sure, I'll do that!
Opened #209
rs merge is currently very slow for files with a large number of polygons (>20k). The merging part can be parallelized but the graph building takes very long.
I will benchmark the results of graph construction, traversal etc. and post the results.
Open for a discussion to improve this, if anyone has faced a similar issue, please feel free to contribute!