jkibele / pyriv

Python library for analysis of minimum aquatic distance across rivers and coasts. It's like Google Maps for anadromous fish.
MIT License

Parallel and/or Simplify for Land Graph #8

Closed jkibele closed 6 years ago

jkibele commented 7 years ago

Without parallel processing, it took 2 hours and 20 minutes to generate a graph for ocean distances around Nunivak Island. Admittedly, my computer went to sleep during part of that, but it's way too slow regardless, especially when you consider that this one island is a tiny fraction of the nodes that will need to be used.

We need to consider two options for speeding things up:

  1. Use parallel processing for checking whether potential edges cross the land. This is done in a for loop, so it should be dead simple to implement. I just haven't done it before so it'll take me a bit to figure it out.
  2. Simplify the land graph. This would reduce the number of nodes we need to calculate for. However, we could run into problems if the simplification drops nodes that represent river mouths. Those river mouths might end up on land, and they would likely no longer line up perfectly with the actual river mouths. The difference would be negligible in terms of distance, but it would introduce problems with finding paths out of rivers. Possible solutions:

    1. Just don't simplify. This is what I'm leaning toward at the moment. It may take many hours to generate a graph, but we should be able to save the graph and re-use it, so I'm not that bothered. And if we get it running in parallel and do the processing on Aurora, it shouldn't take too long.
    2. Simplify a lot, but add the river mouth nodes back in before calculating the edges. The river mouth nodes are a relatively small subset of the full set of coastline points, so this would probably speed things up considerably.
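For reference, the "simplify a lot, but keep the river mouths" idea could be sketched roughly as below. This is not pyriv code: `simplify_keep`, its parameters, and the exact-coordinate match used to recognize river mouth nodes are all assumptions made for illustration. It splits the coastline at the protected points and runs a plain Douglas-Peucker simplification on each piece, so the protected points always survive.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _douglas_peucker(coords, tolerance):
    """Iterative Douglas-Peucker; always keeps the first and last point."""
    if len(coords) < 3:
        return list(coords)
    keep = {0, len(coords) - 1}
    stack = [(0, len(coords) - 1)]
    while stack:
        start, end = stack.pop()
        worst, worst_dist = None, tolerance
        for i in range(start + 1, end):
            d = _point_segment_distance(coords[i], coords[start], coords[end])
            if d > worst_dist:
                worst, worst_dist = i, d
        if worst is not None:
            keep.add(worst)
            stack.append((start, worst))
            stack.append((worst, end))
    return [coords[i] for i in sorted(keep)]


def simplify_keep(coords, tolerance, protected=()):
    """Simplify a non-empty coastline but never drop points in `protected`.

    Hypothetical helper: `protected` holds river mouth coordinates that
    must match coastline vertices exactly.
    """
    protected = set(protected)
    last = len(coords) - 1
    # Indices where the line is split: both ends plus every protected vertex.
    anchors = [0] + [i for i in range(1, last) if coords[i] in protected] + [last]
    out = []
    for a, b in zip(anchors, anchors[1:]):
        piece = _douglas_peucker(coords[a:b + 1], tolerance)
        out.extend(piece[:-1])  # the piece's last point starts the next piece
    out.append(coords[-1])
    return out
```

In practice the river mouth coordinates would probably need to be snapped to the nearest coastline vertex first, since an exact match is unlikely with real data.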

jkibele commented 7 years ago

Here's one little corner of the land network:

image

That's a lot of edges.

jkibele commented 7 years ago

Multiprocessing was a serious pain in the ass, but it's working now. On a single processor, it took over two hours. On my MacBook Pro running on six processors, it took 16 minutes. Running on Aurora with 15 processors, it took 2 minutes. Yay!
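
For anyone landing here later, the general shape of a parallel edge check can be sketched with the standard library's `multiprocessing.Pool`. This is a simplified illustration, not the code that went into pyriv: the names `segments_cross`, `edge_is_clear`, and `clear_edges` are made up, the land is assumed to be a single polygon ring given as a list of vertices, and the nodes are assumed to sit in the water rather than on the coastline. The crossing test only catches proper crossings, so an edge that passes exactly through a vertex is not detected.

```python
from functools import partial
from itertools import combinations
from multiprocessing import Pool


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): -1, 0, or 1."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 properly crosses segment q1-q2."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def edge_is_clear(edge, ring):
    """True if the candidate edge crosses no segment of the land ring."""
    a, b = edge
    ring_segments = zip(ring, ring[1:] + ring[:1])  # close the ring
    return not any(segments_cross(a, b, c, d) for c, d in ring_segments)


def clear_edges(nodes, ring, processes=None):
    """Return all node pairs whose connecting edge stays off the land.

    processes=1 runs serially; any other value uses a worker pool
    (None lets Pool pick one worker per available CPU).
    """
    candidates = list(combinations(nodes, 2))
    check = partial(edge_is_clear, ring=ring)
    if processes == 1:
        flags = [check(edge) for edge in candidates]
    else:
        # Each edge is checked independently, so the loop maps cleanly
        # onto a pool; chunksize cuts down on inter-process overhead.
        with Pool(processes) as pool:
            flags = pool.map(check, candidates, chunksize=256)
    return [edge for edge, ok in zip(candidates, flags) if ok]
```

On platforms where worker processes are spawned rather than forked (macOS and Windows by default), the call to `clear_edges` needs to sit under an `if __name__ == "__main__":` guard, and the worker function has to be defined at module top level so it can be pickled.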