Optimize weights computation

I have one more thing to add to this PR before committing (change routine plotting to re-use plots, also a speed improvement) but I'll go ahead and open it up now for feedback.

Recasts the algorithm used in make_weights to compute weights everywhere using a few big array operations, rather than many small array operations inside a for-loop. Takes advantage of the AVX architecture in numpy to do many operations at once under the hood inside the CPU, which I expect is more optimal than any algorithm we could've gotten from scratch.

Had to introduce a few helper functions to get the process worked out and clean, so here's the breakdown of how those functions are used:

make_weight performs the same steps as before, except instead of being performed locally around some index (i,j), it performs those steps globally on a 3D ndarray in which information on all of the neighbors at each index has been stored along the third axis (exactly like the previous Particles.weights array)
big_sliding_window is a helper function to map all local neighbors into a 3D array. The outcome of this function matches what we would get from (a) looping over all indices (i,j), (b) grabbing the local D8 neighbors with raster[i-1:i+2,j-1:j+2], then (c) applying np.ravel() to that local array, which we store along the third axis of a new array. big_sliding_window just does that without looping, by mapping sub-arrays using the same order as ravel, i.e. [NW, N, NE, W, 0, E, SW, S, SE]
tile_local_array is a helper function that maps a single local D8 raster into a 3D array in which the elements are repeated L,W times. This is used to get arrays like ivec or distances, which are the same everywhere, into the same space as the weights array.
tile_domain_array is a helper function which works like tile_local_array, except it simply repeats a big 2D array 9 times along the vertical. When used on a raster like stage, the output of this function and the output of big_sliding_window directly match up the local values in the raster with the neighboring values.
Lastly, clear_borders is just a useful little function to set all the 2D boundaries stored in the 3D array to 0. This is to eliminate any values that show up in the weights array along the border during the math inside make_weight.

In the test cases I've done so far, computing the weights never takes more than a few seconds. On the largest domain I could think to use (~2900 x 2000 elements), it took about 10 seconds.

For a test application, iterating 2000 particles for 10 hours in a domain (~1100 x 1100 elements) with no plotting, the previous algorithm took 4.6 minutes to finish, whereas this algorithm took almost exactly 3 minutes with similar-looking results. From what I remember of previous times the package was profiled, this 34% speedup is about how much time we were spending inside make_weight, which suggests to me that this function will be our problem no more.

Will also work on adding and editing unit tests for these functions.

I have a couple general questions about the four unit tests that were previously failing (the four newest weight-related tests added to test_lagrangian_walker, i.e. test_make_weight_shallow and thereafter). The reason for these failings actually had nothing to do with the array-weight methodology, but instead one throw-away line that in retrospect was a change to the method:

weight[:,:,4] = 0.

Which, if included, would enforce that particles sitting at some index cannot stand still after an iteration (i.e. the weight for the cell they are in, index 4, is zero). I was under the impression that this was already the case, but it appears that we not only allow it, but these tests are designed to ensure that this happens in certain circumstances. Is this the behavior we actually want?

In all of these test cases, the way this happens inside make_weight is identical. The stage and discharge arrays are zero, and depths are zero almost everywhere except the cell we're in, which means that the weight computation results in zero preference for being anywhere (no surface or velocity gradient). We then enter the failsafe condition (lines 211-215) that sets the weight to 1.0 anywhere nearby with the maximum depth -- which in all of these cases is the cell we are in, even if there are non-zero depths nearby.

I can see how this might be preferable in the case described in test_make_weight_shallow, where none of the surrounding cells have any water. But in test_make_weight_equal_opportunity, where the current and two neighboring indices have the same max depth, I feel like we should prefer traveling to the two neighboring cells over standing still. Likewise, in test_make_weight_unequal_opportunity, because the two neighboring cells have less than the local max depth, the chance of us traveling to those cells over standing still is zero (there is no surface or velocity gradient, and they do not have the local max depth), which I'm not sure is ideal. Lastly, all of these tests check that the weights along the boundary are zero, but they don't check if the weight of traveling to the boundary is zero, merely the weight of staying at the boundary once there (which I think we ruled out as a possibility elsewhere?).

I actually think we can fix this quite easily by switching the order we do things in steep_descent() we should remove the initial location from consideration first, and then identify the highest remaining weights, and I think your concern would be addressed.

As a quick clarification, in test_make_weight_unequal_opportunity(), the two nearby cells with non-zero depth have a weight of zero, similar to the surrounding cells with no water. This is because there's no stage or velocity gradient leading there, and those cells don't have the local max depth (and therefore aren't caught by the failsafe). So if we do want to route to those cells in a circumstance like that, it needs to happen upstream of steep_descent() somewhere in make_weight, or else the weighting array shows no other viable cells for travel. However, cases where the stage and velocity gradients are zero may be exactly when we'd need the particle to look for the deepest cell, so perhaps the problem is a bit ill-posed.

As a quick clarification, in test_make_weight_unequal_opportunity(), the two nearby cells with non-zero depth have a weight of zero, similar to the surrounding cells with no water. This is because there's no stage or velocity gradient leading there, and those cells don't have the local max depth (and therefore aren't caught by the failsafe). So if we do want to route to those cells in a circumstance like that, it needs to happen upstream of steep_descent() somewhere in make_weight, or else the weighting array shows no other viable cells for travel.

Ah yes good point. I think I agree with what you say below:

However, cases where the stage and velocity gradients are zero may be exactly when we'd need the particle to look for the deepest cell, so perhaps the problem is a bit ill-posed.

There is the argument to be made that in the absence of a gradient we'd want the particle to sit still since it is already in the "deepest" cell in the neighborhood.

passaH2O / dorado

Optimize weights computation #27