Hydro suggestions - Githubissues

This was part of my response to pull request #52

Requested changes: Please do these.

Ensure that automated testing works. hydro_system's doctests fail in run_tests.py. A simple fix is to remove that synopsis. It is now outdated and no longer the recommended way of loading and using a module, and it requires adding test data to switch_mod/test_dat/. More exhaustive tests are performed by running the hydro_system example, so the doctest really isn't adding anything. I should think about stripping that out of other modules too.
Enforce_Reservoir_Balance appears to have a unit conversion bug. In simplified form, it currently reads: sum(inflows) - sum(outflows) = (final_vol - init_vol) / timepoint_duration_hrs. Unit analysis shows that the left side resolves to m^3/s, while the right side resolves to m^3/hr. The fix is to replace timepoint_duration_hrs with (timepoint_duration_hrs * 3600).
Account for consumption at reservoirs in Enforce_Reservoir_Balance by subtracting m.res_tp_consumption
Typo: hidraulic_location -> hydraulic_location
Use infinity instead of 9999 for the default value of wc_capacity
Add minimum data checks
Remove the filtration hooks from this module (wc_is_a_filtration and the equations for FilteredFlow), and paste them into a filtration module when you implement it. This stuff was confusing on my first read, and it took me a minute to realize it was just hooks for as-yet unimplemented features.
Use shorthand for defining cross products instead of nested for loops adding to a set

mod.RESERVOIRS_BALANCE_POINTS = Set(
    dimen=2,
    initialize=lambda m: m.RESERVOIRS * m.TIMEPOINTS
)
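
The unit mismatch described for Enforce_Reservoir_Balance above can be checked with a few lines of plain arithmetic. This is only a sketch with made-up numbers: the helper name and the volumes are hypothetical, and it does not reproduce the actual constraint code.

```python
SECONDS_PER_HOUR = 3600


def volume_change_rate_m3_per_s(init_vol_m3, final_vol_m3, timepoint_duration_hrs):
    """Average net flow (m^3/s) implied by a volume change over one timepoint.

    Hypothetical helper: dividing by hours * 3600 converts the duration to
    seconds, so the result has the same units as the flow terms.
    """
    return (final_vol_m3 - init_vol_m3) / (timepoint_duration_hrs * SECONDS_PER_HOUR)


# Made-up scenario: a net inflow of 2 m^3/s sustained over a 4-hour timepoint.
net_flow_m3_per_s = 2.0
duration_hrs = 4.0
init_vol = 1000.0
final_vol = init_vol + net_flow_m3_per_s * duration_hrs * SECONDS_PER_HOUR

# Corrected right-hand side: matches the net flow in m^3/s.
corrected_rhs = volume_change_rate_m3_per_s(init_vol, final_vol, duration_hrs)

# Uncorrected right-hand side: m^3/hr, off by a factor of 3600.
uncorrected_rhs = (final_vol - init_vol) / duration_hrs

print(corrected_rhs, uncorrected_rhs)
```

With the conversion in place the two sides of the balance agree; without it the right side is 3600 times too large.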

Suggested changes: Do these if you agree. See second & third commits on pull request.

Replace simple constraints with lower & upper bounds for variables to reduce the number of model components.
Attempt to speed up mass-balancing at water nodes. Reduce the number of model components, and reduce the impact of nested for loops using a trick from the last code speed-up. This isn't necessary, but I wanted to play around with your code to get a feel for it, and this seemed theoretically useful.
Minor speedup to first_tps_in_period calculations
Use an underscore to prefix data that is being cached in the model, like first_tps_in_period -> _first_tps_in_period
Delete the set WATER_SINKS because it isn't used
I like your idea of merging RESERVOIRS into WATER_NODES. I added one possible implementation as the third commit on the pull request.

I further tested the new module with my large set of inputs and confirmed that the new formulation of the balance constraints is slower than the original one. Time spent building those constraints increased from ~0.5 s to more than 50 s. This is about the same order-of-magnitude increase I described in the previous comment when I tested the small hydro example. I'm attaching the profiles I produced with cProfile and examined. You can open them with RunSnakeRun.
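
For anyone who wants to collect a similar profile, here is a minimal sketch using the standard library's cProfile and pstats. The function being profiled is a hypothetical stand-in for constraint construction, not code from the hydro module, and the output filename in the comment is made up.

```python
import cProfile
import io
import pstats


def build_constraints(n_nodes, n_timepoints):
    """Hypothetical stand-in for an expensive model-building step."""
    total = 0
    for node in range(n_nodes):
        for tp in range(n_timepoints):
            total += node * tp
    return total


profiler = cProfile.Profile()
profiler.enable()
result = build_constraints(50, 200)
profiler.disable()

# To keep the raw profile for a graphical viewer, dump it to a file, e.g.:
# profiler.dump_stats("build_constraints.prof")

# Print the five entries with the largest cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time tends to surface the outer function that dominates the build, which is usually the first thing to look at.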

I have found that if Pyomo needs to go through loops and if statements when building constraints, the process is really slow. It is significantly faster to create auxiliary sets and have Pyomo construct precise expressions over them. That is why my initial version had so many "extra" sets. Maybe an intermediate solution is to create these auxiliary sets and solve the model, but delete them in a cleanup step once optimization is finished.

Other than that, the new formulation is quite nice. Instance creation time is actually slightly reduced: if the balance constraint creation is not considered, instantiation time is reduced by about 5-10%. Solving time is about the same. Objective function value is exactly the same, so it passes the "test" with flying colors.

hydro_profiles.zip

After some testing, I found that the if statements and the dictionary lookups were the causes of the slowdown. I replaced them with simple sums over inflows and outflows that are calculated for every pair of indices. Constraint building time was reduced to 0.3 s, similar to the original formulation.

Also, after playing with some simulations, the ReservoirSurplus variable actually does take values different from 0 in some cases where inflows to a reservoir are larger than what the generator downstream can use to generate power.

So, if you are ok with the change I applied in the last commit, we are ready to merge!

Looks good to me!

I didn't actually profile any of this :/ In my experience, profiling results for toy problems often don't translate to actual problems. A toy problem might have 10 timepoints instead of 100, which can make a big difference in which parts of the code go slowly. I usually do a rough estimate of runtime in my head by thinking through how many times each block of code will run, the relative size of sets, and whether there are deeply nested for loops. But actual timing always trumps theoretical estimates. Thanks for doing the profiling and getting insights into what is fast and slow. I haven't played enough with that lately to understand your intuition, but I appreciate the results!

Your code for mass balancing at water nodes is way shorter and easier to read! I wonder if caching the set of connections of a given water node (for wc in m.WATER_CONNECTIONS if m.water_node_to[wc] == wn) would be much faster than recomputing it for each timepoint at that node. It's clear that caching this for each timepoint and each node is counter-productive. Anyway, probably not worth focusing on. My initial motivation for that edit was mostly about playing around and making sure I understood your code.
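
The per-node caching idea can be sketched in plain Python as follows. This is an illustration under assumptions: the toy network, the dictionary names, and the net_flow helper are all hypothetical, and no claim is made about how it would perform inside Pyomo.

```python
from collections import defaultdict

# Hypothetical toy network: connection -> upstream node and downstream node.
water_connections = ["wc1", "wc2", "wc3"]
water_node_from = {"wc1": "A", "wc2": "B", "wc3": "A"}
water_node_to = {"wc1": "B", "wc2": "C", "wc3": "C"}


def build_connection_index(connections, node_from, node_to):
    """Scan the connections once and group them by the node they touch."""
    inflows = defaultdict(list)
    outflows = defaultdict(list)
    for wc in connections:
        inflows[node_to[wc]].append(wc)
        outflows[node_from[wc]].append(wc)
    return dict(inflows), dict(outflows)


# Built once per node, then reused for every timepoint.
inflows_to, outflows_from = build_connection_index(
    water_connections, water_node_from, water_node_to
)


def net_flow(node, tp, flow):
    """Inflows minus outflows at a node, using the cached connection lists."""
    flow_in = sum(flow[wc, tp] for wc in inflows_to.get(node, ()))
    flow_out = sum(flow[wc, tp] for wc in outflows_from.get(node, ()))
    return flow_in - flow_out


# Made-up flows for a single timepoint.
flow = {("wc1", 1): 5.0, ("wc2", 1): 3.0, ("wc3", 1): 2.0}
print({node: net_flow(node, 1, flow) for node in ("A", "B", "C")})
```

The filter over all connections runs once per node instead of once per node and timepoint, so the balance expression itself contains no if statements.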

Regarding merging reservoirs with water nodes: I added the slack variable ReservoirSurplus to mimic your initial formulation of an inequality for the final reservoir condition. I wouldn't expect replacing end_volume = m.final_res_vol[wn] + m.ReservoirSurplus[wn, p] with end_volume >= m.final_res_vol[wn] to work, based on what I know of Python operator overloading. The first statement assigns the right-hand side to the name on the left, but the second produces a boolean test. Normally I leave slack variables implicit in a model and let the compiler add them automatically for each inequality. But in this case, I needed a slack variable selectively applied, so doing it manually made more sense (and didn't increase the size of the compiled problem, since the compiler would add a slack variable anyway).
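
The difference between the two statements can be illustrated with a toy expression class. This is a sketch only: Expr and Relation are hypothetical stand-ins written for this example and are not Pyomo's actual classes, which behave in more elaborate ways.

```python
from dataclasses import dataclass


def _text(value):
    """Render an operand as text, whether it is an Expr or a plain number."""
    return value.text if isinstance(value, Expr) else str(value)


@dataclass
class Relation:
    """Hypothetical record of a comparison between two expressions."""
    lhs: str
    op: str
    rhs: str


class Expr:
    """Hypothetical symbolic expression with overloaded + and >=."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        # Addition builds a new, larger expression.
        return Expr(f"({self.text} + {_text(other)})")

    def __ge__(self, other):
        # Comparison builds a Relation object; it does not assign anything.
        return Relation(self.text, ">=", _text(other))


final_res_vol = Expr("final_res_vol")
surplus = Expr("ReservoirSurplus")

# Assignment: binds the name end_volume to the expression on the right.
end_volume = final_res_vol + surplus

# Comparison: evaluates to a new object and leaves end_volume unchanged.
# Written as a bare statement, the result would simply be discarded.
relation = end_volume >= final_res_vol

print(end_volume.text)
print(relation)
```

In this toy version the comparison only has an effect if its result is kept and handed to whatever builds the constraint, which is why the two statements are not interchangeable.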

switch-model / switch

Hydro suggestions #61