timroepcke opened 4 years ago
I timed the different sections of run_smooth. I used the standard example model with the electrolyzer power set to 1e6, so the model could always be solved. I increased n_intervals from 1 to 1000 in powers of 10. All timing measurements were done on my laptop, using the standard oemof v0.3.3. Here is what I found:
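For reference, per-section timings like these can be collected with a small helper along the following lines. This is a minimal sketch, not the actual instrumentation I used; the section names and the stand-in workloads are illustrative only.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock time per named section, across all intervals.
section_times = defaultdict(float)

@contextmanager
def timed(name):
    """Add the elapsed wall-clock time of the enclosed block to `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        section_times[name] += time.perf_counter() - start

# Usage inside the interval loop (section names are illustrative):
for _ in range(10):
    with timed("prepare"):
        sum(range(1_000))    # stand-in for component preparation
    with timed("solve"):
        sum(range(10_000))   # stand-in for the solver call

# Sections sorted by total time, most expensive first.
print(sorted(section_times, key=section_times.get, reverse=True))
```

Summing over all intervals (rather than printing per interval) makes it easy to see which section dominates as n_intervals grows.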
Everything under INITIALIZATION (converting from list to dict, creating component objects, etc.): this does not scale with n_intervals, as it is only executed once.
This covers the first steps of each interval, from process printing (disabled for testing) to component preparation. The time increases linearly but is negligible overall.
The first part of the section RUN THE SIMULATION: creating model_to_solve, updating bus constraints and writing the model to disk. It seems to grow faster than linearly, but what actually happens is that the LP file is written to disk during the first interval, which throws off the curve. For reasonable n_intervals values between 10 and 100 it is negligible. Therefore, creating a model once and reusing it MIGHT reduce the runtime slightly, but is probably not worth the effort.
Solving the model using the cbc solver. This also seems to grow faster than linearly with n_intervals. I am not sure why, since the intervals should be independent; my guess is that there is some initialization overhead, which inflates the apparent per-interval time for low numbers of intervals. With larger numbers it grows linearly. There is not much we can do here, except use another solver. This makes up between 10 and 20% of the total runtime.
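The "faster than linear" impression is consistent with a constant start-up cost on top of a linear per-interval cost, i.e. T(n) ≈ a + b·n. A quick way to check that hypothesis is to fit the two parameters through two measurements; the timings below are made-up placeholders, not my actual numbers.

```python
def fit_linear_with_offset(n1, t1, n2, t2):
    """Fit T(n) = a + b*n through two (n_intervals, runtime) points,
    separating a fixed start-up cost (a) from the per-interval cost (b)."""
    b = (t2 - t1) / (n2 - n1)   # cost per interval
    a = t1 - b * n1             # fixed start-up overhead
    return a, b

# Hypothetical timings: 0.5 s at 10 intervals, 2.3 s at 100 intervals.
a, b = fit_linear_with_offset(10, 0.5, 100, 2.3)
print(a, b)  # prints 0.3 0.02
```

If a comes out clearly positive while a fit over larger n stays on the same line, the solver cost is linear after all and the curvature at small n is just the constant offset.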
The first step during HANDLE RESULTS, where oemof's processing.results is called. This is the lion's share, making up more than half of the runtime for longer computations. I poked around in oemof's source code and identified three main sections in the function. First, there is some initialization: creating a pandas DataFrame from the given model and creating a dictionary of DataFrames from that. Both commands take about 10 microseconds each. In the next section, the dictionary is traversed; in total, this takes 50 microseconds (for my simple model). The last part adds dual variables for bus constraints, which do not seem to be set for my model, so no time was spent there. So the big part is iterating over the dictionary, and the main time sinks are pivoting the DataFrame dictionary and setting sequences. I am puzzled by both things for different reasons.
Contains processing.parameter_as_dict, processing.create_dataframe and a loop over the components that calls their result-handling functions. Rising strictly linearly, this part becomes the second largest contributor to the runtime at about 30 n_intervals. It consists of three parts: processing.parameter_as_dict (50% of its runtime), processing.create_dataframe (25%) and the loop over components (25%).
(In run_smooth, there is a last part where generate_results is called for each component, but virtually no time is spent there.)
To drive the point home that we should focus on processing.results and how we handle these results, here is a comparison of the timings for n_intervals = 100:
While we might not be able to change the functions themselves, maybe we can improve how we use them. For me, calling processing.results took about 80 to 90 milliseconds; for n_intervals = 100, this added up to 8 of the 14 seconds total. Would it be possible to call this function only once, by saving the intermediate results? During the runs only the values change, not the components, so there may be some optimization potential right there. Similarly, we could probably reuse the DataFrame from this function instead of calling processing.create_dataframe a second time during sim_handle_res.
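The reuse idea could look roughly like the sketch below. The cache class and the `build` callback are hypothetical; I have not checked whether oemof's result objects can safely be carried from processing.results into sim_handle_res like this.

```python
class ResultCache:
    """Build an expensive result object once per interval, then reuse it
    instead of rebuilding it via a second processing.create_dataframe call."""

    def __init__(self):
        self._df = None

    def get_dataframe(self, build):
        if self._df is None:
            self._df = build()   # only pay the cost on first access
        return self._df

build_calls = []

def expensive_build():
    build_calls.append(1)           # track how often we really build
    return {"flow": [1.0, 2.0]}     # stand-in for the results DataFrame

cache = ResultCache()
df1 = cache.get_dataframe(expensive_build)  # builds
df2 = cache.get_dataframe(expensive_build)  # reuses the same object
print(len(build_calls))  # prints 1
```

The cache would have to be reset at the start of each interval, since the flow values do change between intervals even though the components do not.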
Update: on closer look, the values from the two processing functions under sim_handle_res (parameter_as_dict and create_dataframe) are only used for debugging, when the result is not optimal. So this is unnecessary work when show_debug_flag is set to False. My recommendation: at least guard the two lines where results_dict and df_results are set behind a show_debug_flag check as well. Maybe even consider whether this debug info is important enough to warrant a serious slowdown (about 10% of the execution time).
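The guard I am suggesting, sketched with stand-in functions (the flag name is from run_smooth; the stubs only count calls so the effect is visible, they are not oemof's real functions):

```python
# Stand-ins for oemof's processing.parameter_as_dict / create_dataframe.
calls = {"parameter_as_dict": 0, "create_dataframe": 0}

def parameter_as_dict(model):
    calls["parameter_as_dict"] += 1
    return {}

def create_dataframe(model):
    calls["create_dataframe"] += 1
    return {}

def handle_results(model, show_debug_flag):
    # Only compute the debug-only values when debugging is requested.
    if show_debug_flag:
        results_dict = parameter_as_dict(model)
        df_results = create_dataframe(model)

handle_results(model=None, show_debug_flag=False)
print(calls)  # prints {'parameter_as_dict': 0, 'create_dataframe': 0}
```

With the flag off, neither function runs, which should recover the roughly 10% of execution time mentioned above.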