Results processing could potentially be sped up

jokochems commented 2 years ago

Status quo / issue

The current workflow in the processing(.py) module is roughly as follows:

Results obtained from pyomo are first stored in an overall pandas.DataFrame.
Afterwards, this DataFrame is split up by component and stored in a dictionary holding one DataFrame per component.
Then, the code iterates over this dict of DataFrames and stores the results in the common nested dictionary format, with the components as keys of the outer dict and "scalars" and "sequences" as keys of the inner dict.
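To make the three steps concrete, here is a minimal pandas sketch of the described workflow. It is not the actual processing.py code: the toy DataFrame, its column names (component, variable_name, timestep, value) and the helper results_per_component are assumptions for illustration. A NaN timestep marks a scalar value, and plain strings stand in for the node tuples that oemof.solph uses as result keys.

```python
import pandas as pd


def make_toy_results() -> pd.DataFrame:
    # Step 1 (toy stand-in): one overall long-format DataFrame of results.
    # Column names are assumptions; a NaN timestep marks a scalar value.
    return pd.DataFrame(
        {
            "component": ["gen_bus"] * 3 + ["storage"] * 3,
            "variable_name": ["flow", "flow", "invest", "content", "content", "invest"],
            "timestep": [0, 1, None, 0, 1, None],
            "value": [10.0, 12.0, 50.0, 3.0, 4.0, 20.0],
        }
    )


def results_per_component(df: pd.DataFrame) -> dict:
    # Step 2: split the overall DataFrame into one DataFrame per component.
    dfs = {key: group for key, group in df.groupby("component", sort=False)}

    # Step 3: iterate over that dict and build the nested
    # {component: {"scalars": Series, "sequences": DataFrame}} structure.
    results = {}
    for key, group in dfs.items():
        is_scalar = group["timestep"].isna()
        scalars = group.loc[is_scalar].set_index("variable_name")["value"]
        sequences = group.loc[~is_scalar].pivot(
            index="timestep", columns="variable_name", values="value"
        )
        results[key] = {"scalars": scalars, "sequences": sequences}
    return results


results = results_per_component(make_toy_results())
print(results["gen_bus"]["scalars"]["invest"])  # 50.0
```

Every component passes through its own groupby slice, boolean mask and pivot, which is where the per-node overhead comes from for models with many components.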

While I think this is quite intuitive, it may cause performance issues for large models. We have noticed considerable computational effort when converting the results of large simulation runs.

Idea to tackle / discuss

I think the intermediate conversion step is not needed; instead of the dict data structures, one could perhaps use vectorized operations in pandas or numpy.
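A minimal sketch of the vectorized idea, assuming a long-format results table with hypothetical columns component, variable_name, timestep and value (not the real create_dataframe output): one boolean mask splits all rows into scalars and sequences at once, and a per-component view becomes a plain column or index selection instead of a dict lookup.

```python
import pandas as pd

# Toy long-format results table; the column names are assumptions and a
# NaN timestep marks a scalar (time-independent) value.
df = pd.DataFrame(
    {
        "component": ["gen_bus"] * 3 + ["storage"] * 3,
        "variable_name": ["flow", "flow", "invest", "content", "content", "invest"],
        "timestep": [0, 1, None, 0, 1, None],
        "value": [10.0, 12.0, 50.0, 3.0, 4.0, 20.0],
    }
)

# One vectorized mask separates scalars from sequences for all components at once.
is_scalar = df["timestep"].isna()

# Scalars: a single Series indexed by (component, variable_name).
scalars = df.loc[is_scalar].set_index(["component", "variable_name"])["value"]

# Sequences: one wide DataFrame with timesteps as rows and
# (component, variable_name) as MultiIndex columns; no per-component loop.
sequences = df.loc[~is_scalar].pivot(
    index="timestep", columns=["component", "variable_name"], values="value"
)

# Selecting a single component is then just a lookup.
print(scalars.loc[("gen_bus", "invest")])  # 50.0
print(sequences["storage"])
```

Whether this is actually faster for large models would need to be measured; the sketch only shows that no Python-level loop over components is required.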

I have not yet thought this through completely, especially not regarding other consequences it might have, but I think this could be a topic for the upcoming dev meeting: https://github.com/oemof/oemof/issues/96

p-snft commented 2 years ago

Could a new implementation also solve the dumping problem (#781)? Also, https://github.com/oemof/oemof.network/issues/12 might help.

uvchik commented 2 years ago

No, @p-snft, that seems to be a different issue, because in #781 the user wanted to dump the EnergySystem before(!) the optimisation.

jokochems commented 2 years ago

After spending the day with the processing module, my idea would be to offer an option to group on demand, i.e. to call only the function create_dataframe(om) and then add a new function that either

performs the groupby operation on demand and creates the results dicts in the current format, but only for a subset of nodes passed as input arguments, or
splits the overall DataFrame directly into sequences and scalars instead of doing so on a node level.

I prefer the second option. Nonetheless, it probably can't become the standard right away, because some other functions rely on the current structure, right? So I think of it as an add-on service.

I'm not sure how much this would improve computational performance, but I think it could be worth a shot for really large models where you are only interested in a few particular results in the first place. In my experience, creating and retrieving dicts of DataFrames is not the most efficient solution. 😉 What do you think?
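The first option (grouping on demand for a subset of nodes) could look roughly like the sketch below. results_for_nodes is a hypothetical helper, and the DataFrame layout (columns component, variable_name, timestep, value) is an assumption rather than the real create_dataframe(om) output. The point is that filtering happens before the groupby, so only the requested components are converted.

```python
import pandas as pd


def results_for_nodes(df: pd.DataFrame, nodes) -> dict:
    """Hypothetical helper: build the nested results dict only for `nodes`."""
    # Filter first, so the groupby and the loop only touch the requested rows.
    subset = df.loc[df["component"].isin(nodes)]
    results = {}
    for key, group in subset.groupby("component", sort=False):
        is_scalar = group["timestep"].isna()
        results[key] = {
            "scalars": group.loc[is_scalar].set_index("variable_name")["value"],
            "sequences": group.loc[~is_scalar].pivot(
                index="timestep", columns="variable_name", values="value"
            ),
        }
    return results


# Toy long-format results table (column names are assumptions;
# a NaN timestep marks a scalar value).
df = pd.DataFrame(
    {
        "component": ["gen_bus"] * 3 + ["storage"] * 3,
        "variable_name": ["flow", "flow", "invest", "content", "content", "invest"],
        "timestep": [0, 1, None, 0, 1, None],
        "value": [10.0, 12.0, 50.0, 3.0, 4.0, 20.0],
    }
)

res = results_for_nodes(df, ["storage"])
print(list(res))  # ['storage']
print(res["storage"]["scalars"]["invest"])  # 20.0
```

The filter is a single vectorized isin call, while the Python-level loop shrinks to the requested components; how much this saves on a real model is untested here.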

oemof / oemof-solph

Results processing could potentially be sped up #794