outputlib with pandas - Improvements

ckaldemeyer commented 8 years ago

Some changes still have to be done. I am using this issue to collect and discuss ideas.

Ideas for changes:

It should be possible to pass a result-object directly. For this, we need a way to deal with the datetime-index which normally comes from the energy system
Add a flag like "return_df". If set to True, the dataframe-object will be returned directly
Add possibility to exclude components in _plotbus() method
Add a plot method that gives a good overview about the bus balance. My idea would be a stacked area or a normal stacked stackplot. All inputs are positive and all outputs negative. Additionally, the residual load will be plotted as a line and should be zero.
Add methods to store/restore the dataframe using pickle or something equivalent
Realize dataframe creation and plotting in different classes (separate data and visualisation)

ckaldemeyer commented 8 years ago

@uvchik @simonhilpert : Add your ideas if you want

ckaldemeyer commented 8 years ago

I have added some notes..

ckaldemeyer commented 8 years ago

@uvchik: Just as an information. I have just used the class to analze results for renpassgis. For one year, the dataframe holds more than 11 million entries and it takes 20-30 minutes to generate it. But this is for all busses.

But I'll have a look if we can tweak something..

uvchik commented 8 years ago

@ckaldemeyer wrote:

1) It should be possible to pass a result-object directly. For this, we need a way to deal with the datetime-index which normally comes from the energy system

The more options we have the more cases we have to check every time we change anything. You have to pass both informations anyway, so what is the problem to create a simple EnergySystem just as a container.

my_es = EnergySystem(results=my_result_dict, time_idx=my_time_index)
my_df = EnergySystemDataFrame(energy_system=my_es)

2) Add a flag like "return_df". If set to True, the dataframe-object will be returned directly

This works already for me. The init-method cannot return anything.

my_df = tpd.EnergySystemDataFrame(energy_system=my_es)
print(es_df.data_frame)

3) Add possibility to exclude components in _plotbus() method

+1

4) Add a plot method that gives a good overview about the bus balance. My idea would be a stacked area or a normal stacked stackplot. All inputs are positive and all outputs negative. Additionally, the residual load will be plotted as a line and should be zero.

Or a table that shows the balance of the year and a function that returns the dates and time where the balance is not valid.

5) Add methods to store/restore the dataframe using pickle or something equivalent

+1 (e.g. hdf5, csv, postgres...)

6) Realize dataframe creation and plotting in different classes (separate data and visualisation)

I really like this idea. This is all the more true, as we will have more kinds of outputs like key values, summary tables, reports... +1

So far I do not have any wishes but thank you for your incitements.

ckaldemeyer commented 8 years ago

Thanks for your feedback! I'll start on the branch "features/outputlib-improvements".

My newest idea was to design the class to_pandas (better name would be nice) as a subclass of the pandas dataframe class. This way, all methods are operating on the object "self" and we have the advantage that all conventional pandas methods are also available to manipulate the object from outside.

By doing so, we could access the object like this:

my_df = tpd.EnergySystemDataFrame(energy_system=my_es)
print(my_df)
my_df.plot_bus(uid="bla")

I'll see how far I get in the next days and post my progress.

ckaldemeyer commented 8 years ago

Ideas:

~~Add possibility to exclude components in plot-methods~~ One can now pass a list of obj_uids or obj_uid-substrings that will be excluded. See newest commit for an example.
~~Add bus variables "shortage", "excess" and "duals" correctly to column "other" in dataframe~~ Done. So far, "duals" were saved wrongly in the dataframe. I fixed it now and added the slack vars. But see my question below.
Inherit from pandas.dataframe class
Realize dataframe creation and plotting in different classes (separate data and visualisation)
Add a plot method that gives a good overview about the bus balance
Add methods to store/restore the dataframe using pickle or something equivalent

There's something I stumIed upon: I don't understand why the bus variables are saved the way they are at the moment in the result object.

optimization_model.py:

    if hasattr(self, "dual"):
        for bus in getattr(self, str(Bus)).objs:
            if bus.balanced:
                result[bus] = result.get(bus, {})
                result[bus][bus] = [
                    self.dual[getattr(self, str(Bus)).balance[(bus.uid, t)]]
                    for t in self.timesteps]

    for bus in getattr(self, str(Bus)).objs:
        if bus.excess:
            result[bus] = result.get(bus, {})
            result[bus]['excess'] = [
                getattr(self, str(Bus)).excess_slack[(bus.uid, t)].value
                for t in self.timesteps]
        if bus.shortage:
            result[bus] = result.get(bus, {})
            result[bus]['shortage'] = [
                getattr(self, str(Bus)).shortage_slack[(bus.uid, t)].value
                for t in self.timesteps]

I don't see a difference between the types of the variables since they either belong to one object (bus) or are a result of multiple objects (component interaction). So they should be treated in the same way in my opinion.

@gnn @simonhilpert

Why don't we just add an entry result[bus]['duals'] instead of result[bus][bus] for the dual variables?

ckaldemeyer commented 8 years ago

Testing renpass-gis, I have just noticed that there are multiple entries for duals:

                                                               val
bus_uid    bus_type type  obj_uid                 datetime                
el: DEdr13 el       other dual                    2015-01-01 00:00:00    0
                                                  2015-01-01 00:00:00    0
                                                  2015-01-01 00:00:00    0
                                                  2015-01-01 00:00:00    0
                                                  2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                                                  2015-01-01 01:00:00    0
                                                  2015-01-01 01:00:00    0
                                                  2015-01-01 01:00:00    0
                                                  2015-01-01 01:00:00    0
                          excess                  2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                          shortage                2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                          storage_140081388946880 2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                          storage_140081388978864 2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                          storage_140081388982112 2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                          storage_140081389014096 2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0
                          storage_140081389041984 2015-01-01 00:00:00    0
                                                  2015-01-01 01:00:00    0

These all seem to be saved under the same key if I get it right:

    if hasattr(self, "dual"):
        for bus in getattr(self, str(Bus)).objs:
            if bus.balanced:
                result[bus] = result.get(bus, {})
                result[bus][bus] = [
                    self.dual[getattr(self, str(Bus)).balance[(bus.uid, t)]]
                    for t in self.timesteps]

@simonhilpert : Is it somehow possible to add unique identifiers here?

simnh commented 8 years ago

you are right. Dual variables exist for every constraint inside the (linear) optimization problem. We can extract all of them but the loop only gets all duals for balanced bus constraint. The unique identifier could then be the bus-uid.

For renpass-gis I think the major interest lies on the duals from balanced-electricity busses which can be interpreted as the marginal electricity price (shadowprice) for the bus.

In the future I would actually extract all duals (if wanted) because in terms of economics as all of them can be interpreted in a certain way. But we can start with duals for balanced busses...

uvchik commented 8 years ago

@ckaldemeyer wrote:

My newest idea was to design the class to_pandas (better name would be nice) as a subclass of the pandas dataframe class. This way, all methods are operating on the object "self" and we have the advantage that all conventional pandas methods are also available to manipulate the object from outside.

So i will wait with new changes until you finished the restructure process, to avoid difficult merge conflicts.

ckaldemeyer commented 8 years ago

I have now fixed the dataframe creation and pushed my results. Somehow I still don't get the duals but everything else should work and is now a bit clearer in the creation. The result-object is such a brainf*\ ...

@simonhilpert : Could you maybe have a look at the duals-problem? We could also talk about that on Wednesday.. @uvchik : Could you have a look if everything works as expected? I have already made some tests but you never know ;)

Concerning the inheritance from pandas classes: Obviously, it is not recommended to inherit from pandas.

See:

but we could use a composition instead:

http://learnpythonthehardway.org/book/ex44.html

Concerning the split into two classes: What about these class names?

EnergySystemDataFrame (dataframe and methods for slicing)
DataFramePlot (plotting with dataframe)

What do you think?

I'll go to bed now and will be back on Wednesday ;-)

uvchik commented 8 years ago

My example seems to work, thanks :smile:

Maybe ResultsDataFrame instead of EnergySystemDataFrame but this is not a hard objection. I totally agree with splitting the class.

For me it is not important to inherit or use a composition if there are no big advantages but go ahead if you think it's better (either way).

ckaldemeyer commented 8 years ago

Maybe ResultsDataFrame instead of EnergySystemDataFrame but this is not a hard objection. I totally agree with splitting the class.

Then, I agree on ResultsDataFrame and DataFramePlot! It's a bit shorter. At the moment I am quite busy with a paper. But I'll spend my spare time to implement the changes!

ckaldemeyer commented 8 years ago

TODO:

~~Add possibility to exclude components in plot-methods~~ One can now pass a list of obj_uids or obj_uid-substrings that will be excluded. See newest commit for an example.
~~Add bus variables "shortage", "excess" and "duals" correctly to column "other" in dataframe~~ Done. So far, "duals" were saved wrongly in the dataframe. I fixed it now and added the slack vars. But see my question below.
Realize dataframe creation and plotting in different classes (separate data and visualisation)
Add a plot method that gives a good overview about the bus balance
Add methods to store/restore the dataframe using pickle or something equivalent
Add component input/output type or component input-/outbus-bus_uid in dataframe to allow technology grouping

eosram commented 8 years ago

@ckaldemeyer: I edited the create func to improve its performance in features/outputlib-performance. Let me know what you think : )

simnh commented 8 years ago

Is this going to be finished and merged until the end of week or should we set the milestone to february?

eosram commented 8 years ago

No, it`s likely not finished by the end of the week, we should set the milestone to february.

ckaldemeyer commented 8 years ago

@s3pp : As I already said. Nice improvement!

I think we should at least have the improved dataframe creation in the next release. It's only an "internal" change and the plotting, etc. is already on the dev-Branch anyway.

The version with two classes can then become part of the next release or we already push things until Friday ;)

ckaldemeyer commented 8 years ago

So we could merge from "outputlib-performance..." into "outputlib-improvements" and then back into "dev".

uvchik commented 8 years ago

I need to do some changes for my plots but I think it make no sense to do anything until you split the classes. Do you have a plan when you are going to do it?

ckaldemeyer commented 8 years ago

At the moment I got stuck with my paper and Martin is busy with renpass-gis. I can tell you more on Monday!

eosram commented 8 years ago

I recently divided the code into two classes: A ResultsDataFrame and a DataFramePlot. Its far from finished unfortunately, because I have some trouble with renpass-gis and not enough time, but one can see the concept behind. The code is on features/fix-outpultlib-dev. Im still working on it, because the stackplot does not work properly atm.

uvchik commented 8 years ago

For me all plots work. Thank you. I activated the plots in the storage_invest example and everything works fine. I pushed it. Revert it if it doesn't work for you.

uvchik commented 8 years ago

If you are done, I will revise the plot classes according to the protocol of the web conference.

eosram commented 8 years ago

Huch, thats nice :smile:. You`re absolutely right. Its working as expected now. If you want to, you can already start to revise the plot methods. I was thinking to maybe implement other methods in ResultsDataFrame, but not in DataFramePlot.

uvchik commented 8 years ago

I think we can do that in the same branch.

ckaldemeyer commented 8 years ago

Go ahead ;-)

uvchik commented 8 years ago

The first proposal is finished :smile: I left the old stuff to compare the changes but everything below line 291 will be removed. I adapted the storage_invest example to show how the new plotting class works compared to the old one. On the first view there is more code to write but it is very easy code and it makes it much more flexible without handling hundreds of parameters.

The heart of the class is the slice_unstacked method which creates a ready to plot unstacked subset. With this subset you can just use the pandas plot method.

myplot = to_pandas.DataFramePlot(energy_system=energysystem)
subset = myplot.slice_unstacked(bus_uid="bel", type="input").subset
subset.plot(**pandas_plotting_arguments)

But the class offers you some helper methods to make plotting easier. Some things really work good and it took me and others some time to find the solution. So I wanted to keep them. So I created a plot method which only passes all arguments to pandas. So the following code does exactly the same than the code above.

myplot = to_pandas.DataFramePlot(energy_system=energysystem)
myplot.slice_unstacked(bus_uid="bel", type="input")
myplot.plot(**pandas_plotting_arguments)

If you use the second code block you have the opportunity to use the helper methods. But you still will be able to do all that matplotlib stuff.

myplot = to_pandas.DataFramePlot(energy_system=energysystem)
myplot.slice_unstacked(bus_uid="bel", type="input")
myplot.plot(**pandas_plotting_arguments)
myplot.ax.legend(loc='upper right')  
myplot.ax.set_ylabel('Power in MW') 
myplot.ax.set_xlabel('Date')  
myplot.set_datetime_ticks()  # our helper method to make the ticks nicer

With another helper method the legend can be placed outside the plot.

Now it works more like a framework because the helper methods can be used for every plot and we do not have to create overcrowded plotting methods with hundreds of parameters.

Anyway I created an io_plot that replaces the old stackplot method. I did it because the pandas plot has a problem when plotting a stacked line plot, so I had to correct it and this is half of the code Otherwise I would have used the plot method to do it instead of creating a new method.

@ckaldemeyer , @s3pp : What do you think?

eosram commented 8 years ago

I just discussed this topic with @ckaldemeyer . We both think it's very well done and overall a strong improvement. The readability of the storage invest plot creation is also very nice imo. One point could be questionable though, whether the slice_unstacked method would better fit in the ResultsDataFrame class, because it`s single purpose is data handling in the end. Or are there any disadvantages by doing so?

uvchik commented 8 years ago

One point could be questionable though, whether the slice_unstacked method would better fit in the ResultsDataFrame class, because it`s single purpose is data handling in the end.

I totally agree. I had the same idea but did not want to write code in the EnergySystemDataFrame class because I did not know how you still want to restructure this class. So feel free to move it into this class.

eosram commented 8 years ago

Ok. so I moved the method to the ResutlsDataFrame class. Imo we could start thinking about merging it back, don't we, @ckaldemeyer, @uvchik ? I thought that it might be beneficial to include methods for aggregating data on specific levels, but that's not so much restructuring than extending and eventually it isn't even necessary, because of pandas' easy-to-use functions. I will find out while working with the results of renpassg!s.

ckaldemeyer commented 8 years ago

Ok. so I moved the method to the ResutlsDataFrame class. Imo we could start thinking about merging it back, don't we, @ckaldemeyer, @uvchik ?

:+1: for me. This would also make #80 obsolete since the fix is already contained.

uvchik commented 8 years ago

I do not know how it was intended but the ResultsDataFrame class does not have a subset attribute so far. After @s3pp changes the slice_unstacked method created one. But it is never used. So I removed it and returned the subset (return subset) instead of setting the internal attribute.

As I need the internal attribute in the plotting class I inherit the method and extended it by setting the internal attribute. If you decide to have a subset attribute in the ResultsDataFrame class we can remove the method in the plotting class.

I fixed pep8, added docstrings and removed the old stuff. For me we a ready to merge.

@ckaldemeyer : You are assigned. May you do it?

eosram commented 8 years ago

I see, I see. Actually that was something I wasn't quite sure about.

uvchik commented 8 years ago

We do not have to decide that now. For now everything works fine and that is worth a merge :smile:

ckaldemeyer commented 8 years ago

:+1:

@uvchik: Concerning the merge would you mind if one of you merges? I am really busy atm and do not have the time to fix errors which eventually might occur (even if I think there won't be any)..

uvchik commented 8 years ago

No problem, but not today. I will do it tomorrow afternoon.

ckaldemeyer commented 8 years ago

Thanks!!

oemof / oemof-solph

outputlib with pandas - Improvements #54