oemof / oemof-solph

A model generator for energy system modelling and optimisation (LP/MILP).
https://oemof.org
MIT License

outputlib - basic plots #36

Closed ckaldemeyer closed 8 years ago

ckaldemeyer commented 8 years ago

Uwe has already started on his outputlib and written a method that creates a DataFrame with all component time series around a given bus.

He started with basic matplotlib, which offers every configuration option, but in my opinion that is sometimes too many for standard plots and can be overwhelming...

After some experimenting, my idea is to use the basic pandas plotting functions (which are built on matplotlib) for the standard plots and for the renpass-gis plots.

Here's a small example with a handful of plots and some possible configuration options beyond the standard functionality (needs matplotlib >= 1.4):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from datetime import datetime as dt
mpl.style.use('ggplot')

# Generate sample data
sample_data = np.random.rand(24*365, 5)
df = pd.DataFrame(sample_data,
                  index=pd.date_range('1/1/2015 00:00',
                                      periods=len(sample_data), freq='H'))

# Select date range to plot
date_from = dt(2015, 12, 22, 0, 0)
date_to = dt(2015, 12, 22, 23, 0)
df = df.loc[date_from:date_to]

# Plotting
# Formatting-tuple (title, colormap, xlabel, ...)
# for matplotlib.axes.AxesSubplot object could
# be passed by kwargs later
df.plot(kind='line', colormap='Spectral', title='Line Plot', linewidth=2)
[(ax.set_ylabel("Power in GW"),
 ax.set_xlabel("Date and Time"),
 ax.legend(('Wind', 'PV', 'Biomass', 'RoR', 'Demand'), loc='upper right'))
 for ax in plt.gcf().axes]

df.plot(kind='bar', stacked=True, colormap='Greens', title='Bar Plot')
[ax.legend(loc='upper right') for ax in plt.gcf().axes]

df.plot(kind='barh', stacked=True, colormap='Oranges', title='H-Bar Plot')
[ax.legend(loc='upper right') for ax in plt.gcf().axes]

df.plot(kind='area', stacked=False, alpha=0.5, colormap='Spectral',
        title='Area Plot')
[ax.legend(('Wind', 'PV', 'Biomass', 'RoR', 'Demand'),
           loc='upper right') for ax in plt.gcf().axes]

df.plot(kind='box', colormap='Reds', title='Box Plot')
[ax.legend(loc='upper right') for ax in plt.gcf().axes]

df.loc['2015-12-22 12:00:00':'2015-12-22 18:00:00', 2:3] \
    .plot(kind='hist', stacked=True, bins=20, colormap='ocean',
          title='Histogram of a subset')
[ax.legend(('Col1', 'Col2'), loc='upper right') for ax in plt.gcf().axes]

df.plot(kind='scatter', x=0, y=1,
        title='Scatter Plot (first vs. second column)')
[ax.legend(loc='upper right') for ax in plt.gcf().axes]

This could be implemented quickly on top of Uwe's work and should cover most needs, as I do not want to spend too much time tweaking visualisations.

@oemof/oemof-main

What's your opinion on this?

Happy Christmas, Cord

uvchik commented 8 years ago

@ckaldemeyer wrote:

He started with basic matplotlib, which offers every configuration option, but in my opinion that is sometimes too many for standard plots and can be overwhelming...

No, so far the devplots module is based on the pandas plotting functions and not on basic matplotlib. One module creates DataFrames with all flows around one bus. Then you can create plots based on this DataFrame. One default plot is implemented, but if you have special wishes you can create your own.

The idea was to create a plotting library based on the EnergySystem class to make it easy to get default plots, so that people do not have to spend time on programming plots again and again.

@ckaldemeyer It would be helpful if you added your ideas to this library.

ckaldemeyer commented 8 years ago

@uvchik : I'll push my state tomorrow. It took a bit longer dealing with a pandas MultiIndex (http://pandas.pydata.org/pandas-docs/stable/advanced.html) but I think it's worth spending time on it!

simnh commented 8 years ago

Multiindexing looks pretty cool to me, especially as we always have tuples as sets for optimization model. Looking forward to your push ;-)
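
To illustrate why a MultiIndex fits tuple-indexed results, here is a minimal sketch. The level names (bus_uid, flow_type, datetime) and the column name val are assumptions made for this example and are not taken from the branch; only the bus labels bel and bheat appear elsewhere in this thread.

```python
import numpy as np
import pandas as pd

# Three hourly timesteps as sample data.
timesteps = pd.date_range("2015-01-01", periods=3, freq="h")

# Every row is identified by a tuple (bus_uid, flow_type, datetime).
# The level names are hypothetical.
index = pd.MultiIndex.from_product(
    [["bel", "bheat"], ["input", "output"], timesteps],
    names=["bus_uid", "flow_type", "datetime"],
)
results = pd.DataFrame({"val": np.arange(len(index), dtype=float)},
                       index=index)

# Selecting with a partial tuple returns the time series for that key.
bel_input = results.loc[("bel", "input")]
print(bel_input)
```

The partial tuple works much like the tuple sets of the optimization model: fixing the first levels leaves a frame indexed by the remaining level.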

cswh commented 8 years ago

Have a look at the features/pypower branch. There, the energy system class has a method to plot itself as graph. This could be a blueprint for plotting functions.

ckaldemeyer commented 8 years ago

I can't find it there. But I am almost done and we can still adjust it later

c-moeller commented 8 years ago

I'm not sure if this is the proper thread for this, but last summer the RLI oemof team discussed a budget for support in data processing and results analysis (including plotting). Due to organizational problems this has not been realized so far, but it came up again today and still seems relevant. We now have a student who is interested in doing this, and I have to clarify the budget once more. This is just for information. If this is the wrong place, please move this comment or tell me :-)

ckaldemeyer commented 8 years ago

I have pushed my current state to "features/outputlib-based-on-pandas" and adjusted the storage optimization example to show how it works!

Make sure that your pandas version is >= 0.17.0. Otherwise the multiindex will fail..
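
A script could guard against an older pandas with a check like the following. This is only a sketch: the helper name major_minor is made up for this example, and the parsing is deliberately simple (it assumes the first two version components are plain integers).

```python
import pandas as pd

MIN_PANDAS = (0, 17)  # minimum version stated above


def major_minor(version):
    """Return (major, minor) of a version string such as '0.17.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


# Fail early with a readable message instead of a MultiIndex error later.
if major_minor(pd.__version__) < MIN_PANDAS:
    raise ImportError("pandas >= 0.17.0 is required, found " + pd.__version__)
```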

There is still some stuff to do (see TODOs) but I think that at least the idea gets clearer. At the moment I am not sure if it is really necessary to write more plotting functions as it is more or less just a "passing through" of parameters and plots are always individual and a matter of taste. Maybe instead of this some "slicing methods" that return pre-formatted dataframes for different purposes would also do the job. But that's more for the discussion..

ckaldemeyer commented 8 years ago

I have tested the code with renpass-gis as well and it seems to work fine. Only creating the dataframe takes some time at the start. But it is still quite fast and can be improved..

ckaldemeyer commented 8 years ago

@caro-rli : Does it mean someone at the RLI gets paid to improve our plotting-/result-code? This would be great ;)

uvchik commented 8 years ago

I tested your to_pandas module. I don't understand why you changed the example instead of just using the EnergySystem class. I added this possibility and reverted the storage example to the old version. Now both plots work. Revert the commit if you don't like it.

I will read more about the multiindex and maybe I can adapt my plot to this DataFrame. I still like the combination of bar and line plots to check the results but this is a matter of taste.

If the multiindex DataFrame proves its value in the long run the method to create it could be part of the EnergySystem class (convert_results_to_dataframe).

ckaldemeyer commented 8 years ago

That's fine. Go ahead

ckaldemeyer commented 8 years ago

I will read more about the multiindex and maybe I can adapt my plot to this DataFrame. I still like the combination of bar and line plots to check the results but this is a matter of taste.

Bar and line plots can still be plotted easily. In my opinion, some slicer methods that convert subsets of the multiindex df into preformatted, easily plottable dataframes should be enough, together with one or two common standard plots (e.g. power versus time and annual sums). These dataframes can then be plotted with individual styles as described here (http://pandas.pydata.org/pandas-docs/stable/visualization.html) or using matplotlib.

In my opinion, only the slicer methods and one or two standard plots should be part of the framework. Further plotting could then be done at app level by extending the class, which avoids bloating the framework code that has to be adapted to every change.
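
A slicer method of this kind could look roughly like the sketch below. The function name slice_bus, the level names and the column name val are assumptions for illustration, not the actual outputlib API. The result has one column per component, which is the shape pandas plotting expects, and the column sums stand in for the "annual sums" mentioned above.

```python
import pandas as pd


def slice_bus(results, bus_uid, bus_type):
    """Hypothetical slicer: one column per component for a single bus."""
    subset = results.xs((bus_uid, bus_type), level=["bus_uid", "bus_type"])
    # Move the component level into the columns -> wide, plottable frame.
    return subset["val"].unstack("obj_uid")


# Sample data: two components on bus "bel" over three hourly timesteps.
timesteps = pd.date_range("2015-01-01", periods=3, freq="h")
index = pd.MultiIndex.from_product(
    [["bel"], ["el"], ["wind", "pv"], timesteps],
    names=["bus_uid", "bus_type", "obj_uid", "datetime"],
)
results = pd.DataFrame({"val": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)

power = slice_bus(results, "bel", "el")  # power versus time
sums = power.sum()                       # sum over the whole period
print(power)
print(sums)
```

From here, power.plot() or sums.plot(kind='bar') would give the two standard plots without any further plotting code in the framework.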

If the multiindex DataFrame proves its value in the long run the method to create it could be part of the EnergySystem class (convert_results_to_dataframe).

Either here or in the class as it is now. We should discuss that!

simnh commented 8 years ago

I think the only thing that is missing now is an implementation of a stacked plot with steps. This is something pandas plotting does not easily provide, but I think we need it as we are discrete in terms of our timesteps. (setting kind="area" in the df.plot() method for instance doesn't satisfy me...)

ckaldemeyer commented 8 years ago

No, that's something I see as well. I'll see if I can sort out something on Friday using pandas as well. Otherwise it will be matplotlib but based on a well pre-formatted dataframe. For now I am back in bed... :/

uvchik commented 8 years ago

@simonhilpert For steps with pandas you can use the drawstyle='steps-mid' argument. I use it in the outputlib as you can see in the current commit of the features/outputlib-based-on-pandas branch. Just execute the storage_invest example.
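
A minimal sketch of the drawstyle argument on a pandas line plot, using random sample data rather than results from the storage_invest example. It only shows the stepped line; it does not cover stacking.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import numpy as np
import pandas as pd

# One day of hourly sample values.
index = pd.date_range("2015-12-22", periods=24, freq="h")
demand = pd.Series(np.random.rand(24), index=index, name="Demand")

# drawstyle is passed through to matplotlib; 'steps-mid' centres each
# step on its timestamp, which matches discrete timesteps.
ax = demand.plot(drawstyle="steps-mid", linewidth=2)
ax.set_ylabel("Power in GW")
ax.figure.savefig("steps.png")
```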

uvchik commented 8 years ago

@ckaldemeyer

I think now I fully understand the idea of the Multiindex DataFrame. Thank you. For me it looks good.

I agree that we should not add too many plots to the library. Maybe we could create a gallery with nice plots based on the EnergySystem class or the Multiindex DataFrame (like matplotlib, just smaller: http://matplotlib.org/gallery.html). But it helps if some plots are ready to use within the outputlib.

Printing the results is missing in the example file, but maybe it should also use the Multiindex DataFrame.

ckaldemeyer commented 8 years ago

I added a stackplot method to your to_pandas class that is based on your plot_bus method. If it is okay like this we can remove the devplot module

To me it looks good. I wouldn't have expected it to be so easy. Thanks!

I cleaned up the example file (pep8, removing unused lines, ...)

Thanks.

We should talk about the name: to_pandas is not a self-explanatory name for a plotting class.

In my opinion, the name depends on what it is supposed to be. For me, the class provides a structured and easily usable data structure for results with an additional printing option. Thus, for me it's less plotting than structuring results. But it should be discussed.

Any suggestions for a good name?

If we use it like this I have to write the docstring of the stackplot method

What about the others? Do you think this is a good way to go?

ckaldemeyer commented 8 years ago

Just in case we go this way, here are some TODOs from my side:

uvchik commented 8 years ago

Make dataframe creation and plotting configurable with as little code as possible via kwargs. Proposal Cord: method plot_bus(bus_uid, bus_type, date_from, date_to, **kwargs) where **kwargs holds everything that is plot-related like now in 'df_plot_kwargs', {}

Good idea but at least for the stackplot we have to differ between plot options and options for the stackplot method. So we still need something like df_plot_kwargs.

Uniform code. Docstrings: r''' vs. """ and completeness

We decided to use numpydocs and they use r""" text... """ so I will change that.

kwargs['tick_distance'] vs. kwargs.get('xlabel')

I think we should use the get method if a None is okay. If a None causes errors somewhere in the pandas/matplotlib code then it is better to use kwargs['blablubb'] to get the error directly. If you agree I will check my code and do it this way.

Try to circumvent additional plotting code when using the class. Can we reduce this to "es_df.stackplot(bus_uid="bel"...)" and put the rest into the class?
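
The difference between the two access styles, as a small sketch. The helper name read_plot_options is made up for this example; the keys xlabel and tick_distance are the ones quoted above.

```python
def read_plot_options(**kwargs):
    """Hypothetical helper showing the two access styles discussed above."""
    xlabel = kwargs.get("xlabel")            # optional: None if not passed
    tick_distance = kwargs["tick_distance"]  # required: KeyError if missing
    return xlabel, tick_distance


print(read_plot_options(tick_distance=24))

# A missing required option fails right here, not deep inside
# pandas/matplotlib.
try:
    read_plot_options(xlabel="Date and Time")
except KeyError as missing:
    print("missing required option:", missing)
```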

The idea is that you can easily plot a combined plot:

fig = plt.figure(figsize=(24, 14))

# First part
ax = fig.add_subplot(2, 1, 1)
es_df.stackplot(bus_uid="bel"....)

# Second part
ax = fig.add_subplot(2, 1, 2)
es_df.stackplot(bus_uid="bheat"....)

But I can divide it into two methods and allow both ways. Okay?
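
One way to allow both uses is to accept an optional axes object, sketched below. The function here is a stand-in written for this example, not the stackplot method in the branch; the column names and sample data are invented, and the split between method options and df_plot_kwargs follows the discussion above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def stackplot(df, ax=None, df_plot_kwargs=None, **kwargs):
    """Hypothetical stand-in for es_df.stackplot."""
    if ax is None:
        ax = plt.gca()  # stand-alone use: draw on the current axes
    plot_kwargs = dict(df_plot_kwargs or {})  # options passed on to pandas
    df.plot(kind="area", stacked=True, ax=ax, **plot_kwargs)
    ax.set_title(kwargs.get("title", ""))     # option of the method itself
    return ax


index = pd.date_range("2015-12-22", periods=24, freq="h")
bel = pd.DataFrame(np.random.rand(24, 2), index=index,
                   columns=["Wind", "PV"])
bheat = pd.DataFrame(np.random.rand(24, 2), index=index,
                     columns=["Boiler", "CHP"])

fig = plt.figure(figsize=(24, 14))
ax_top = stackplot(bel, ax=fig.add_subplot(2, 1, 1), title="bel")
ax_bottom = stackplot(bheat, ax=fig.add_subplot(2, 1, 2), title="bheat",
                      df_plot_kwargs={"alpha": 0.5})
```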

ckaldemeyer commented 8 years ago

Sounds good!

ckaldemeyer commented 8 years ago

I have just talked to Günni and there are still two more entries missing in the dataframe.

Update in TODO-list:

ckaldemeyer commented 8 years ago

Here's the updated TODO-list:

Additionally, I have added the possibility to create the dataframe only for specific busses/bus types by passing a list of uids/types.
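
The effect of passing such lists can be sketched as follows. Note that this example filters an already built frame, whereas the feature described above restricts what is created in the first place. The function name filter_buses, the level names, and the labels bgas, el, heat and gas are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def filter_buses(results, bus_uids=None, bus_types=None):
    """Hypothetical filter: keep rows whose uid/type is in the lists."""
    keep = np.ones(len(results), dtype=bool)
    if bus_uids is not None:
        keep &= results.index.get_level_values("bus_uid").isin(bus_uids)
    if bus_types is not None:
        keep &= results.index.get_level_values("bus_type").isin(bus_types)
    return results[keep]


index = pd.MultiIndex.from_tuples(
    [("bel", "el"), ("bheat", "heat"), ("bgas", "gas")],
    names=["bus_uid", "bus_type"],
)
results = pd.DataFrame({"val": [1.0, 2.0, 3.0]}, index=index)

by_uid = filter_buses(results, bus_uids=["bel", "bheat"])
by_type = filter_buses(results, bus_types=["gas"])
print(by_uid)
print(by_type)
```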

ckaldemeyer commented 8 years ago

Oh, I have just read my mails and saw your pull request. Anyhow, we are making progress here ;)

uvchik commented 8 years ago

My Todos for today are finished:

:memo: Still missing the full docstrings. I think we should add the optional parameters to the docstring. What do you think?

:speech_balloon: As discussed I will merge the branch, but please test the example and give feedback.

:warning: Be aware that you may have to change your Apps!

uvchik commented 8 years ago

We still have to find a name for the class that builds the dataframe and provides some basic plots. Even though I'm already used to it, I think to_pandas is not very catchy for newcomers.

uvchik commented 8 years ago

@ckaldemeyer : Please update the requirements in the setup.py. Currently it is pandas >= 0.13.

ckaldemeyer commented 8 years ago

Done. But the storage invest example is not working anymore. I am already searching for the error..

ckaldemeyer commented 8 years ago

Still missing the full docstrings. I think we should add the optional parameters to the docstring. What do you think?

I would leave them out. For me it should be lean and provide basic plotting. Everything beyond requires the user to look deeper into pandas/matplotlib anyway.

ckaldemeyer commented 8 years ago

We still have to find a name for the class that builds the dataframe and provides some basic plots. Even though I'm already used to it, I think to_pandas is not very catchy for newcomers.

What about solph_results_to_pandas? Basically this is what it does ;-)

ckaldemeyer commented 8 years ago

Works now.

The error was

#energysystem.restore()
energysystem.optimize()

Probably from your testing procedures.

uvchik commented 8 years ago

Sorry, I fixed it in the dev-branch after merging and forgot to fix it in the outputlib-based-on-pandas-branch, too.

uvchik commented 8 years ago

What about solph_results_to_pandas?

But it also provides basic plots. What about pandas_plots or pandas_output? In my opinion the DataFrame is just a tool to make plots or other outputs easier. The main goal is to get plots, CSV files, PDFs, ...

uvchik commented 8 years ago

Still missing the full docstrings. I think we should add the optional parameters to the docstring. What do you think?

I would leave them out. For me it should be lean and provide basic plotting. Everything beyond requires the user to look deeper into pandas/matplotlib anyway.

I'm not talking about the pandas arguments but the parameters we set within our method such as kwargs.setdefault('date_format', '%d-%m-%Y').
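
Documenting such a parameter in numpydoc style could look like the sketch below. The function name resolve_options is invented for this example, and the description of what date_format controls is an assumption; only the setdefault call is taken from the comment above.

```python
def resolve_options(**kwargs):
    r"""Fill in defaults for the options set inside the method.

    Other Parameters
    ----------------
    date_format : str, optional
        strftime pattern used when formatting dates (assumed meaning).
        Default: '%d-%m-%Y'.
    """
    # Only applies when the caller did not pass date_format.
    kwargs.setdefault("date_format", "%d-%m-%Y")
    return kwargs


print(resolve_options())
print(resolve_options(date_format="%Y-%m-%d"))
```

Listing these keys under "Other Parameters" keeps the main parameter list short while still telling users which defaults they can override.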

ckaldemeyer commented 8 years ago

Still missing the full docstrings. I think we should add the optional parameters to the docstring. What do you think?

I would leave them out. For me it should be lean and provide basic plotting. Everything beyond requires the user to look deeper into pandas/matplotlib anyway.

I'm not talking about the pandas arguments but the parameters we set within our method such as kwargs.setdefault('date_format', '%d-%m-%Y').

For me it is not necessary. But go ahead if you want.

ckaldemeyer commented 8 years ago

What about solph_results_to_pandas?

But it also provides basic plots. What about pandas_plots or pandas_output? In my opinion the DataFrame is just a tool to make plots or other outputs easier. The main goal is to get plots, CSV files, PDFs, ...

Then I would prefer pandas_output. It depends on what we want the module to be. For me it's more a data-extractor with the additional ability to create plots.

ckaldemeyer commented 8 years ago

Btw: Should we delete the features/outputlib-with-pandas branch and create a new one for further developments?

uvchik commented 8 years ago

Still missing the full docstrings. I think we should add the optional parameters to the docstring. What do you think?

I would leave them out. For me it should be lean and provide basic plotting. Everything beyond requires the user to look deeper into pandas/matplotlib anyway.

I'm not talking about the pandas arguments but the parameters we set within our method such as kwargs.setdefault('date_format', '%d-%m-%Y').

For me it is not necessary. But go ahead if you want.

That is really funny. Of course it is not necessary for you. You wrote it.

uvchik commented 8 years ago

I agree with closing the branch. As far as I'm concerned we can leave the name as it is and let somebody else find a new one. I think we did a great service to users/developers who want to plot. Thank you for your work.

I would also close this issue and start a new one with the remaining ToDos if there are any.

ckaldemeyer commented 8 years ago

I have just fixed an issue that occurred with renpassgis on the dev branch. I'll close this issue and remove the branch, too.

Thank you and Simon for your contributions!