Closed ckaldemeyer closed 8 years ago
@uvchik @simonhilpert : Add your ideas if you want
I have added some notes..
@uvchik: Just as an information. I have just used the class to analze results for renpassgis. For one year, the dataframe holds more than 11 million entries and it takes 20-30 minutes to generate it. But this is for all busses.
But I'll have a look if we can tweak something..
@ckaldemeyer wrote:
1) It should be possible to pass a result-object directly. For this, we need a way to deal with the datetime-index which normally comes from the energy system
The more options we have the more cases we have to check every time we change anything. You have to pass both informations anyway, so what is the problem to create a simple EnergySystem just as a container.
my_es = EnergySystem(results=my_result_dict, time_idx=my_time_index)
my_df = EnergySystemDataFrame(energy_system=my_es)
2) Add a flag like "return_df". If set to True, the dataframe-object will be returned directly
This works already for me. The init-method cannot return anything.
my_df = tpd.EnergySystemDataFrame(energy_system=my_es)
print(es_df.data_frame)
3) Add possibility to exclude components in _plotbus() method
+1
4) Add a plot method that gives a good overview about the bus balance. My idea would be a stacked area or a normal stacked stackplot. All inputs are positive and all outputs negative. Additionally, the residual load will be plotted as a line and should be zero.
Or a table that shows the balance of the year and a function that returns the dates and time where the balance is not valid.
5) Add methods to store/restore the dataframe using pickle or something equivalent
+1 (e.g. hdf5, csv, postgres...)
6) Realize dataframe creation and plotting in different classes (separate data and visualisation)
I really like this idea. This is all the more true, as we will have more kinds of outputs like key values, summary tables, reports... +1
So far I do not have any wishes but thank you for your incitements.
Thanks for your feedback! I'll start on the branch "features/outputlib-improvements".
My newest idea was to design the class to_pandas (better name would be nice) as a subclass of the pandas dataframe class. This way, all methods are operating on the object "self" and we have the advantage that all conventional pandas methods are also available to manipulate the object from outside.
By doing so, we could access the object like this:
my_df = tpd.EnergySystemDataFrame(energy_system=my_es)
print(my_df)
my_df.plot_bus(uid="bla")
I'll see how far I get in the next days and post my progress.
Ideas:
There's something I stumIed upon: I don't understand why the bus variables are saved the way they are at the moment in the result object.
optimization_model.py:
if hasattr(self, "dual"):
for bus in getattr(self, str(Bus)).objs:
if bus.balanced:
result[bus] = result.get(bus, {})
result[bus][bus] = [
self.dual[getattr(self, str(Bus)).balance[(bus.uid, t)]]
for t in self.timesteps]
for bus in getattr(self, str(Bus)).objs:
if bus.excess:
result[bus] = result.get(bus, {})
result[bus]['excess'] = [
getattr(self, str(Bus)).excess_slack[(bus.uid, t)].value
for t in self.timesteps]
if bus.shortage:
result[bus] = result.get(bus, {})
result[bus]['shortage'] = [
getattr(self, str(Bus)).shortage_slack[(bus.uid, t)].value
for t in self.timesteps]
I don't see a difference between the types of the variables since they either belong to one object (bus) or are a result of multiple objects (component interaction). So they should be treated in the same way in my opinion.
@gnn @simonhilpert
Why don't we just add an entry result[bus]['duals']
instead of result[bus][bus] for the dual variables?
Testing renpass-gis, I have just noticed that there are multiple entries for duals:
val
bus_uid bus_type type obj_uid datetime
el: DEdr13 el other dual 2015-01-01 00:00:00 0
2015-01-01 00:00:00 0
2015-01-01 00:00:00 0
2015-01-01 00:00:00 0
2015-01-01 00:00:00 0
2015-01-01 01:00:00 0
2015-01-01 01:00:00 0
2015-01-01 01:00:00 0
2015-01-01 01:00:00 0
2015-01-01 01:00:00 0
excess 2015-01-01 00:00:00 0
2015-01-01 01:00:00 0
shortage 2015-01-01 00:00:00 0
2015-01-01 01:00:00 0
storage_140081388946880 2015-01-01 00:00:00 0
2015-01-01 01:00:00 0
storage_140081388978864 2015-01-01 00:00:00 0
2015-01-01 01:00:00 0
storage_140081388982112 2015-01-01 00:00:00 0
2015-01-01 01:00:00 0
storage_140081389014096 2015-01-01 00:00:00 0
2015-01-01 01:00:00 0
storage_140081389041984 2015-01-01 00:00:00 0
2015-01-01 01:00:00 0
These all seem to be saved under the same key if I get it right:
if hasattr(self, "dual"):
for bus in getattr(self, str(Bus)).objs:
if bus.balanced:
result[bus] = result.get(bus, {})
result[bus][bus] = [
self.dual[getattr(self, str(Bus)).balance[(bus.uid, t)]]
for t in self.timesteps]
@simonhilpert : Is it somehow possible to add unique identifiers here?
you are right. Dual variables exist for every constraint inside the (linear) optimization problem. We can extract all of them but the loop only gets all duals for balanced bus constraint. The unique identifier could then be the bus-uid.
For renpass-gis I think the major interest lies on the duals from balanced-electricity busses which can be interpreted as the marginal electricity price (shadowprice) for the bus.
In the future I would actually extract all duals (if wanted) because in terms of economics as all of them can be interpreted in a certain way. But we can start with duals for balanced busses...
@ckaldemeyer wrote:
My newest idea was to design the class to_pandas (better name would be nice) as a subclass of the pandas dataframe class. This way, all methods are operating on the object "self" and we have the advantage that all conventional pandas methods are also available to manipulate the object from outside.
So i will wait with new changes until you finished the restructure process, to avoid difficult merge conflicts.
I have now fixed the dataframe creation and pushed my results. Somehow I still don't get the duals but everything else should work and is now a bit clearer in the creation. The result-object is such a brainf*\ ...
@simonhilpert : Could you maybe have a look at the duals-problem? We could also talk about that on Wednesday.. @uvchik : Could you have a look if everything works as expected? I have already made some tests but you never know ;)
Concerning the inheritance from pandas classes: Obviously, it is not recommended to inherit from pandas.
See:
but we could use a composition instead:
Concerning the split into two classes: What about these class names?
What do you think?
I'll go to bed now and will be back on Wednesday ;-)
My example seems to work, thanks :smile:
Maybe ResultsDataFrame
instead of EnergySystemDataFrame
but this is not a hard objection. I totally agree with splitting the class.
For me it is not important to inherit or use a composition if there are no big advantages but go ahead if you think it's better (either way).
Maybe
ResultsDataFrame
instead ofEnergySystemDataFrame
but this is not a hard objection. I totally agree with splitting the class.
Then, I agree on ResultsDataFrame
and DataFramePlot
! It's a bit shorter. At the moment I am quite busy with a paper. But I'll spend my spare time to implement the changes!
TODO:
@ckaldemeyer: I edited the create func to improve its performance in features/outputlib-performance. Let me know what you think : )
Is this going to be finished and merged until the end of week or should we set the milestone to february?
No, it`s likely not finished by the end of the week, we should set the milestone to february.
@s3pp : As I already said. Nice improvement!
I think we should at least have the improved dataframe creation in the next release. It's only an "internal" change and the plotting, etc. is already on the dev-Branch anyway.
The version with two classes can then become part of the next release or we already push things until Friday ;)
So we could merge from "outputlib-performance..." into "outputlib-improvements" and then back into "dev".
I need to do some changes for my plots but I think it make no sense to do anything until you split the classes. Do you have a plan when you are going to do it?
At the moment I got stuck with my paper and Martin is busy with renpass-gis. I can tell you more on Monday!
I recently divided the code into two classes: A ResultsDataFrame and a DataFramePlot. Its far from finished unfortunately, because I have some trouble with renpass-gis and not enough time, but one can see the concept behind. The code is on features/fix-outpultlib-dev. I
m still working on it, because the stackplot does not work properly atm.
For me all plots work. Thank you. I activated the plots in the storage_invest example and everything works fine. I pushed it. Revert it if it doesn't work for you.
If you are done, I will revise the plot classes according to the protocol of the web conference.
Huch, thats nice :smile:. You`re absolutely right. Its working as expected now. If you want to, you can already start to revise the plot methods. I was thinking to maybe implement other methods in ResultsDataFrame, but not in DataFramePlot.
I think we can do that in the same branch.
Go ahead ;-)
The first proposal is finished :smile: I left the old stuff to compare the changes but everything below line 291 will be removed. I adapted the storage_invest example to show how the new plotting class works compared to the old one. On the first view there is more code to write but it is very easy code and it makes it much more flexible without handling hundreds of parameters.
The heart of the class is the slice_unstacked
method which creates a ready to plot unstacked subset. With this subset you can just use the pandas plot method.
myplot = to_pandas.DataFramePlot(energy_system=energysystem)
subset = myplot.slice_unstacked(bus_uid="bel", type="input").subset
subset.plot(**pandas_plotting_arguments)
But the class offers you some helper methods to make plotting easier. Some things really work good and it took me and others some time to find the solution. So I wanted to keep them. So I created a plot method which only passes all arguments to pandas. So the following code does exactly the same than the code above.
myplot = to_pandas.DataFramePlot(energy_system=energysystem)
myplot.slice_unstacked(bus_uid="bel", type="input")
myplot.plot(**pandas_plotting_arguments)
If you use the second code block you have the opportunity to use the helper methods. But you still will be able to do all that matplotlib stuff.
myplot = to_pandas.DataFramePlot(energy_system=energysystem)
myplot.slice_unstacked(bus_uid="bel", type="input")
myplot.plot(**pandas_plotting_arguments)
myplot.ax.legend(loc='upper right')
myplot.ax.set_ylabel('Power in MW')
myplot.ax.set_xlabel('Date')
myplot.set_datetime_ticks() # our helper method to make the ticks nicer
With another helper method the legend can be placed outside the plot.
Now it works more like a framework because the helper methods can be used for every plot and we do not have to create overcrowded plotting methods with hundreds of parameters.
Anyway I created an io_plot that replaces the old stackplot method. I did it because the pandas plot has a problem when plotting a stacked line plot, so I had to correct it and this is half of the code Otherwise I would have used the plot method to do it instead of creating a new method.
@ckaldemeyer , @s3pp : What do you think?
I just discussed this topic with @ckaldemeyer . We both think it's very well done and overall a strong improvement. The readability of the storage invest plot creation is also very nice imo. One point could be questionable though, whether the slice_unstacked method would better fit in the ResultsDataFrame class, because it`s single purpose is data handling in the end. Or are there any disadvantages by doing so?
One point could be questionable though, whether the slice_unstacked method would better fit in the ResultsDataFrame class, because it`s single purpose is data handling in the end.
I totally agree. I had the same idea but did not want to write code in the EnergySystemDataFrame
class because I did not know how you still want to restructure this class. So feel free to move it into this class.
Ok. so I moved the method to the ResutlsDataFrame class. Imo we could start thinking about merging it back, don't we, @ckaldemeyer, @uvchik ? I thought that it might be beneficial to include methods for aggregating data on specific levels, but that's not so much restructuring than extending and eventually it isn't even necessary, because of pandas' easy-to-use functions. I will find out while working with the results of renpassg!s.
Ok. so I moved the method to the ResutlsDataFrame class. Imo we could start thinking about merging it back, don't we, @ckaldemeyer, @uvchik ?
:+1: for me. This would also make #80 obsolete since the fix is already contained.
I do not know how it was intended but the ResultsDataFrame
class does not have a subset attribute so far. After @s3pp changes the slice_unstacked
method created one. But it is never used. So I removed it and returned the subset (return subset
) instead of setting the internal attribute.
As I need the internal attribute in the plotting class I inherit the method and extended it by setting the internal attribute. If you decide to have a subset attribute in the ResultsDataFrame
class we can remove the method in the plotting class.
I fixed pep8, added docstrings and removed the old stuff. For me we a ready to merge.
@ckaldemeyer : You are assigned. May you do it?
I see, I see. Actually that was something I wasn't quite sure about.
We do not have to decide that now. For now everything works fine and that is worth a merge :smile:
:+1:
@uvchik: Concerning the merge would you mind if one of you merges? I am really busy atm and do not have the time to fix errors which eventually might occur (even if I think there won't be any)..
No problem, but not today. I will do it tomorrow afternoon.
Thanks!!
Some changes still have to be done. I am using this issue to collect and discuss ideas.
Ideas for changes: