Open uvchik opened 2 years ago
@uvchik thanks very much, especially the speed looks promising! Not iterating through rows does indeed help a lot :-) I was able to run your example. I have reduced it to a calculation for a single house, and I am still getting differences between my implementation and yours, even if they are very small. It comes down to me identifying e.g. 21. October 2017 in TRY04 as WWB, while your code yields WWH. The difference is the daily mean of the cloud cover (< or >5). This only happens for 4 days, I think.
I am not 100% sure yet, but I think it comes down to the following: At 15min frequency, Pandas df.resample("D").mean() will use the values between (including) 2017-10-21 00:00
and 2017-10-21 23:15
to give the mean of 2017-10-21
. However, in the weather data, 2017-10-21 00:00
describes the weather conditions between 2017-10-20 23:00
and 2017-10-21 00:00
. The value in 2017-10-21 00:00
therefore belongs to 2017-10-20
. To put it more simply: By DWD definition, the weather data does not start at 2017-1-1 00:00
, but 2017-1-1 01:00
.
You seem to ignore that and simply load the weather data, starting at 2017-1-1 00:00
, which may actually solve those problems. It may very well be that your solution is correct, and my many workarounds lead to incorrect results. Whenever I think about this stuff, I have to double and tripple check.
Some more stuff that might need consideration:
vdi.Region(year=2020)
currently throws ValueError: Length mismatch: Expected 8760 rows, received array of length 8784
and a more helpful message might be appropriateTamb_heat_limit
in my code)vdi.Region(holidays=holidays.country_holidays('DE', subdiv='NI'))
from https://pypi.org/project/holidays/ already works (yay!), it could be actively encouraged in vdi_profile_example
. I think that project is awesome.There might be more stuff, but these are all minor issues I think. I need some time for more investigation, though, to see if the API fits all the needs that I can come up with.
Hello @uvchik! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Here are the notes about what we discussed:
summer_temperature_limit
(currently set in the ini file) an attribute of each house. As a consequence, different series of typical days have to be created for each unique temperature. These typical days then have to be correctly used for the correct housesvdi.Region
could have a function get_weather()
which would yield a DataFrame which can be manipulated and a function set_weather()
which overwrites that default weather. This can be used before calling get_load_curve_houses()
.Some more notes that came up while replacing my own implementation in lpagg with yours:
[x] I think it could be more convenient for further processing if the DataFrame lc
returned by vdi.region.get_load_curve_houses()
had named column levels. Simplest would be lc.columns.set_names(['name', 'house_type', 'energy'], inplace=True)
before returning it, but it might me more appropriate to apply the names at different concat() stages, like the following. However, this probably should be in line with the results returned from the existing BDEW implementation.
load_curve_house.columns = pd.MultiIndex.from_product(
[
[house["house_type"]],
load_curve_house.columns,
],
names=['house_type', 'energy']
)
house_profiles[house["name"]] = load_curve_house
return pd.concat(
house_profiles.values(), keys=house_profiles.keys(), axis=1,
names=['name']
)
[ ] Another very general point: Currently the new module is simply named vdi
, which is extremely general. Would vdi4655
not be more appropriate, in case other vdi norms are implemented in the future?
[x] If I am not mistaken, TRY
(as an attribute of each house) is unused and should be removed in the example script (since it is defined by the region instead)
[ ] When I implemented it, I made the house data a nested dictionary. You changed it to a list of dictionaries instead. In retrospect, I think both are not ideal. I later added an option to lpagg to read the house data from a table file. (See this example xlsx: https://github.com/jnettels/lpagg/blob/master/lpagg/examples/VDI_4655_houses_example.xlsx). When you start to define large lists of houses, a table is in my opinion the most intuitive data type. Therefore I would like to consider changing the argument houses
of vdi.Region()
to a DataFrame. Of course, you could internally convert it to a list of dictionaries immediately and not change your code. I am only talking about the interface.
[ ] The point above would also provide an opportunity to implement an empty houses DataFrame that contains all the required rows. If creating an object of vdi.Region() without the houses
argument, you could afterwards ask for the empty DataFrame and fill it with your data? This way the required attributes of each house are actually documented. ...or is that too convoluted?
[ ] If I see correctly, the house attributes N_Pers
and N_WE
are currently unused. In VDI4655, they serve two purposes:
-- Check validity: VDI only claims support for N_Pers <= 12 (for EFH) and N_WE <= 40 (for MFH)
-- Provide default values for Q_TWW_a
and W_a
(annual energy for domestic hot water and electricity)
If we are not using them here, they should drop from the house attributes. But I think we should use them. I have a separate function lpagg.VDI4655.get_annual_energy_demand()
which takes the houses data, checks those requirements and fills in the default values if they were not provided by the user. After some clean-up, this could become an optional preprocessing step before putting houses
into vdi.Region()
. Or vdi.Region()
does it internally, but that would be less transparent to the user.
(Please tell me if I should do some of those implementations myself, or if you prefer to do it yourself.)
Further discussion:
copies
and sigma
. If I see correctly, these are not currently used and could therefore be deleted for now. But they are related to the concept of "generating simultaneity effects through time shifts based on random draws from a normal distribution". This could become a seperate issue and pull request, but I would like to discuss if this could have a place in demandlib as wellI took the liberty to make two changes based on my previous suggestions: Named column levels and the ability to set my own weather file path. These changes help me to use demandlib within lpagg.
I think now the most important remaining issue is the ability to define the summer_temperature_limit
for each house. My other suggestions are just for convenience.
I have not tested it yet, but thanks for adding the temperature limits. Though, since you removed my solution to define a custom weather file, I need to ask: Was it rude (or just inconvenient) of me to push to this branch, or did you not agree with my approach?
As this is a draft, I marked it as such.
Side note: I really like the feature. It's very handy to have generation of time series for SH, DHW, and electricity consumption at hand.
For me it is a bit rude to write your code into my approach without any comment. I already wrote improvements and got dozen of merge conflicts. If I know somebody else is working on the code I would push more often to avoid such problems.
I do have a different approach for the weather file so I think the best way would be to finish this PR with cleaning the code and adding some tests and then start a new PR for the weather data. That makes it easier to track changes and to focus the discussion. One reason for that is that the original approach is connect to the TRY data. I am fine with adding your own weather data but it must be clear that you really should know what you are doing.
Please accept my apologies for pushing into this branch, then. While I did leave a small comment about it here, I see how it creates unnecessary merge conflicts. Afterwards I learned how to use the "Suggested Change" feature and will stick to that in the future. Doing a new PR for the weather data is fine with me, too.
Hi there,
it has been quite a while, but I am still depending on this implementation with its awesome speed. Is there any chance to get it over the finish line and merge it? Is there something I can do to help?
The changes I require to make it work for me personally are added in my fork and affect the ability to load other weather data. Just leaving the link here for reference: https://github.com/jnettels/demandlib/tree/features/add-vdi-from-lpagg
@jnettels: Honestly, I'm not really deep into demandlib. I'm willing to review and give feedback. If you have an improved version, you can file a PR against this branch.
Thanks for your answer. My "improvements" concerning the weather data were already suggested, once with a rude (still sorry about that!) push to this PR, once with the "suggestion" feature. @uvchik already mentioned that he had a different approach for the weather data, so I do not think creating a new PR for that is helpful now.
The basic code is taken from https://github.com/jnettels/lpagg but most of the code has been revised for modular use and to speed up the code. The lpagg package contains various functionality but only the VDI-profiles and reading the TRY data from DWD is used.
To test the new profiles use the
vdi_profile_example
in the examples section. The parameter of the houses are added within a loop. I used this to multiply the number of houses to check how long it would take to model e.g. 1000 houses. 1000 houses on an hourly base will take less than a minute, which shows that the draft is at least fast enough :racing_car:Related to #36
Install this branch to test the code:
It is the first draft, I am open for concrete ideas.