Add VDI profiles based on lpagg

uvchik commented 2 years ago

The basic code is taken from https://github.com/jnettels/lpagg but most of the code has been revised for modular use and to speed up the code. The lpagg package contains various functionality but only the VDI-profiles and reading the TRY data from DWD is used.

To test the new profiles use the vdi_profile_example in the examples section. The parameter of the houses are added within a loop. I used this to multiply the number of houses to check how long it would take to model e.g. 1000 houses. 1000 houses on an hourly base will take less than a minute, which shows that the draft is at least fast enough :racing_car:

Related to #36

Install this branch to test the code:

pip install https://github.com/oemof/demandlib/archive/features/add-vdi-from-lpagg

It is the first draft, I am open for concrete ideas.

[ ] add tests
[ ] add documentation
[ ] clean code

jnettels commented 2 years ago

@uvchik thanks very much, especially the speed looks promising! Not iterating through rows does indeed help a lot :-) I was able to run your example. I have reduced it to a calculation for a single house, and I am still getting differences between my implementation and yours, even if they are very small. It comes down to me identifying e.g. 21. October 2017 in TRY04 as WWB, while your code yields WWH. The difference is the daily mean of the cloud cover (< or >5). This only happens for 4 days, I think.

I am not 100% sure yet, but I think it comes down to the following: At 15min frequency, Pandas df.resample("D").mean() will use the values between (including) 2017-10-21 00:00 and 2017-10-21 23:15 to give the mean of 2017-10-21. However, in the weather data, 2017-10-21 00:00 describes the weather conditions between 2017-10-20 23:00 and 2017-10-21 00:00. The value in 2017-10-21 00:00 therefore belongs to 2017-10-20. To put it more simply: By DWD definition, the weather data does not start at 2017-1-1 00:00, but 2017-1-1 01:00. You seem to ignore that and simply load the weather data, starting at 2017-1-1 00:00, which may actually solve those problems. It may very well be that your solution is correct, and my many workarounds lead to incorrect results. Whenever I think about this stuff, I have to double and tripple check.

Some more stuff that might need consideration:

[ ] You seem to allow leap years, but that would lead to missing rows in the weather data. Setting vdi.Region(year=2020) currently throws ValueError: Length mismatch: Expected 8760 rows, received array of length 8784 and a more helpful message might be appropriate
[ ] The defining temperature for summer should be exposed to the user, with default 15°C (Tamb_heat_limit in my code)
[x] If no weather data is given, the default weather file for the given TRY region should be loaded
[ ] The user should be able to override the used weather timeseries. (These could then be used for leap years, if desired.)
[x] While using vdi.Region(holidays=holidays.country_holidays('DE', subdiv='NI')) from https://pypi.org/project/holidays/ already works (yay!), it could be actively encouraged in vdi_profile_example. I think that project is awesome.

There might be more stuff, but these are all minor issues I think. I need some time for more investigation, though, to see if the API fits all the needs that I can come up with.

uvchik commented 2 years ago

[x] The weather data seems to be fine to me now. The DWD data starts with 01-01-01 (Month-Day-Hour), while weather data is typically right stamped. So in my left stamped index the first value is 2017-01-01 00:00. For me a right stamped index seems to be counter intuitive for a daily index
[ ] My idea is to duplicate the nearest day with the same weekday to create the typical day for the 29th of February. If it is a Sunday I would use the last Sunday, if it is a weekday I would use the last weekday which would be the 28th or the 27th. I think this is good enough regarding all the other assumptions. (still a ToDo)
[ ] I am not sure whether we should offer this or not. Some of the factors are connected to the TRY region. I seems to be strange to use an arbitrary data set with this procedure. In that case one has to pass the weather data and the TRY region.
[x] I use workalendar in my example, but feel free to use your favourite holiday package.

pep8speaks commented 2 years ago

Hello @uvchik! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2022-09-06 18:25:57 UTC

jnettels commented 2 years ago

Here are the notes about what we discussed:

[x] Make summer_temperature_limit (currently set in the ini file) an attribute of each house. As a consequence, different series of typical days have to be created for each unique temperature. These typical days then have to be correctly used for the correct houses
[x] Allow manipulation of the weather data. Using the default weather data should be the default behaviour, but using your own weather data should also be possible. For example, vdi.Region could have a function get_weather() which would yield a DataFrame which can be manipulated and a function set_weather() which overwrites that default weather. This can be used before calling get_load_curve_houses().

Some more notes that came up while replacing my own implementation in lpagg with yours:

[x] I think it could be more convenient for further processing if the DataFrame lc returned by vdi.region.get_load_curve_houses() had named column levels. Simplest would be lc.columns.set_names(['name', 'house_type', 'energy'], inplace=True) before returning it, but it might me more appropriate to apply the names at different concat() stages, like the following. However, this probably should be in line with the results returned from the existing BDEW implementation.
```
        load_curve_house.columns = pd.MultiIndex.from_product(
            [
                [house["house_type"]],
                load_curve_house.columns,
            ],
            names=['house_type', 'energy']
        )
        house_profiles[house["name"]] = load_curve_house
    return pd.concat(
        house_profiles.values(), keys=house_profiles.keys(), axis=1,
        names=['name']
    )
```
[ ] Another very general point: Currently the new module is simply named vdi, which is extremely general. Would vdi4655 not be more appropriate, in case other vdi norms are implemented in the future?
[x] If I am not mistaken, TRY (as an attribute of each house) is unused and should be removed in the example script (since it is defined by the region instead)
[ ] When I implemented it, I made the house data a nested dictionary. You changed it to a list of dictionaries instead. In retrospect, I think both are not ideal. I later added an option to lpagg to read the house data from a table file. (See this example xlsx: https://github.com/jnettels/lpagg/blob/master/lpagg/examples/VDI_4655_houses_example.xlsx). When you start to define large lists of houses, a table is in my opinion the most intuitive data type. Therefore I would like to consider changing the argument houses of vdi.Region() to a DataFrame. Of course, you could internally convert it to a list of dictionaries immediately and not change your code. I am only talking about the interface.
[ ] The point above would also provide an opportunity to implement an empty houses DataFrame that contains all the required rows. If creating an object of vdi.Region() without the houses argument, you could afterwards ask for the empty DataFrame and fill it with your data? This way the required attributes of each house are actually documented. ...or is that too convoluted?
[ ] If I see correctly, the house attributes N_Pers and N_WE are currently unused. In VDI4655, they serve two purposes: -- Check validity: VDI only claims support for N_Pers <= 12 (for EFH) and N_WE <= 40 (for MFH) -- Provide default values for Q_TWW_a and W_a (annual energy for domestic hot water and electricity) If we are not using them here, they should drop from the house attributes. But I think we should use them. I have a separate function lpagg.VDI4655.get_annual_energy_demand() which takes the houses data, checks those requirements and fills in the default values if they were not provided by the user. After some clean-up, this could become an optional preprocessing step before putting houses into vdi.Region(). Or vdi.Region() does it internally, but that would be less transparent to the user.

(Please tell me if I should do some of those implementations myself, or if you prefer to do it yourself.)

Further discussion:

In your example script, the houses get the attributes copies and sigma. If I see correctly, these are not currently used and could therefore be deleted for now. But they are related to the concept of "generating simultaneity effects through time shifts based on random draws from a normal distribution". This could become a seperate issue and pull request, but I would like to discuss if this could have a place in demandlib as well

jnettels commented 2 years ago

I took the liberty to make two changes based on my previous suggestions: Named column levels and the ability to set my own weather file path. These changes help me to use demandlib within lpagg.

I think now the most important remaining issue is the ability to define the summer_temperature_limit for each house. My other suggestions are just for convenience.

jnettels commented 2 years ago

I have not tested it yet, but thanks for adding the temperature limits. Though, since you removed my solution to define a custom weather file, I need to ask: Was it rude (or just inconvenient) of me to push to this branch, or did you not agree with my approach?

p-snft commented 2 years ago

As this is a draft, I marked it as such.

Side note: I really like the feature. It's very handy to have generation of time series for SH, DHW, and electricity consumption at hand.

uvchik commented 2 years ago

For me it is a bit rude to write your code into my approach without any comment. I already wrote improvements and got dozen of merge conflicts. If I know somebody else is working on the code I would push more often to avoid such problems.

I do have a different approach for the weather file so I think the best way would be to finish this PR with cleaning the code and adding some tests and then start a new PR for the weather data. That makes it easier to track changes and to focus the discussion. One reason for that is that the original approach is connect to the TRY data. I am fine with adding your own weather data but it must be clear that you really should know what you are doing.

jnettels commented 2 years ago

Please accept my apologies for pushing into this branch, then. While I did leave a small comment about it here, I see how it creates unnecessary merge conflicts. Afterwards I learned how to use the "Suggested Change" feature and will stick to that in the future. Doing a new PR for the weather data is fine with me, too.

jnettels commented 1 year ago

Hi there,

it has been quite a while, but I am still depending on this implementation with its awesome speed. Is there any chance to get it over the finish line and merge it? Is there something I can do to help?

The changes I require to make it work for me personally are added in my fork and affect the ability to load other weather data. Just leaving the link here for reference: https://github.com/jnettels/demandlib/tree/features/add-vdi-from-lpagg

p-snft commented 1 year ago

@jnettels: Honestly, I'm not really deep into demandlib. I'm willing to review and give feedback. If you have an improved version, you can file a PR against this branch.

jnettels commented 1 year ago

Thanks for your answer. My "improvements" concerning the weather data were already suggested, once with a rude (still sorry about that!) push to this PR, once with the "suggestion" feature. @uvchik already mentioned that he had a different approach for the weather data, so I do not think creating a new PR for that is helpful now.

oemof / demandlib

Add VDI profiles based on lpagg #50

Comment last updated at 2022-09-06 18:25:57 UTC