opengridcc / opengrid-dev

Open source building monitoring, analysis and control
Apache License 2.0
26 stars 21 forks source link

Issue108 daily functions #109

Closed JrtPec closed 8 years ago

JrtPec commented 8 years ago

108

This is basically a backward-compatible implementation. But I’d really like to get rid of that split_by_day function. We don’t need it anymore for these functions and I think we can use Pandas more efficiently in other circumstances.

I’d also like to make these two functions more generic, because now the only difference is min and max. But again this is for backwards compatibility.

saroele commented 8 years ago

Good work, I think these are correct simplifications. I didn't yet take time to integrate the min and max functions, so you took a step back and that's good!

These min and max are atm only used in the caching mechanisms, so we should also adapt these and then the backwards compatibility is solved.

We have a unittest for split_by_day. We could get rid of the split_by_day function and adapt the test to become a test of the daily_min and daily_max functions. OTOH, we should dissect our required functionality in such way that things are reusable. That was the original reason for separating the split_by_day from the analysis functions. If we keep on repeating the same line in our analysis methods to slice dataframes in a specific way, a sub-function is useful.

When you are finished, I'll look at the pull request as a whole and adapt the caching on it. Let me know when you stop working on this so we avoid merge conflicts.

JrtPec commented 8 years ago

At first I thought to just remove the split_by_day function, because I believe there will always be a way to do an analysis without having to resort to splitting a dataframe up in subframes and working with lists of dataframes. Pandas is super powerful in this regard and we should exploit that.

That being said, split_by_day is still a useful function, so I have rewritten it to use some built-in filtering and grouping, which should be faster and is more readable.

I think I'm finished with this issue, so please review it. I have tested it in the notebooks, works fine for me. Caching should be faster too I recon ;-)

saroele commented 8 years ago

ok, I'l lhave a look now. The main caching cpu time comes from fetching the data from tmpo. One day we'll have to attack that issue too :-( But not a priority now.

On Thu, Feb 25, 2016 at 3:59 PM, Jan Pecinovsky notifications@github.com wrote:

At first I thought to just remove the split_by_day function, because I believe there will always be a way to do an analysis without having to resort to splitting a dataframe up in subframes and working with lists of dataframes. Pandas is super powerful in this regard and we should exploit that.

That being said, split_by_day is still a useful function, so I have rewritten it to use some built-in filtering and grouping, which should be faster and is more readable.

I think I'm finished with this issue, so please review it. I have tested it in the notebooks, works fine for me. Caching should be faster too I recon ;-)

— Reply to this email directly or view it on GitHub https://github.com/opengridcc/opengrid/pull/109#issuecomment-188822131.

saroele commented 8 years ago

Jan, if we solve the backward compatibility, do you agree to remove the daily_min and daily_max functions entirely? Your daily does the job now. We can maybe rename daily to daily_aggregation ?

JrtPec commented 8 years ago

You might want to check out #110 first then. It is an implementation of that Analysis class we talked about in #94.