Closed JrtPec closed 8 years ago
Good work, I think these are correct simplifications. I didn't yet take time to integrate the min and max functions, so you took a step back and that's good!
These min and max are atm only used in the caching mechanisms, so we should also adapt these and then the backwards compatibility is solved.
We have a unittest for split_by_day
. We could get rid of the split_by_day
function and adapt the test to become a test of the daily_min and daily_max functions. OTOH, we should dissect our required functionality in such way that things are reusable. That was the original reason for separating the split_by_day
from the analysis functions. If we keep on repeating the same line in our analysis methods to slice dataframes in a specific way, a sub-function is useful.
When you are finished, I'll look at the pull request as a whole and adapt the caching on it. Let me know when you stop working on this so we avoid merge conflicts.
At first I thought to just remove the split_by_day
function, because I believe there will always be a way to do an analysis without having to resort to splitting a dataframe up in subframes and working with lists of dataframes. Pandas is super powerful in this regard and we should exploit that.
That being said, split_by_day
is still a useful function, so I have rewritten it to use some built-in filtering and grouping, which should be faster and is more readable.
I think I'm finished with this issue, so please review it. I have tested it in the notebooks, works fine for me. Caching should be faster too I recon ;-)
ok, I'l lhave a look now. The main caching cpu time comes from fetching the data from tmpo. One day we'll have to attack that issue too :-( But not a priority now.
On Thu, Feb 25, 2016 at 3:59 PM, Jan Pecinovsky notifications@github.com wrote:
At first I thought to just remove the split_by_day function, because I believe there will always be a way to do an analysis without having to resort to splitting a dataframe up in subframes and working with lists of dataframes. Pandas is super powerful in this regard and we should exploit that.
That being said, split_by_day is still a useful function, so I have rewritten it to use some built-in filtering and grouping, which should be faster and is more readable.
I think I'm finished with this issue, so please review it. I have tested it in the notebooks, works fine for me. Caching should be faster too I recon ;-)
— Reply to this email directly or view it on GitHub https://github.com/opengridcc/opengrid/pull/109#issuecomment-188822131.
Jan, if we solve the backward compatibility, do you agree to remove the daily_min and daily_max functions entirely? Your daily does the job now. We can maybe rename daily to daily_aggregation
?
You might want to check out #110 first then. It is an implementation of that Analysis class we talked about in #94.
108
This is basically a backward-compatible implementation. But I’d really like to get rid of that
split_by_day
function. We don’t need it anymore for these functions and I think we can use Pandas more efficiently in other circumstances.I’d also like to make these two functions more generic, because now the only difference is
min
andmax
. But again this is for backwards compatibility.