Open remicousin opened 1 year ago
The overall approach of label the data, group by labels, aggregate over groups feels good to me.
I'm not so confident about dayofyear366, because of the confusion around leap days. I've thought about that problem enough to know that thinking about it more probably isn't going to help, one just has to pick one of the several imperfect solutions available. I'm curious why you felt you needed to implement it now. Is isn't relevant to the onset/cessation work as far as I can see.
I did it then mostly because I had an epiphany and wanted to... battre le fer tant qu'il est chaud. I do want to make a daily climatology for the water balance work so there is a little bit of motivation from there. If we validate that route, it will impact also the onset/cessation work to compute them for multiple years. This is now done by our seasonal_ functions that rely on dailytobegroupedby (that you didn't like) that this work is heavily inspired from. It really is the same thing, only written (API?) differently.
@aaron-kaplan I am going to continue our conversation here. First, I can split the seasonal work from the climato/ano work since they don't share code and since the seasonal thing is more pressing now.
I realize that there is an additional difficulty with the water balance application of seasonal grouping. Which is that water balance accepts multiple inputs and return multiple outputs. It turns out that there is a Dataset.groupby but so that means I also need to make water balance functions accept its inputs as one Dataset, and returns another Dataset. So that I can apply the labelling-dropping function to it, then the grouping function (groupby), then the operator on groups...
Adding Xandre more systematically now on this "core" work... even if only for FYI. This talks about the seasonal case only now. I moved the clim/ano part into another PR.
Revisited this based on Tuesday conversation. Need to work on doc, adding in particular examples of usage. Tests are now real tests. Want to add test for sparse data
@aaron-kaplan can we move this on?
Namely to do "seasonal" analysis, ie to apply same analysis to a slice of daily data within each year and to do "daily climatology" and back from clim to real time to, for instance, compute anomalies.
There are 2 functions and 2 tests. Tests aren't really tests but rather how I envision the usage (or what would be inside functions easier for the user to call). seasonal_groups creates the groups for seasonal analysis while dayofyear366 creates the groups for climatologies.
The first test function shows one can apply one after the other different functions to seasons only (of course no more seasonal analysis can be done after a reducing function is applied -- and also, one doesn't have to apply a reducing function, the end result is then a normal daily variable, only with data points only at days of seasons). Here seasonal_groups ensures that we keep only complete seasons, but I think it should keep also truncated seasons at extremities and let user slice accordingly if they don't want incomplete seasons. I labeled the groups with the first day of the season, but we should also keep the last day. Or keep the "groups" attribute some how that is a dictionary of lists showing all the points associated with each label.
The climato/ano case... I haven't much to comment. Should we be given a 365-day climato, it would be easy to project it to a 366-day one such as the one I created and fill Feb 29 with something and have the user have a few options about what to fill with.
Also with this set up, we need not create functions for each seasonal (e.g. seasonal_sum, seasonal_onset) or climato/ano analysis (e.g. clim: mean, stddev...; ano: subtract, divide...), but can describe the process in documentation, or make functions calling functions (this time allowing users to see them): seasonal(daily_data, start_day, start_month, end_day, end_month, [func1, func2...]) climato(daily_data, [func1, func2...]) ano(daily_data, [func1, func2...])
(also, this is not necessarily meant to merge: this is playground.... I thought there was a label or something for that but now I can't see it)