Open arfriedman opened 1 year ago
:+1: to this request.
I have an analysis right now where I'm interested in a (southern hemisphere) growing season NDJFM.
Ideally I'd like to run the following:
ds.temporal.group_average(
    'pr',
    freq='season',
    season_config={
        'dec_mode': 'DJF',
        'drop_incomplete_djf': True,
        'custom_seasons': ['Nov', 'Dec', 'Jan', 'Feb', 'Mar']
    }
)
Assuming my input dataset ds starts in January 1985, I'd like the first year not to be included because it would be incomplete, with only Jan '85, Feb '85, and Mar '85 available but not the Nov '84 and Dec '84 required to make a complete NDJFM season. I also have no interest in the rest of the year, so custom_seasons would not include all 12 months.
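The behaviour described above can be sketched without xCDAT at all. The snippet below is only an illustration, not xCDAT's implementation: complete_seasons is a hypothetical helper, and the time axis is assumed to be a sorted list of monthly (year, month number) tuples. A season is kept only if every one of its months is present.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def complete_seasons(times, season):
    """Return the complete occurrences of `season` found in `times`.

    times:  sorted list of (year, month_number) tuples, one per month
            (an assumed stand-in for a real time coordinate).
    season: ordered month abbreviations, e.g. ["Nov", "Dec", "Jan"].
    """
    wanted = [MONTHS.index(m) + 1 for m in season]
    available = set(times)
    complete = []
    for year, month in times:
        if month != wanted[0]:
            continue
        # Build the expected (year, month) window, rolling over to the
        # next calendar year whenever the month number decreases.
        window, current_year, previous = [], year, None
        for m in wanted:
            if previous is not None and m < previous:
                current_year += 1
            window.append((current_year, m))
            previous = m
        # Keep the season only if none of its months is missing.
        if all(t in available for t in window):
            complete.append(window)
    return complete


# Monthly data from Jan 1985 through Dec 1987.
times = [(y, m) for y in range(1985, 1988) for m in range(1, 13)]
seasons = complete_seasons(times, ["Nov", "Dec", "Jan", "Feb", "Mar"])
season_starts = [s[0] for s in seasons]
print(season_starts)
```

With this input, Jan-Mar 1985 is left out because Nov-Dec 1984 is missing, and the season starting in Nov 1987 is left out because Jan-Mar 1988 is missing.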
Thank you for opening this issue @arfriedman! Also thank you @DamienIrving for your input! This feature enhancement definitely sounds useful.
Here is the set of improvements based on the feedback:
For custom_season = ["Nov", "Dec", "Jan", "Feb", "Mar"], ["Nov", "Dec"] are from the previous year since they are listed before "Jan", and ["Jan", "Feb", "Mar"] are from the current year.
Extend _shift_decembers() to shift other months too. This method shifts the previous year's December to the current year so xarray can properly group "DJF" seasons spanning calendar years.
Update _drop_incomplete_djf() accordingly, and replace drop_incomplete_djf with drop_incomplete_season.
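As a rough illustration of the month-order rule above, the sketch below assigns a season year to a single timestamp. It is not the actual _shift_decembers() code: season_year is a hypothetical name, and the rule it encodes is just the one stated here, that months listed before the calendar-year wrap belong to the previous year and are shifted forward.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def season_year(year, month, season):
    """Return the year a (year, month) timestamp should be grouped under.

    season is an ordered list of month abbreviations. If the season wraps
    past December, the months listed before the wrap are shifted to the
    following year, so ("Nov", 1984) joins the season labelled 1985.
    """
    numbers = [MONTHS.index(m) + 1 for m in season]
    month_number = MONTHS.index(month) + 1
    if month_number not in numbers:
        raise ValueError(f"{month} is not part of season {season}")
    # The wrap point is the first position where the month number drops.
    wrap = next(
        (i for i in range(1, len(numbers)) if numbers[i] < numbers[i - 1]),
        None,
    )
    if wrap is not None and numbers.index(month_number) < wrap:
        return year + 1
    return year


ndjfm = ["Nov", "Dec", "Jan", "Feb", "Mar"]
print(season_year(1984, "Nov", ndjfm))  # shifted forward
print(season_year(1985, "Jan", ndjfm))  # unchanged
```

A season that does not cross the calendar year has no wrap point, so its timestamps keep their own year.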
@tomvothecoder as discussed in the meeting, a converter like the one below could make the custom season function easier to use.
def generate_calendar_months(custom_season, output_type: str = "month_abbreviations"):
    """
    Generates a list of calendar months corresponding to the given custom season.

    Args:
        custom_season (str): A string representing a custom season (e.g., "MJJ").
        output_type (str, optional): default is "month_abbreviations" which returns month abbreviations. If set to "month_numbers", it will return months in numbers.

    Returns:
        list: A list of strings of calendar months corresponding to the given custom season, or a list of numbers

    Raises:
        ValueError: If the length of the custom season is longer than 12 or if the custom season is not found in the months.
        ValueError: If `output_type` is not one of "month_abbreviations" or "month_numbers"

    Example:
        >>> generate_calendar_months("MJJ")
        ['May', 'Jun', 'Jul']
    """
    # Map each month's initial to its three-letter abbreviation and month number
    months_mapping = [
        ("J", "Jan", 1), ("F", "Feb", 2), ("M", "Mar", 3), ("A", "Apr", 4),
        ("M", "May", 5), ("J", "Jun", 6), ("J", "Jul", 7), ("A", "Aug", 8),
        ("S", "Sep", 9), ("O", "Oct", 10), ("N", "Nov", 11), ("D", "Dec", 12)
    ] * 2  # Repeat the mapping to cover cases where the custom season wraps around to the beginning of the year

    # Generate a string of all months by concatenating their initials
    months = "".join(m[0] for m in months_mapping)

    # Check if the length of the custom season exceeds 12
    if len(custom_season) > 12:
        raise ValueError("Custom season length cannot be longer than 12")

    # Select which tuple element to return: abbreviation (1) or number (2)
    if output_type == "month_abbreviations":
        k = 1
    elif output_type == "month_numbers":
        k = 2
    else:
        raise ValueError(f"{output_type} should be either 'month_abbreviations' or 'month_numbers'")

    # Iterate through the months to find the starting index of the custom season
    for i in range(len(months) - len(custom_season) + 1):
        if months[i:i + len(custom_season)] == custom_season:
            # Once the custom season is found, return the corresponding list of months
            return [months_mapping[(i + j) % 12][k] for j in range(len(custom_season))]

    # If the custom season is not found, raise a ValueError
    raise ValueError("Custom season '{}' not found in months '{}'".format(custom_season, months))
Test 1, return as month abbreviations:
custom_season = "MJJAS"
result = generate_calendar_months(custom_season)
print(result)
custom_season = "NDJFM"
result = generate_calendar_months(custom_season)
print(result)
custom_season = "JJASONDJFMAM"
result = generate_calendar_months(custom_season)
print(result)
['May', 'Jun', 'Jul', 'Aug', 'Sep']
['Nov', 'Dec', 'Jan', 'Feb', 'Mar']
['Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May']
Test 2, return as month numbers:
custom_season = "MJJAS"
result = generate_calendar_months(custom_season, output_type="month_numbers")
print(result)
custom_season = "NDJFM"
result = generate_calendar_months(custom_season, output_type="month_numbers")
print(result)
custom_season = "JJASONDJFMAM"
result = generate_calendar_months(custom_season, output_type="month_numbers")
print(result)
[5, 6, 7, 8, 9]
[11, 12, 1, 2, 3]
[6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5]
Test 3, error cases:
custom_season = "JAM"
result = generate_calendar_months(custom_season)
ValueError: Custom season 'JAM' not found in months 'JFMAMJJASONDJFMAMJJASOND'
custom_season = "JFMAMJJASONDJ"
result = generate_calendar_months(custom_season)
ValueError: Custom season length cannot be longer than 12
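One thing to keep in mind with initial-based strings is that short inputs can match more than one place in the calendar, and the function above returns the first match it finds. The sketch below is a separate, hypothetical helper (matching_start_months is not part of the function above) that lists every month a season string could start in, which could be used to warn about ambiguous input.

```python
INITIALS = "JFMAMJJASOND"
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def matching_start_months(custom_season):
    """Return every month in which `custom_season` could begin."""
    if not 1 <= len(custom_season) <= 12:
        raise ValueError("Custom season must contain 1 to 12 months")
    # Double the initials so seasons that wrap past December still match.
    doubled = INITIALS * 2
    return [
        MONTHS[i]
        for i in range(12)
        if doubled[i:i + len(custom_season)] == custom_season
    ]


print(matching_start_months("NDJFM"))  # one possible start
print(matching_start_months("J"))      # several possible starts
print(matching_start_months("JAM"))    # no match
```

A result with more than one entry means the string is ambiguous, and an empty result corresponds to the "not found" error case shown in Test 3.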
@tomvothecoder please feel free to incorporate the above code if you find it is useful. No worries otherwise.
@lee1043 Thanks Jiwoo! I'll consider your function for improving the custom_seasons arg.
Here's an upstream version: https://github.com/pydata/xarray/pull/9524 . I could use some help testing it out.
Is your feature request related to a problem?
Following #393, it seems that it would be useful to expand custom seasons functionality across calendar years. Examples include: taking the water year from October to September, or taking a boreal winter average from December to March.
Describe the solution you'd like
I envision the main change being that the order of months listed in custom seasons matters, so that a season can span calendar years.
For example, this configuration would create Apr-Nov and Dec-March averages, the latter of which extends into the following year:
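The configuration itself is not preserved in this copy of the thread. The sketch below is a reconstruction inferred only from the sentence above, assuming custom_seasons is given as a list of month-abbreviation lists; the exact values in the original post may have differed.

```python
# Hypothetical reconstruction: two seasons, Apr-Nov and Dec-Mar.
# The second season lists "Dec" first, so under the proposed
# order-sensitive behaviour it would extend into the following year.
custom_seasons = [
    ["Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov"],
    ["Dec", "Jan", "Feb", "Mar"],
]
season_config = {"custom_seasons": custom_seasons}

# Sanity check: the two seasons together cover each month exactly once.
all_months = [month for season in custom_seasons for month in season]
print(len(all_months), len(set(all_months)))
```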
Associated with this change, I think it would also make sense to generalize the season_config parameters beyond the existing options for DJF, e.g. the flag "drop_incomplete_djf" could become something like "drop_incomplete_season". In addition to seasons that cross the calendar year, this could apply to datasets that end in the middle of a season (for example, the last Apr-June season for a dataset that ends in May).
Describe alternatives you've considered
No response
Additional context
As a possible aside, I was also wondering if it is necessary to keep the requirement to include all 12 months in the custom_seasons list. I imagine that often users are only interested in one season.