xCDAT / xcdat

An extension of xarray for climate data analysis on structured grids.
https://xcdat.readthedocs.io/en/latest/
Apache License 2.0

[Feature]: custom seasons that span calendar years #416

Open arfriedman opened 1 year ago

arfriedman commented 1 year ago

Is your feature request related to a problem?

Following #393, it seems that it would be useful to expand custom seasons functionality across calendar years. Examples include: taking the water year from October to September, or taking a boreal winter average from December to March.

Describe the solution you'd like

I envision the main change being that the order of the months listed in a custom season matters, so that a season can span calendar years.

For example, this configuration would create Apr-Nov and Dec-Mar averages, the latter of which extends into the following year:

custom_seasons = [
    ["Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov"],
    ["Dec", "Jan", "Feb", "Mar"],
]

Along with this change, I think it would also make sense to generalize the season_config parameters beyond the existing DJF options; e.g. the flag "drop_incomplete_djf" could become something like "drop_incomplete_season". In addition to seasons that cross the calendar year, this could apply to datasets that end in the middle of a season (for example, the last Apr-Jun season for a dataset that ends in May).
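
To illustrate what "order matters" could mean in practice, here is a minimal sketch in plain Python. It is not part of xCDAT; `MONTHS` and `spans_calendar_year` are hypothetical names, and the check assumes each season lists consecutive months in the order they occur.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def spans_calendar_year(season):
    """Return True if an ordered list of month abbreviations wraps past December."""
    numbers = [MONTHS.index(m) + 1 for m in season]
    # The season crosses the year boundary when a month number is lower
    # than the one listed just before it (e.g. Dec=12 followed by Jan=1).
    return any(b < a for a, b in zip(numbers, numbers[1:]))


custom_seasons = [
    ["Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov"],
    ["Dec", "Jan", "Feb", "Mar"],
]
for season in custom_seasons:
    print(f"{season[0]}-{season[-1]}: spans calendar years = {spans_calendar_year(season)}")
```

With the two seasons above, only the Dec-Mar season is flagged as spanning calendar years.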

Describe alternatives you've considered

No response

Additional context

As a possible aside, I was also wondering if it is necessary to keep the requirement to include all 12 months in the custom_seasons list. I imagine that often users are only interested in one season.

DamienIrving commented 1 year ago

:+1: to this request.

I have an analysis right now where I'm interested in a (southern hemisphere) growing season NDJFM.

Ideally I'd like to run the following:

ds.temporal.group_average(
    'pr',
    freq='season',
    season_config={
        'dec_mode': 'DJF',
        'drop_incomplete_djf': True,
        'custom_seasons': ['Nov', 'Dec', 'Jan', 'Feb', 'Mar']
    }
)

Assuming my input dataset ds starts in January 1985, I'd like the first year not to be included because it would be incomplete with only Jan '85, Feb '85, Mar '85 available but not the required Nov '84 and Dec '84 to make a complete NDJFM season. I also have no interest in the rest of the year so custom_seasons would not include all 12 months.
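
As a rough illustration of the behaviour described above, here is a sketch that uses plain pandas rather than xCDAT. It assumes a monthly time axis from January 1985 to December 1987 (the end date is made up for the example) and labels each NDJFM season by the year in which it ends. All variable names are illustrative only.

```python
import pandas as pd

# Monthly time axis starting in January 1985 (end date is an assumption).
times = pd.date_range("1985-01-01", "1987-12-01", freq="MS")

season_months = [11, 12, 1, 2, 3]  # NDJFM, in season order
first_jan = season_months.index(1)
prev_year_months = set(season_months[:first_jan])  # {11, 12}

# Keep only the time steps that belong to the season.
in_season = times[times.month.isin(season_months)]

# Label each time step with the year in which its season ends:
# Nov and Dec are counted towards the following year's season.
season_year = [
    t.year + 1 if t.month in prev_year_months else t.year
    for t in in_season
]

# Count the months available per season and keep only full seasons.
counts = pd.Series(1, index=in_season).groupby(season_year).sum()
complete = counts[counts == len(season_months)]

print(counts.to_dict())
print(list(complete.index))
```

Under these assumptions the season labelled 1985 has only three months (Jan-Mar 1985) and the one labelled 1988 has only two (Nov-Dec 1987), so only the 1986 and 1987 seasons are kept.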

tomvothecoder commented 1 year ago

Thank you for opening this issue @arfriedman! Also thank you @DamienIrving for your input! This feature enhancement definitely sounds useful.

Here is the set of improvements based on the feedback:

  1. Support custom seasons that span calendar years
    • Requires detecting order of the months in a season. Currently, order does not matter.
    • For example, for custom_season = ["Nov", "Dec", "Jan", "Feb", "Mar"]:
      • ["Nov", "Dec"] are from the previous year since they are listed before "Jan"
      • ["Jan", "Feb", "Mar"] are from the current year
    • We can potentially extend _shift_decembers() to shift other months too. This method shifts the previous year's December to the current year so that xarray can properly group "DJF" seasons that span calendar years.
  2. Detect and drop incomplete seasons
    • Right now xCDAT only detects incomplete "DJF" seasons with _drop_incomplete_djf()
    • Replace boolean config drop_incomplete_djf with drop_incomplete_season
    • A possible solution for detecting incomplete seasons is to check if a season has all of the required months. If the count of months for that season does not match the expected count, then drop that season.
  3. Remove requirement for all 12 months to be included in a custom season
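
The month-order rule in item 1 can be sketched in plain Python. This is a minimal illustration and not xCDAT's implementation: `MONTHS` and `year_offsets` are hypothetical names, and the sketch assumes the season lists consecutive months in order and wraps past December at most once.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def year_offsets(custom_season):
    """Map each month to the number of years to add so that all months
    of one season share the calendar year in which the season ends."""
    numbers = [MONTHS.index(m) + 1 for m in custom_season]
    # Position of the first month that falls in the new calendar year.
    # Stays 0 when the season does not wrap past December.
    wrap = next(
        (i for i in range(1, len(numbers)) if numbers[i] < numbers[i - 1]),
        0,
    )
    # Months listed before the wrap belong to the previous year, so they
    # are shifted forward by one year; the rest stay where they are.
    return {m: (1 if i < wrap else 0) for i, m in enumerate(custom_season)}


print(year_offsets(["Nov", "Dec", "Jan", "Feb", "Mar"]))
print(year_offsets(["Jun", "Jul", "Aug"]))
```

For ["Nov", "Dec", "Jan", "Feb", "Mar"], only "Nov" and "Dec" receive an offset of 1, which matches the "previous year" months in the example above.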

lee1043 commented 6 months ago

@tomvothecoder as discussed in the meeting, a converter like the one below could make the custom season function easier to use.

def generate_calendar_months(custom_season, output_type: str = "month_abbreviations"):
    """
    Generates a list of calendar months corresponding to the given custom season.

    Args:
        custom_season (str): A string representing a custom season (e.g., "MJJ").
        output_type (str, optional): default is "month_abbreviations" which returns month abbreviations. If set to "month_numbers", it will return months in numbers.

    Returns:
        list: A list of strings of calendar months corresponding to the given custom season, or a list of numbers

    Raises:
        ValueError: If the length of the custom season is longer than 12 or if the custom season is not found in the months.
        ValueError: If `output_type` is not one of "month_abbreviations" or "month_numbers".

    Example:
        >>> generate_calendar_months("MJJ")
        ['May', 'Jun', 'Jul']
    """
    # Map each month's initial to its three-letter abbreviation and month number
    months_mapping = [
        ("J", "Jan", 1), ("F", "Feb", 2), ("M", "Mar", 3), ("A", "Apr", 4),
        ("M", "May", 5), ("J", "Jun", 6), ("J", "Jul", 7), ("A", "Aug", 8),
        ("S", "Sep", 9), ("O", "Oct", 10), ("N", "Nov", 11), ("D", "Dec", 12)
    ] * 2  # Repeat the mapping to cover cases where the custom season wraps around to the beginning of the year

    # Generate a string representation of all months by concatenating their abbreviations
    months = ''.join([m[0] for m in months_mapping])

    # Check if the length of the custom season exceeds 12
    if len(custom_season) > 12:
        raise ValueError("Custom season length cannot be longer than 12")

    if output_type == "month_abbreviations":
        k = 1
    elif output_type == "month_numbers":
        k = 2
    else:
        raise ValueError(f"{output_type} should be either 'month_abbreviations' or 'month_numbers'")

    # Iterate through the months to find the starting index of the custom season
    for i in range(len(months) - len(custom_season) + 1):
        if months[i:i+len(custom_season)] == custom_season:
            # Once the custom season is found, return the corresponding list of months
            return [months_mapping[(i + j) % 12][k] for j in range(len(custom_season))]

    # If the custom season is not found, raise a ValueError
    raise ValueError("Custom season '{}' not found in months '{}'".format(custom_season, months))

Test 1, return as month abbreviations:

custom_season = "MJJAS"
result = generate_calendar_months(custom_season)
print(result)

custom_season = "NDJFM"
result = generate_calendar_months(custom_season)
print(result)

custom_season = "JJASONDJFMAM"
result = generate_calendar_months(custom_season)
print(result)

['May', 'Jun', 'Jul', 'Aug', 'Sep']
['Nov', 'Dec', 'Jan', 'Feb', 'Mar']
['Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar', 'Apr', 'May']

Test 2, return as month numbers:

custom_season = "MJJAS"
result = generate_calendar_months(custom_season, output_type="month_numbers")
print(result)

custom_season = "NDJFM"
result = generate_calendar_months(custom_season, output_type="month_numbers")
print(result)

custom_season = "JJASONDJFMAM"
result = generate_calendar_months(custom_season, output_type="month_numbers")
print(result)

[5, 6, 7, 8, 9]
[11, 12, 1, 2, 3]
[6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5]

Test 3, error cases:

custom_season = "JAM"
result = generate_calendar_months(custom_season)

ValueError: Custom season 'JAM' not found in months 'JFMAMJJASONDJFMAMJJASOND'

custom_season = "JFMAMJJASONDJ"
result = generate_calendar_months(custom_season)

ValueError: Custom season length cannot be longer than 12
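
One property of the converter worth noting: its search loop returns the first match, so a single-letter season such as "J" resolves to January even though June and July share that initial. The sketch below is a separate, hypothetical helper (`candidate_seasons` is not part of the function above or of xCDAT) that lists every run of consecutive months matching a string of initials, which makes such ambiguity visible.

```python
INITIALS = "JFMAMJJASOND"
ABBREVIATIONS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def candidate_seasons(custom_season):
    """Return every run of consecutive months whose initials spell custom_season."""
    n = len(custom_season)
    if not 1 <= n <= 12:
        raise ValueError("Custom season must contain between 1 and 12 months")
    doubled = INITIALS * 2  # lets a match wrap past December
    return [
        [ABBREVIATIONS[(start + j) % 12] for j in range(n)]
        for start in range(12)
        if doubled[start:start + n] == custom_season
    ]


print(candidate_seasons("J"))      # three candidates: Jan, Jun, Jul
print(candidate_seasons("NDJFM"))  # a single, unambiguous candidate
print(candidate_seasons("JAM"))    # no candidates
```

With this month naming, only single-letter inputs ("J", "M", "A") are ambiguous; every string of two or more initials matches at most one run of months.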

lee1043 commented 6 months ago

@tomvothecoder please feel free to incorporate the above code if you find it useful. No worries otherwise.

tomvothecoder commented 6 months ago

@lee1043 Thanks Jiwoo! I'll consider your function for improving the custom_seasons arg.

dcherian commented 1 month ago

Here's an upstream version: https://github.com/pydata/xarray/pull/9524 . I could use some help testing it out.