vergauwenthomas / MetObs_toolkit

A toolkit for using non-traditional meteorological observations
https://vergauwenthomas.github.io/MetObs_toolkit/
MIT License

leading and trailing def #417

Closed vergauwenthomas closed 6 days ago

vergauwenthomas commented 7 months ago

Proposal by @amberJ99

Problem: The leading and trailing periods are currently selected by a fixed number of observations (debias_pref_sample_size_leading) rather than by a time window. This can give poor results when a large (e.g. seasonal) gap falls inside the leading or trailing period, because the selected records can then lie far in time from the gap being filled.
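
To make the problem concrete, here is a minimal sketch with made-up data (the dates, the outage, and the sample size of 36 are assumptions for illustration, not toolkit defaults). It mimics the count-based slicing and shows how far back the selected leading records can reach when an outage sits inside the leading period.

```python
import pandas as pd

# Hypothetical record: one day of hourly observations in January,
# a long outage, then one day of hourly observations in June.
hour = pd.Timedelta(hours=1)
before_outage = pd.date_range("2022-01-01 00:00", periods=24, freq=hour)
after_outage = pd.date_range("2022-06-01 00:00", periods=24, freq=hour)
obs = pd.DataFrame({"datetime": before_outage.append(after_outage)})

# Hypothetical start of the gap that has to be filled.
gap_start = pd.Timestamp("2022-06-02 00:00")
leading_period = obs[obs["datetime"] < gap_start]

# Count-based selection: take the last N records, whatever their age.
sample_size = 36  # illustrative value only
leading_df = leading_period[-sample_size:]

# Time between the oldest selected record and the start of the gap.
span = gap_start - leading_df["datetime"].min()
print(span)
```

In this example the 36 selected records include January observations, so the leading period stretches back far more than 30 days.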

Proposal: Define the leading/trailing periods with time deltas (so at most 30 days from the gap) and apply the minimum criterion as a minimum number of observations.

    # Select all leading and all trailing obs
    leading_period = obs[obs["datetime"] < gap.startgap]
    trailing_period = obs[obs["datetime"] > gap.endgap]
    logger.debug(f'   {leading_period.shape[0]} leading records, {trailing_period.shape[0]} trailing records.')

    # some derived integers
    poss_shrinkage_leading = leading_period.shape[0] - debias_min_sample_size_leading
    poss_shrinkage_trailing = trailing_period.shape[0] - debias_min_sample_size_trailing
    poss_extention_leading = leading_period.shape[0] - debias_pref_sample_size_leading
    poss_extention_trailing = (
        trailing_period.shape[0] - debias_pref_sample_size_trailing
    )

    # check if desired sample sizes for leading and trailing are possible
    if (leading_period.shape[0] >= debias_pref_sample_size_leading) & (
        trailing_period.shape[0] >= debias_pref_sample_size_trailing
    ):
        logger.debug("leading and trailing periods are both available for debiasing.")
        # both periods are OK
        leading_df = leading_period[-debias_pref_sample_size_leading:]
        trailing_df = trailing_period[:debias_pref_sample_size_trailing]
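
For comparison, the proposal could be sketched roughly as follows. This is not the toolkit's implementation: the function name, its parameters, and the example data are hypothetical, and only the `obs["datetime"]` column is taken from the snippet above. The windows are bounded by time deltas, and the minimum criterion is checked as a number of observations.

```python
import pandas as pd


def select_leading_trailing(obs, gap_start, gap_end, max_lead, max_trail,
                            min_lead_size, min_trail_size):
    """Hypothetical helper: time-window selection with a minimum record count."""
    t = obs["datetime"]
    # Keep only records within the allowed time delta around the gap.
    leading = obs[(t < gap_start) & (t >= gap_start - max_lead)]
    trailing = obs[(t > gap_end) & (t <= gap_end + max_trail)]
    # The minimum criterion stays a minimum number of observations.
    enough = len(leading) >= min_lead_size and len(trailing) >= min_trail_size
    return leading, trailing, enough


# Made-up data: January records, a long outage, then records around the gap.
hour = pd.Timedelta(hours=1)
stamps = (
    pd.date_range("2022-01-01 00:00", periods=24, freq=hour)
    .append(pd.date_range("2022-06-01 00:00", periods=24, freq=hour))
    .append(pd.date_range("2022-06-03 01:00", periods=24, freq=hour))
)
obs = pd.DataFrame({"datetime": stamps})

leading_df, trailing_df, enough = select_leading_trailing(
    obs,
    gap_start=pd.Timestamp("2022-06-02 00:00"),
    gap_end=pd.Timestamp("2022-06-03 00:00"),
    max_lead=pd.Timedelta(days=30),
    max_trail=pd.Timedelta(days=30),
    min_lead_size=12,
    min_trail_size=12,
)
print(len(leading_df), len(trailing_df), enough)
```

With a time-based window the January records are never selected, and the caller can decide what to do when `enough` is false instead of silently reaching back across the outage.
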
vergauwenthomas commented 7 months ago