openearth / aeolis-python

A process-based model for simulating supply-limited aeolian sediment transport
http://aeolis.readthedocs.io/
GNU General Public License v3.0

Add ISO 8601 formatted timestamps to NetCDF output for improved interoperability #185

Open frederikvand opened 2 months ago


Description

The NetCDF output generated by Aeolis currently uses numeric time values representing seconds since a reference date. While this is a valid representation, it can be enhanced by including ISO 8601 formatted timestamps as an additional attribute. ISO 8601 is an international standard for representing dates and times, offering several benefits when it comes to coupling models and exchanging data between different systems.
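
To illustrate what the numeric values would look like as ISO 8601 strings, here is a minimal sketch using only the standard library. The helper name `seconds_to_iso` is hypothetical and not part of Aeolis, and UTC is assumed as the time zone of the reference date.

```python
from datetime import datetime, timedelta, timezone

def seconds_to_iso(refdate, seconds, tz=timezone.utc):
    """Convert offsets in seconds since `refdate` to ISO 8601 strings.

    `refdate` uses the same 'YYYY-MM-DD HH:MM' layout as the Aeolis
    configuration; `tz` is an assumption (UTC by default).
    """
    ref = datetime.strptime(refdate, '%Y-%m-%d %H:%M').replace(tzinfo=tz)
    return [(ref + timedelta(seconds=float(s))).isoformat() for s in seconds]

stamps = seconds_to_iso('2020-01-01 00:00', [0, 3600, 86400])
print(stamps)
```

Each string carries its UTC offset, so a reader does not need to know the reference date to interpret it.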

The current handling of time in the Aeolis code uses a Julian calendar and the refdate to generate the time_bounds. This occasionally causes a problem when reading the data with xarray, resulting in an OutOfBoundsDatetime error: "Cannot decode times from a non-standard calendar, 'julian', using pandas." Additionally, the Julian calendar drifts relative to the actual solar year by about one day every 128 years, which can lead to issues when interpreting dates and times over long simulation periods.
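
The 128-year figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes a mean tropical year of roughly 365.2422 days, which is an approximate value.

```python
JULIAN_YEAR_DAYS = 365.25      # average Julian calendar year (one leap day every 4 years)
TROPICAL_YEAR_DAYS = 365.2422  # approximate mean tropical (solar) year

# Days by which the Julian calendar runs ahead of the solar year, per year.
drift_per_year = JULIAN_YEAR_DAYS - TROPICAL_YEAR_DAYS

# Number of years needed to accumulate one full day of drift.
years_per_day_of_drift = 1.0 / drift_per_year
print(round(years_per_day_of_drift))
```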

To ensure that the simulation results can be correctly interpreted by anyone who receives the NetCDF file, it would be beneficial to additionally include the time zone associated with the reference date.
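
As a small illustration of why the time zone matters, the same reference date string refers to two different instants depending on the zone it is read in. This sketch uses fixed UTC offsets from the standard library; the UTC+01:00 zone is just an example.

```python
from datetime import datetime, timedelta, timezone

# The refdate as written in the configuration file: no zone attached.
naive = datetime.strptime('2020-01-01 00:00', '%Y-%m-%d %H:%M')

as_utc = naive.replace(tzinfo=timezone.utc)
as_plus_one = naive.replace(tzinfo=timezone(timedelta(hours=1)))  # fixed UTC+01:00

# Midnight at UTC+01:00 is 23:00 UTC on the previous day, so the two
# readings of the same string are one hour apart.
offset_seconds = (as_utc - as_plus_one).total_seconds()
print(as_utc.isoformat(), as_plus_one.isoformat(), offset_seconds)
```

Every value in the time variable would inherit that one-hour shift if the file were read with the wrong assumption.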

Proposed Solution

  1. Add a 'timezone' parameter to the configuration file to specify the time zone for the reference date:
'refdate'                       : '2020-01-01 00:00', # [-] Reference datetime in netCDF output
'timezone'                      : 'UTC',              # [-] Time zone for the reference date
  2. Modify the 'initialize' function to include an 'iso8601' attribute for the 'time' variable, incorporating the time zone information:
import datetime
import netCDF4
import pytz

def initialize(outputfile, outputvars, s, p, dimensions):
    with netCDF4.Dataset(outputfile, 'w') as nc:
        # ...
        nc.createVariable('time', 'float64', (u'time',))
        nc.variables['time'].long_name = 'time'
        nc.variables['time'].standard_name = 'time'
        nc.variables['time'].units = 'seconds since %s' % p['refdate']
        nc.variables['time'].calendar = 'proleptic_gregorian'  # Use Gregorian calendar instead of Julian
        nc.variables['time'].axis = 'T'
        nc.variables['time'].bounds = 'time_bounds'
        # netCDF attributes must be strings or numbers, not callables, so store
        # the reference date as a time-zone-aware ISO 8601 string. localize() is
        # used because replace(tzinfo=...) with a pytz zone can pick a historical
        # local-mean-time offset.
        refdate = datetime.datetime.strptime(p['refdate'], '%Y-%m-%d %H:%M')
        nc.variables['time'].iso8601 = pytz.timezone(p['timezone']).localize(refdate).isoformat()
        # ...
  3. Introduce a calculate_iso_dates function to compute ISO 8601 timestamps for 'output_times' and 'dzb_interval', using the specific time zone:
def calculate_iso_dates(refdate, tstart, tstop, interval, timezone='UTC'):
    # The time zone is passed in explicitly instead of being read from a global p.
    # localize() attaches the zone correctly; replace(tzinfo=...) with a pytz
    # zone can pick a historical local-mean-time offset.
    naive_ref = datetime.datetime.strptime(refdate, '%Y-%m-%d %H:%M')
    ref_datetime = pytz.timezone(timezone).localize(naive_ref)
    start_datetime = ref_datetime + datetime.timedelta(seconds=tstart)
    end_datetime = ref_datetime + datetime.timedelta(seconds=tstop)

    current_datetime = start_datetime
    iso_dates = []

    # Step from tstart to tstop (inclusive) in steps of `interval` seconds
    while current_datetime <= end_datetime:
        iso_dates.append(current_datetime.isoformat())
        current_datetime += datetime.timedelta(seconds=interval)

    return iso_dates

output_times_iso = calculate_iso_dates(p['refdate'], p['tstart'], p['tstop'], p['output_times'], p['timezone'])
dzb_interval_iso = calculate_iso_dates(p['refdate'], p['tstart'], p['tstop'], p['dzb_interval'], p['timezone'])
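
On the consumer side, ISO 8601 strings like the ones produced by the steps above can be turned back into numeric offsets without knowing anything about Aeolis. The following is a minimal sketch with the standard library only; `iso_to_seconds` is a hypothetical helper name and the sample timestamps are made up for illustration.

```python
from datetime import datetime

def iso_to_seconds(iso_stamps, ref_iso):
    """Return seconds elapsed between `ref_iso` and each ISO 8601 timestamp.

    Both arguments must carry a UTC offset so the subtraction is unambiguous.
    """
    ref = datetime.fromisoformat(ref_iso)
    return [(datetime.fromisoformat(s) - ref).total_seconds() for s in iso_stamps]

ref_iso = '2020-01-01T00:00:00+00:00'
sample = [
    '2020-01-01T00:00:00+00:00',
    '2020-01-01T02:00:00+01:00',  # same instant as 01:00 UTC
    '2020-01-02T00:00:00+00:00',
]
seconds = iso_to_seconds(sample, ref_iso)
print(seconds)
```

Because every string carries its offset, timestamps written in different zones still map onto the same time axis.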

Benefits of Including Time Zone Information and Using Gregorian Calendar

Unambiguous Interpretation: By including the time zone and using the more accurate Gregorian calendar, anyone who receives the NetCDF file can correctly interpret the timestamps, eliminating potential confusion caused by the Julian calendar's inaccuracies.

Interoperability: Including time zone information makes the NetCDF output more compatible with tools and systems that expect timestamps to be accompanied by time zone data. This facilitates seamless data exchange and analysis across different platforms.

Reproducibility: Explicitly specifying the time zone helps in reproducing the simulation results, as it provides a clear reference point for the timestamps. This is particularly important when sharing data or publishing results.