opendatacube / datacube-stats

Data Cube Temporal Statistic Tools
http://www.ga.gov.au/about/projects/geographic/digital-earth-australia

Off by one day temporal binning error #193

Open Kirill888 opened 4 years ago

Kirill888 commented 4 years ago

Datasets captured over the east coast of Australia on 1-Jan-2017 are included in the annual statistics for the year 2016.

Example dataset: https://explorer.dea.ga.gov.au/dataset/6b0472ff-6e1f-434f-8313-37a2ee183924

Effect of UTC dates on Statistical Product Time Boundaries

What this means for annual statistical products computed over Australia: some datasets that were captured on 1-Jan-Year (local time) will be counted towards Year-1.

Example Python to run in the sandbox:

from datacube import Datacube
from datetime import timedelta

dc = Datacube()
# Select datasets whose UTC capture date is 2016-12-31.
dss = dc.find_datasets(product='ga_ls8c_ard_3', time='2016-12-31')
dates_utc = set(ds.center_time.date() for ds in dss)
# Approximate east coast local time as UTC+10 (ignores daylight saving).
dates_actual = set((ds.center_time + timedelta(hours=10)).date() for ds in dss)
print(f"UTC: {dates_utc}")
print(f"Local: {dates_actual}")

Output:

UTC: {datetime.date(2016, 12, 31)}
Local: {datetime.date(2016, 12, 31), datetime.date(2017, 1, 1)}

This means that some datasets captured on 1-Jan-2017 (local time) are seen by datacube as captured on 31-Dec-2016 instead.

I feel that for stats products, when a user defines the time period as 2016, what is meant is "local time 2016" and not "UTC 2016". For every output tile we should compute the timezone offset and query the database accordingly. To the best of my knowledge, the current stats code does not perform such a correction for the time component of the query.
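A minimal sketch of what such a correction might look like, using only the standard library. The function names `offset_from_longitude` and `local_year_to_utc_range` are hypothetical and not part of datacube-stats, and deriving the offset from the tile's central longitude (15 degrees per hour) is an assumption made here for illustration; a real implementation might prefer a proper timezone lookup.

```python
from datetime import datetime, timedelta, timezone


def offset_from_longitude(lon_deg):
    """Approximate UTC offset in whole hours (assumption: 15 degrees per hour)."""
    return round(lon_deg / 15.0)


def local_year_to_utc_range(year, utc_offset_hours):
    """Return (start, end) in UTC covering a local calendar year; end is exclusive."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    start_local = datetime(year, 1, 1, tzinfo=tz)
    end_local = datetime(year + 1, 1, 1, tzinfo=tz)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


# Hypothetical tile centred at 150 degrees east, roughly the east coast.
start, end = local_year_to_utc_range(2016, offset_from_longitude(150.0))

# A scene at 23:50 UTC on 31-Dec-2016 is 09:50 on 1-Jan-2017 local time,
# so it falls outside the corrected 2016 range.
scene_utc = datetime(2016, 12, 31, 23, 50, tzinfo=timezone.utc)
in_2016 = start <= scene_utc < end
print(start.isoformat(), end.isoformat(), in_2016)
```

The resulting pair could then, presumably, be used as the time range of the per-tile query instead of the bare year.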

RichardScottOZ commented 3 years ago

I would agree, having not read this. In an Australian context I would expect Australian time.