radiantearth / stac-spec

SpatioTemporal Asset Catalog specification - making geospatial assets openly searchable and crawlable
https://stacspec.org
Apache License 2.0

Is Item start_datetime to end_datetime an inclusive or exclusive range? #1255

Closed lossyrob closed 2 months ago

lossyrob commented 9 months ago

A start_datetime and end_datetime can be added to Item properties as per the Common Metadata spec.

The end_datetime is defined as "The last or end date and time for the Item, in UTC.".

From this description, it is not clear whether the start_datetime -> end_datetime is an inclusive or exclusive range.

For instance, if there's an annual dataset where the Item's date range is 2022-01-01T00:00:00 - 2023-01-01T00:00:00, does this represent only the year of 2022, or all of 2022 and also the very first second of 2023?

Based on feedback I've heard, I would suggest that a start time inclusive, end time exclusive range would make the most sense in practical terms.
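To make the two readings concrete, here is a minimal sketch of how the same pair of timestamps answers the "first instant of 2023" question differently. The helper `covers` and its `end_inclusive` flag are hypothetical names used only for illustration; nothing here comes from the spec.

```python
from datetime import datetime, timezone


def covers(start, end, instant, end_inclusive):
    """Return True if `instant` lies in the range under the chosen reading."""
    if end_inclusive:
        return start <= instant <= end  # closed range [start, end]
    return start <= instant < end  # half-open range [start, end)


# The annual Item from the example above.
start = datetime(2022, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 1, 1, tzinfo=timezone.utc)
first_instant_of_2023 = datetime(2023, 1, 1, tzinfo=timezone.utc)

print(covers(start, end, first_instant_of_2023, end_inclusive=True))
print(covers(start, end, first_instant_of_2023, end_inclusive=False))
```

Under the inclusive reading the Item also claims the first instant of 2023; under the exclusive reading it covers 2022 only.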

bmcandr commented 9 months ago

Hi @lossyrob,

I'm with @impactobservatory and Dan asked me to share my thoughts with you on this topic. This has been a source of internal debate for us so we'd appreciate clarity/guidance. We have generally operated under the assumption that end_datetime is exclusive as you described and therefore set the start/end_datetime fields on our annual LULC map Items to 2022-01-01T00:00:00 and 2023-01-01T00:00:00 respectively. This results in somewhat unexpected results when performing searches using pystac_client, for example. I might naively expect that performing a search with the argument datetime="2022" would return the Items representing our 2022 map, but pystac_client's date -> datetime expansion results in ~0 Items returned in this case.~ Items from both 2021 and 2022.

I look forward to hearing what the community thinks!

gadomski commented 9 months ago

Can you provide more information about your query and your system? E.g.

That way we can dig in a bit more. Thanks!

bmcandr commented 9 months ago

(FYI, I made a correction to the final sentence of my earlier comment.)

We're running a fork of stac-fastapi w/ pgstac backend (we haven't upgraded to stac-fastapi-pgstac yet). Our internal STAC server is private, but the behavior I described is reproducible with our io-lulc-9-class STAC Collection on PCH and pystac_client:

from pystac_client import Client
client = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

search = client.search(collections=["io-lulc-9-class"], datetime="2022")
print(len(search.item_collection()))
# 1482

# check start dates
print({item.properties["start_datetime"] for item in search.item_collection()})
# {'2021-01-01T00:00:00Z', '2022-01-01T00:00:00Z'}
# 2021 Items are included because their end date is 2022-01-01T00:...

# using a query we can get exactly what we want
query = {"start_datetime": {"eq": "2022-01-01T00:00:00Z"}}
search = client.search(collections=["io-lulc-9-class"], query=query)
print(len(search.item_collection()))
# 756

# should only have 2022 Items
print({item.properties["start_datetime"] for item in search.item_collection()})
# {'2022-01-01T00:00:00Z'}

aliasmrchips commented 9 months ago

The suggestion is that it is just maybe worth being clear in the spec. Since it is not specified, it is up to those implementing the spec to decide how it should behave, which could lead to confusion. I would second @lossyrob's suggestion that date ranges be interpreted as [start_datetime, end_datetime).

gadomski commented 9 months ago

(FYI, I made a correction to the final sentence of my earlier comment.)

Ah ok thanks, that makes more sense -- the 0 returns was surprising to me.

The suggestion is that it is just maybe worth being clear in the spec.

Agreed.

From the implementation side of things, most code I've seen assumes inclusive -- e.g. pystac-client makes an "inclusive" range, and pgstac uses inclusive search: https://github.com/stac-utils/pgstac/blob/e3ae32d5e4c4b29731026ed9133add0d2a04eb73/src/pgstac/sql/004_search.sql#L158. That's not to say that's correct, or how it will be specified in the spec, that's just to explain behavior.
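As a rough illustration of why the 2021 Items show up in a search for 2022, the sketch below tests each Item's range against a closed search interval. It is not pgstac's SQL or pystac-client's code: the `utc` and `intersects` helpers are hypothetical, and the whole-second search end is an assumed expansion of "2022" chosen for the example.

```python
from datetime import datetime, timezone


def utc(year, month=1, day=1, hour=0, minute=0, second=0):
    """Shorthand for a timezone-aware UTC datetime."""
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)


def intersects(item_start, item_end, search_start, search_end, item_end_inclusive):
    """Test an Item's range against a closed search interval."""
    if item_end_inclusive:
        return item_start <= search_end and item_end >= search_start
    # With an exclusive end, an Item that ends exactly at search_start does not match.
    return item_start <= search_end and item_end > search_start


# Assumed expansion of datetime="2022" into a closed interval.
search_start, search_end = utc(2022), utc(2022, 12, 31, 23, 59, 59)

items = {
    "2021 map": (utc(2021), utc(2022)),
    "2022 map": (utc(2022), utc(2023)),
}

for name, (item_start, item_end) in items.items():
    inclusive = intersects(item_start, item_end, search_start, search_end, True)
    exclusive = intersects(item_start, item_end, search_start, search_end, False)
    print(f"{name}: inclusive end -> {inclusive}, exclusive end -> {exclusive}")
```

With an inclusive Item end, the 2021 map matches because its end_datetime touches the first instant of the search interval; treating the Item end as exclusive drops it while keeping the 2022 map.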

matthewhanson commented 8 months ago

I hadn't really thought about this before @lossyrob brought it up, and just always thought that inclusive makes the most sense, as I (and probably most people) think in terms of date only e.g., 2022-01-01/2022-12-31, rather than time.

The behavior of pystac-client when you specify dates and not time is to fill the first date with the earliest time, and the second date with the latest time, e.g., 2022-01-01T00:00:00Z/2022-12-31T23:59:59.9Z

I think for a human this makes the most intuitive sense, and although the spec may not be clear I think that was the intention.

However, what gives me pause now is theoretically the "latest" time is never going to be the latest time, no matter how many 9's we include, so I'm inclined to move toward an exclusive end since it's the most correct.

From a practical standpoint I'm not sure it matters one way or the other, as long as users know what the behavior is.

We've got two options:

m-mohr commented 5 months ago

As we are describing data here, it's not directly related to search. Search is a different story and defined in another spec.

Let's say I have a capture that takes two seconds: 2022-01-01T00:00:00Z - 2022-01-01T00:00:02Z (that's what I get from the source metadata). How am I supposed to make this exclusive? It's the same issue that Matt describes:

However, what gives me pause now is theoretically the "latest" time is never going to be the latest time, no matter how many 9's we include, so I'm inclined to move toward an exclusive end since it's the most correct.

This also happens here, but the other way around. I'd need to append an infinite number of 0's and a 1 at the end.

Also, datetime is pretty much our equivalent for start_datetime = end_datetime. If we make end_datetime exclusive, this is not true any longer.

From work in openEO (same discussion), I know that ISO is also not 100% certain about it and changed their definition over time. We ended up making it inclusive, which I think is also the latest status in ISO8601.

I think I slightly tend towards an inclusive end_datetime, but there will be always pros and cons for both sides.

LiamBindle commented 5 months ago

Hi all, thanks for all your work. Just wanted to voice my support for the end date being exclusive. I find it easier to work with intervals that have an exclusive end date because it means a collection of items can have complete temporal coverage without needing to tweak the end date by an undefined amount of time (e.g., a second or a microsecond).

m-mohr commented 5 months ago

@LiamBindle Isn't that an argument for having the end date inclusive? If it's exclusive and you create data (i.e. this is not the search use case), then you need to tweak the end date by an undefined amount of time.

LiamBindle commented 2 months ago

EDIT: Reopened in https://github.com/radiantearth/stac-spec/issues/1283

@m-mohr Sorry I missed your response and question. My bad, and I see this has already gone ahead. I'm going to reopen this because I think an inclusive end date introduces a logical flaw, so I'd like to advocate for making end_datetime exclusive one more time. Feel free to reclose if you don't think this needs any more discussion--I'm not trying to make a mountain out of a molehill.

Isn't that an argument for having the end date inclusive?

I don't think so. Say you have an item that represents an average for the year 2018. When start is inclusive and end is exclusive you have start_datetime="2018-01-01T00:00:00.000000000Z" and end_datetime="2019-01-01T00:00:00.000000000Z". If the end date is inclusive then you need to subtract an undefined amount of time (1ns?) from the ending date. I.e., should it be end_datetime="2018-12-31T23:59:59.999999999Z"?
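A small sketch of the "undefined amount of time" problem: the inclusive end you would write for 2018 depends on which timestamp resolution you assume. The three resolutions below are assumptions picked for illustration, and Python's `datetime` stops at microseconds, so the nanosecond case is not shown.

```python
from datetime import datetime, timedelta, timezone

start_of_2019 = datetime(2019, 1, 1, tzinfo=timezone.utc)

# Assumed resolutions; each gives a different "last instant of 2018".
resolutions = {
    "second": timedelta(seconds=1),
    "millisecond": timedelta(milliseconds=1),
    "microsecond": timedelta(microseconds=1),
}

inclusive_ends = {name: start_of_2019 - step for name, step in resolutions.items()}

for name, end in inclusive_ends.items():
    print(f"{name:>11}: end_datetime = {end.isoformat()}")

# An instant later than the second-resolution end is still inside 2018,
# so that inclusive end leaves part of the year uncovered.
half_second_before_2019 = start_of_2019 - timedelta(milliseconds=500)
print(inclusive_ends["second"] < half_second_before_2019 < start_of_2019)
```

The exclusive form needs no such choice: end_datetime is simply the start of 2019, whatever the resolution.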

More importantly, to respond to your Q higher in this thread:

Let's say I have a capture that takes two seconds: 2022-01-01T00:00:00Z - 2022-01-01T00:00:02Z (that's what I get from the source metadata). How am I supposed to make this exclusive? It's the same issue that Matt describes:

In this case, the end date provided in the source's metadata is already exclusive, isn't it? The period [2022-01-01T00:00:00Z, 2022-01-01T00:00:02Z) has a duration of exactly 2 seconds with an exclusive end date.

But what happens if there is another capture for the next 2 seconds? If the end date is inclusive then two items claim to cover the "2022-01-01T00:00:02Z" instant in time whereas an exclusive end date handles it cleanly. If the end date is inclusive then there is no way to represent a time series where every instant in time is covered by exactly one item.
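The back-to-back capture case can be checked directly. This is a minimal sketch with two hypothetical 2-second captures; `items_covering` is an illustrative helper, not part of any STAC library.

```python
from datetime import datetime, timedelta, timezone

t0 = datetime(2022, 1, 1, tzinfo=timezone.utc)

# Two consecutive 2-second captures: 00:00:00-00:00:02 and 00:00:02-00:00:04.
captures = [
    (t0, t0 + timedelta(seconds=2)),
    (t0 + timedelta(seconds=2), t0 + timedelta(seconds=4)),
]


def items_covering(instant, end_inclusive):
    """Count the captures whose range contains `instant`."""
    count = 0
    for start, end in captures:
        within_end = instant <= end if end_inclusive else instant < end
        if start <= instant and within_end:
            count += 1
    return count


shared_instant = t0 + timedelta(seconds=2)  # 2022-01-01T00:00:02Z

print(items_covering(shared_instant, end_inclusive=True))
print(items_covering(shared_instant, end_inclusive=False))
```

With inclusive ends both captures claim the shared instant; with exclusive ends exactly one does.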