Open scottyhq opened 3 hours ago
I think you can fix it either way, for example by avoiding casting None to a string here. But I also didn't know that pyarrow was able to cast strings to dates, so that's more appealing to me.
We shouldn't use pandas, because this arrow module is intended not to depend on pandas.
> But I also didn't know that pyarrow was able to cast the strings to dates
I'm new to arrow, so I definitely fumbled around a bit!
I thought this would work: `pa.scalar('2024-08-24T17:52:27.135933+00:00', type=pa.timestamp('us', tz='UTC'))`
but it raises `ArrowTypeError: object of type <class 'str'> cannot be converted to int`
But it works if you first go to a pyarrow string and then cast: `pa.scalar(timestamp_str, type=pa.string()).cast(pa.timestamp('us', tz='UTC'))`
For a heterogeneous collection of STAC Items, where some contain a timestamp property like `updated` and others do not, coercing to timestamps fails because the code seems to be trying to convert a pyarrow 'None' string to a timestamp: https://github.com/stac-utils/stac-geoparquet/blob/4b00f5be649609a896242f391d6e9c56377c7f25/stac_geoparquet/arrow/_to_arrow.py#L82
I think this scenario might be common for APIs that return metadata that changes over time. I came across this using this public endpoint: https://docs.canopy.umbra.space/docs/archive-catalog-searching-via-stac-api
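To illustrate the failure mode without any dependencies, here is a rough stdlib-only sketch. It is not the stac-geoparquet code: `datetime.fromisoformat` stands in for the real parser, and `parse_optional_timestamp` and the two sample items are hypothetical. It shows that a stringified `None` cannot be parsed as a timestamp, while passing `None` through untouched avoids the problem.

```python
from datetime import datetime
from typing import Optional


def parse_optional_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string, passing a missing value through as None.

    Hypothetical helper for illustration; datetime.fromisoformat is a
    stand-in for whatever parser the real code uses.
    """
    if value is None:
        return None
    return datetime.fromisoformat(value)


# Two made-up items: only the first one has an `updated` property.
items = [
    {"id": "a", "properties": {"updated": "2024-08-24T17:52:27.135933+00:00"}},
    {"id": "b", "properties": {}},
]

# Failure mode: str(None) is the string 'None', which is not a timestamp.
try:
    datetime.fromisoformat(str(None))
    stringified_none_parses = True
except ValueError:
    stringified_none_parses = False

# Guarded version: missing values stay None instead of becoming 'None'.
parsed = [
    parse_optional_timestamp(item["properties"].get("updated"))
    for item in items
]
```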
I tried a quick fix which seems to work, but I'm not sure it's the best approach... I just removed `ciso8601` and let Arrow handle the casting 😅.
Alternatively, using pandas to coerce timestamps is also mentioned here: https://github.com/stac-utils/stac-geoparquet/pull/31#discussion_r1544730642