Open isichei opened 2 years ago
Blocked by #51
I don't have access to the xhibit_v1
database at the moment but this fails on the following query.
releases = wr.athena.read_sql_query(
"""
SELECT DISTINCT C.prison, C.offender_id, S.date_of_release
FROM nomis_ao.core AS C
LEFT JOIN nomis_ao.sentences AS S
ON S.extract_datetime = C.extract_datetime AND S.record_number = C.record_number
WHERE S.date_of_release IS NOT NULL
""",
database = "nomis_ao",
ctas_approach = False,
pyarrow_additional_kwargs={
"coerce_int96_timestamp_unit": "ms",
"timestamp_as_object": True
}
)
gives the AttributeError: Can only use .dt accessor with datetimelike values
error.
For reference filtering the year does work
releases = wr.athena.read_sql_query(
"""
SELECT DISTINCT C.prison, C.offender_id, S.date_of_release
FROM nomis_ao.core AS C
LEFT JOIN nomis_ao.sentences AS S
ON S.extract_datetime = C.extract_datetime AND S.record_number = C.record_number
WHERE S.date_of_release IS NOT NULL AND YEAR(S.date_of_release) < 2100
""",
database = "nomis_ao",
ctas_approach = False,
pyarrow_additional_kwargs={
"coerce_int96_timestamp_unit": "ms",
"timestamp_as_object": True
}
)
It seems not to like years after 2500 - this data looks like typos from prison staff.
There was an error where aws wrangler would not convert parquet dates correctly. This has been fixed in AWS Data Wrangler 2.12.0 (PR that fixed it for context) so we need to point pydbtools to at least this release.
Check that it functions correctly run the workaround that I posted in the PR:
You can use any SQL query that has an
mojap_end_datetime
as the default value is2999-01-01 00:00:00
for latest records. This query should give back correct timestamps (2999-01-01 00:00:00
).If it works you will need to update the
pyproject.toml
to the relevant awrangler dependency.