Open MialLewis opened 1 month ago
@zhaner08 / @pettyjamesm can you please share the "secret sauce" in this regard on how AWS Athena is doing the increments on months?
I took a look at the code, and some notable differences that might impact matching the behavior of our implementation are:
LocalDateTime
to store the current value instead of doing zone specific conversions Instant
java.time.format.DateTimeFormatter
instead of java.text.DateFormat
Otherwise, the logic for computing the values is basically the same, with no special handling for ChronoUnit.MONTHS
. Values are generated with a loop like:
// relevant fields
private final LocalDateTime minDateTime;
private final LocalDateTime maxDateTime;
private final ChronoUnit intervalUnit;
private final long intervalSize;
private final DateTimeFormatter dateFormatter;
// relevant logic for iterating values
public Iterator<String> getProjectedValues()
{
return new Iterator<String>()
{
private LocalDateTime next = minDateTime;
@Override
public boolean hasNext()
{
return !next.isAfter(maxDateTime);
}
@Override
public String next()
{
String value = next.format(dateFormatter);
next = next.plus(intervalSize, intervalUnit);
return value;
}
};
}
would love to see a resolution to this. its especially confusing when creating a table using partitioning because you get conflicting messages from the coordinator regarding syntax:
Query 20240826_193927_00017_p3s5b failed: line 13:154: Unable to set catalog 'hive' column property 'partition_projection_interval_unit' to ['YEAR']: Invalid value [YEAR]. Valid values: [NANOS, MICROS, MILLIS, SECONDS, MINUTES, HOURS, HALF_DAYS, DAYS, WEEKS, MONTHS, YEARS, DECADES, CENTURIES, MILLENNIA, ERAS, FOREVER]
create table hive.table.name (
index varchar,
name varchar,
vendor varchar,
site varchar,
start_datetime varchar,
end_datetime varchar,
duration varchar,
duration_hr varchar,
week varchar,
month varchar,
region varchar,
pyear date WITH (partition_projection_type = 'DATE', partition_projection_format = 'yyyy', partition_projection_range = ARRAY['2024','NOW'], partition_projection_interval_unit = 'YEAR', partition_projection_interval = 1),
pmonth date WITH (partition_projection_type = 'DATE', partition_projection_format = 'MM', partition_projection_range = ARRAY['01','12'], partition_projection_interval_unit = 'MONTH', partition_projection_interval = 1),
pday date WITH (partition_projection_type = 'DATE', partition_projection_format = 'dd', partition_projection_range = ARRAY['01','31'], partition_projection_interval_unit = 'DAY', partition_projection_interval = 1),
phour date WITH (partition_projection_type = 'DATE', partition_projection_format = 'HH', partition_projection_range = ARRAY['00','23'], partition_projection_interval_unit = 'HOUR', partition_projection_interval = 1),
pminute date WITH (partition_projection_type = 'DATE', partition_projection_format = 'mm', partition_projection_range = ARRAY['00','59'], partition_projection_interval_unit = 'MINUTE', partition_projection_interval = 1),
psecond date WITH (partition_projection_type = 'DATE', partition_projection_format = 'ss', partition_projection_range = ARRAY['00','59'], partition_projection_interval_unit = 'SECOND', partition_projection_interval = 1)
)
WITH (
external_location = 's3a://bucket-name/some/path/',
format = 'CSV',
skip_header_line_count = 1,
partitioned_by = ARRAY['pyear', 'pmonth', 'pday', 'phour', 'pminute', 'psecond'],
partition_projection_enabled = true,
partition_projection_location_template = 's3a://bucket-name/some/path/${pyear}/${pmonth}/${pday}/${phour}/${pminute}/${psecond}/'
)
but on correcting those partition_projection_interval_unit
to be plural, as required by the error msg above, the coordinator then complains about the restricted available units:
Query 20240826_201021_00054_p3s5b failed: Column projection for column 'pyear' failed. Property: 'partition_projection_interval_unit' value 'Years' is invalid. Available options: [Days, Hours, Minutes, Seconds]
this leaves me with INJECTED
types for year and month as my only options for those partitions,
pyear varchar WITH (partition_projection_type = 'INJECTED'),
pmonth varchar WITH (partition_projection_type = 'INJECTED'),
subscribing for updates
Date partition projection is limited to a
date_projection_interval_unit
of no greater thanDAYS
in thetrino-hive
plugin.I have a scenario where a external data is partitioned by a date key formatted
yyyy-MM
, with an interval unit ofMONTHS
and an interval value of 1.I appreciate that this restriction is due to issues with replicating the exact implementation in Athena: https://github.com/trinodb/trino/blame/c3ee39a64df1923df41ac2c95e8ea3e10b45fe4b/plugin/trino-hive/src/main/java/io/trino/plugin/hive/projection/DateProjection.java#L66
Ideally this restriction would be lifted.