Open yuvi17 opened 1 year ago
A query built using sqlalchemy + pyathena something like this
query = select().where(
func.cardinality(func.filter(
"action",
lambda x: (
func.cast(x['timestamp'], Date) <= '2020-09-07',
func.cast(x['timestamp'], Date) >= '2020-06-01',
func.cast(x['action_count'], Integer) > 3
)
))
)
results in a UnsupportedCompilationError
with exceptions
operator_ = override_operator or binary.operator
disp = self._get_operator_dispatch(operator_, "binary", None)
if disp:
return disp(binary, operator_, **kw)
else:
try:
> opstring = OPERATORS[operator_]
E KeyError: <built-in function getitem>
../../../opt/anaconda3/envs/py39/lib/python3.9/site-packages/sqlalchemy/sql/compiler.py:2201: KeyError
The above exception was the direct cause of the following exception:
.....
.....
.....
elif replace_context is not None:
# no good solution here, we would like to have the exception
# have only the context of replace_context.__context__ so that the
# intermediary exception does not change, but we can't figure
# that out.
exception.__cause__ = replace_context
try:
> raise exception
E sqlalchemy.exc.UnsupportedCompilationError: Compiler <pyathena.sqlalchemy_athena.AthenaStatementCompiler object at 0x7f98cba92190> can't render element of type <built-in function getitem> (Background on this error at: https://sqlalche.me/e/14/l7de)
../../../opt/anaconda3/envs/py39/lib/python3.9/site-packages/sqlalchemy/util/compat.py:208: UnsupportedCompilationError
This is after creating an engine with athena
and passing the dialect to compile
function of sqlalchemy.
If there is anything missing from the implementation in PyAthena, pull requests are welcome.
Athena has support for
filter
function that can be used to filter values in a array and return the filtered array. https://docs.aws.amazon.com/athena/latest/ug/filtering-arrays.htmlIt will be quite helpful if we can have an equivalent function in pyathena's SQLAlchemy dialect.
Use Case
I have a table that contains an array of timestamps and an action performed by an actor. Something like
I would want to write a query which can give me something like, count of times this action was performed more than 3 times b/w certain dates. Or users who performed this action a total of more than 3 times b/w certain dates.
If had to write Athena SQL for this, it would look something like this
I would like to do something like this for a table with multiple such columns and using
and
or
etc. between those.@laughingman7743 Is there any way I can do this today?
I hate the idea of writing the query as Python F-strings and filling in the values.