intake / akimbo

For when your data won't fit in your dataframe
https://akimbo.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Support for `pandas.DataFrame.eval()`? #42

Open gipert opened 6 months ago

gipert commented 6 months ago

I've just run a quick test and it seems that `pandas.DataFrame.eval()` is not supported. Is that correct? Is there a way to evaluate string expressions on dataframes containing Awkward arrays?

jpivarski commented 6 months ago

Is it an error message saying that it's not supported?

If Pandas is just running Python eval with column names loaded into the namespace (as the documentation suggests), then I don't see why those strings couldn't operate directly on Awkward Arrays.

Maybe the AwkwardSeries objects need to be unwrapped before passing to eval and the result needs to be re-wrapped? (This is a question for @douglasdavis.)

gipert commented 6 months ago

> I don't see why those strings couldn't operate directly on Awkward Arrays.

Indeed that was my thinking. This is what happens:


>>> import awkward_pandas as akpd
>>> import pandas as pd
>>> import awkward as ak
>>> df = pd.DataFrame(
...     {
...         "a": [1, 2, 3, 4],
...         "b": akpd.from_awkward(ak.Array([[1, 2], [], [3], [4, 5, 6]])),
...     }
... )
>>> df.eval("b*2")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 df.eval("b*2")

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/frame.py:4566, in DataFrame.eval(self, expr, inplace, **kwargs)
   4563     kwargs["target"] = self
   4564 kwargs["resolvers"] = tuple(kwargs.get("resolvers", ())) + resolvers
-> 4566 return _eval(expr, inplace=inplace, **kwargs)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/eval.py:336, in eval(expr, parser, engine, local_dict, global_dict, resolvers, level, target, inplace)
    327 # get our (possibly passed-in) scope
    328 env = ensure_scope(
    329     level + 1,
    330     global_dict=global_dict,
   (...)
    333     target=target,
    334 )
--> 336 parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
    338 if engine == "numexpr" and (
    339     is_extension_array_dtype(parsed_expr.terms.return_type)
    340     or getattr(parsed_expr.terms, "operand_types", None) is not None
   (...)
    344     )
    345 ):
    346     warnings.warn(
    347         "Engine has switched to 'python' because numexpr does not support "
    348         "extension array dtypes. Please set your engine to python manually.",
    349         RuntimeWarning,
    350         stacklevel=find_stack_level(),
    351     )

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:809, in Expr.__init__(self, expr, engine, parser, env, level)
    807 self.parser = parser
    808 self._visitor = PARSERS[parser](self.env, self.engine, self.parser)
--> 809 self.terms = self.parse()

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:828, in Expr.parse(self)
    824 def parse(self):
    825     """
    826     Parse an expression.
    827     """
--> 828     return self._visitor.visit(self.expr)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:415, in BaseExprVisitor.visit(self, node, **kwargs)
    413 method = f"visit_{type(node).__name__}"
    414 visitor = getattr(self, method)
--> 415 return visitor(node, **kwargs)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:421, in BaseExprVisitor.visit_Module(self, node, **kwargs)
    419     raise SyntaxError("only a single expression is allowed")
    420 expr = node.body[0]
--> 421 return self.visit(expr, **kwargs)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:415, in BaseExprVisitor.visit(self, node, **kwargs)
    413 method = f"visit_{type(node).__name__}"
    414 visitor = getattr(self, method)
--> 415 return visitor(node, **kwargs)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:424, in BaseExprVisitor.visit_Expr(self, node, **kwargs)
    423 def visit_Expr(self, node, **kwargs):
--> 424     return self.visit(node.value, **kwargs)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:415, in BaseExprVisitor.visit(self, node, **kwargs)
    413 method = f"visit_{type(node).__name__}"
    414 visitor = getattr(self, method)
--> 415 return visitor(node, **kwargs)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:537, in BaseExprVisitor.visit_BinOp(self, node, **kwargs)
    535 op, op_class, left, right = self._maybe_transform_eq_ne(node)
    536 left, right = self._maybe_downcast_constants(left, right)
--> 537 return self._maybe_evaluate_binop(op, op_class, left, right)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:507, in BaseExprVisitor._maybe_evaluate_binop(self, op, op_class, lhs, rhs, eval_in_python, maybe_eval_in_python)
    504 res = op(lhs, rhs)
    506 if res.has_invalid_return_type:
--> 507     raise TypeError(
    508         f"unsupported operand type(s) for {res.op}: "
    509         f"'{lhs.type}' and '{rhs.type}'"
    510     )
    512 if self.engine != "pytables" and (
    513     res.op in CMP_OPS_SYMS
    514     and getattr(lhs, "is_datetime", False)
   (...)
    517     # all date ops must be done in python bc numexpr doesn't work
    518     # well with NaT
    519     return self._maybe_eval(res, self.binary_ops)

TypeError: unsupported operand type(s) for *: 'awkward' and '<class 'int'>'

gipert commented 6 months ago

For context: I'm writing some code to evaluate algebraic expressions from a config file on tables made up of jagged and rectangular columns. I was hoping to write almost no code by using pandas.eval()...
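For the config-file use case, one possible approach that does not rely on pandas is to check the expression with Python's `ast` module before evaluating it, so that only column names, numeric constants, and arithmetic operators get through. The sketch below is an assumption about how that could look, not something taken from akimbo or pandas; `evaluate_config_expr` and `_ALLOWED` are hypothetical names. The columns can be any objects that implement the arithmetic operators (Awkward Arrays, NumPy arrays); plain numbers stand in for them here to keep the example dependency-free.

```python
import ast

# Syntax allowed in a config expression: names, constants, and unary/binary
# arithmetic. Anything else (calls, attribute access, subscripts) is rejected.
_ALLOWED = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub, ast.UAdd,
)


def evaluate_config_expr(expr, columns):
    """Validate an arithmetic expression, then evaluate it over `columns`."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError(f"disallowed constant: {node.value!r}")
        if isinstance(node, ast.Name) and node.id not in columns:
            raise KeyError(f"unknown column: {node.id}")
    code = compile(tree, "<config>", "eval")
    # Only the column names are visible to the expression.
    return eval(code, {"__builtins__": {}}, dict(columns))


# Plain numbers stand in for columns here.
result = evaluate_config_expr("(a + b) * 2", {"a": 1.5, "b": 2})
print(result)  # 7.0
```

The whitelist is a basic guard rather than a hardened sandbox: an expression such as a very large power can still be expensive to compute.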