Closed masonproffitt closed 1 year ago
Just to add that I also get this warning (before the invalid ast.Constant
node raises an error) for something like np.cos
:
WARNING func_adl.type_based_replacement:type_based_replacement.py:688 Method cos not found on object <class 'module'>
I guess func_adl
is trying to find the method in the module
class rather than in the actual module (numpy
)?
If you ask for a module that is not known in ServiceX, it should error out during code-generation and that should get the error back to the user.
Hmmm... In python 3.7 I can see the error being raised in the else
statement as you point out, @masonproffitt . However, if 3.8 on, I do not get an error. Though I would imagine the qastle
is probably bad. Above here it seems to say that you would expect an exception thrown even if you are using 3.10 or 3.11. In short, I'm worried I am not reproducing your error. Could you take a look at the PR that is linked here and look at the added tests and tell me if they are covering this? But work in 3.10, 11, and 12.
Second question. What is the proper behavior when you see "np.cos(xxx)"? A name reference w.r.t. "np" or "numpy"? If the user has imported numpy as np, then we can see that. The reason I ask is I would think we'd not want to depend on something weird the user has done "import numpy as nnp" (or something). So when we have the import, perhaps we should do "numpy".
@masonproffitt
Right, the error I get is actually from when I try to deepcopy
the AST; I get TypeError: cannot pickle 'module' object
. But the real problem is that the Python AST is not valid (a non-literal, non-ast
object like a module
should never be inside a Python AST).
As for the name, it should pass whatever name is used in the lambda
(so np
in the np.cos
example). Basically it should be oblivious to what has been imported or not, because we have no way of actually passing a module to the backend (maybe that could be a future project, but that's major work).
Ok - got it: A few things:
Select
statements and the like. I'll update the test to try to make it break. Thanks!numpy
imported as np
and awkward
as ak
(I assume). This means the internal details of the transformer are now dictating how the analyzer writes the code. Instead, I think the code that looks at the constant should note that it is a module (you can do that, I saw it in the debugger) and then substitute with a transformer approved module name. We could have a mapping or we could have the full name, no matter.@masonproffitt - Ok - I've updated the PR to just transmit the name. Please take a look. But, please also let me know what you think about the above proposal.
Instead, I think the code that looks at the constant should note that it is a module (you can do that, I saw it in the debugger) and then substitute with a transformer approved module name. We could have a mapping or we could have the full name, no matter.
I think this is more than we should really do. I don't want to give the impression that arbitrary numpy
or awkward
functions are allowed in queries--many if not most will result in very confusing errors or completely wrong results. The intention is that there's a very limited set of functions that should ever appear (only allowing the math
module might actually be less confusing, but I figured people are probably most used to using np
or ak
versions of math functions). I view something like np.cos
as one whole reserved keyword. I believe this is similar to what func_adl_xAOD
does with things like sqrt
.
Basically, I think the right thing to do for now is just to document what functions are allowed. This already done in func_adl_xAOD
's README; I can do something similar for func_adl_uproot
.
Ok - so leave the modification as it is now, then (modulo your comment, which I think until we have a real example, we might ignore).
SHould we white-list things? That can be a separate bug/feature.
These lines don't check that a variable actually represents a constant before replacing a
Name
node with aConstant
node:https://github.com/iris-hep/func_adl/blob/89713708d8d1dc7234a38d78a6961170f4d26dca/func_adl/util_ast.py#L12-L15
For Python versions before 3.8, some type checking is done:
https://github.com/iris-hep/func_adl/blob/89713708d8d1dc7234a38d78a6961170f4d26dca/func_adl/util_ast.py#L17-L29
but in either case, this behavior causes issues here:
https://github.com/iris-hep/func_adl/blob/89713708d8d1dc7234a38d78a6961170f4d26dca/func_adl/util_ast.py#L335-L336
when
v
does not represent a literal. This causes problems for the Uproot backend, where I allow references to modules likenp
andak
. For example, I might use something likeds.Select(lambda event: np.cos(event.MET_phi))
. That works fine as long as you haven't imported the relevant module in the source where the lambda is (since there is no potential capture), but it breaks if that module has been imported. If the surrounding code needs that module, the only solution is to go back to passing the lambda as a string.I would argue that the correct behavior here is for the unmodified
Name
node to be passed through if the captured variable is not a constant (maybe with a warning or even just a debug log message). That's consistent with what happens if the name hasn't been defined outside of the lambda. This leaves it up to the particular code generator how to handle any apparently undefinedName
, which basically allows each backend to set its own reserved keywords.