Closed sfllaw closed 4 years ago
Thanks for the detailed feature request. I know one of the pain points of working with proto in CEL is the requirement to specify nested type names, e.g.
Msg{
nested: Nested{
field: "hello"
}
}
Given the type-inferencing capabilities of CEL one might (reasonably) expect the following to be supported:
Msg{
nested: { // no type name required!
field: "hello"
}
}
The above isn't supported in part due to collision between map and message literal syntax that has proven difficult to undo from a design perspective because of its potential to subtly change evaluation behavior. However, I'm curious to know if complex object construction is your use case, as we've toyed with the following notation:
Msg{
// magic '_' type name which gets resolved at check / runtime contextually.
// error if the type of 'nested' is polymorphic, e.g. protobuf.Any as it can't be inferred
// accurately.
nested: _{
field: "hello"
}
}
The above has it's own issues, but this is as far as I've gotten with respect to trying to simplify this use case.
@TristonianJones, thanks for the quick reply.
I’m not talking about type inferencing: I’m talking about cel.Container
and resolving identifiers:
Imagine we have a few protobuf enums: a.b.c.Color
and d.Direction
. Since cel.Container
can only take one name, we have to choose between cel.NewEnv(cel.Container("a.b.c"))
or cel.NewEnv(cel.Container("d")
if we want to simplify the namespaces for CEL authors.
Arguably, we’d want to write cel.NewEnv(cel.Container("a.b.c", "d"))
so that authors can simply write Color.RED
and Direction.NORTH
. I’d be happy to submit a pull request, if this approach is something you’d be likely to accept.
My concern with this feature is that it introduces a lot of complexity at runtime for parse-only expressions.
Within a single container it's possible more than one identifier could be matched, but with the namespace resolution rules it's clear which identifier wins. If there were multiple containers, it would lead to ambiguous precedence rules for identifier resolution (at least from a user perspective, especially for parse-only expressions).
Part of the reason I asked about object construction is because there are ways in which we could unambiguously resolve the proto type name with less user input. If you're doing lots of comparisons with proto enums, then the next best tactic I've seen is to create a container alias map:
{"A": "really.long.pkg.container.path.Enum",
"B": "other.really.long.pkg.container.path.Enum"}
From the language perspective, there is always at least one leading identifier before specifying an enum which makes resolution unambiguous, and the namespace resolution rules still apply. The trick is in manipulating the parsed output to replace the identifiers A
and B
with their fully qualified names in order to preserve the existing runtime and check semantics.
// Expanded to: msg.Status != really.long.pkg.container.path.Enum.ERROR
msg.Status != A.ERROR
It's also possible that using such an approach could generate strange error messages at check time which wouldn't relate exactly to user-entered source, but the tradeoff might be worth it.
My suggestion is to extend cel.Container
so that it also has a well-defined order. Given cel.Container("a.b.c", "d")
, a call to ResolveCandidateNames("Direction.NORTH")
should result in: a.b.c.Direction.NORTH
, a.b.Direction.NORTH
, a.Direction.NORTH
, d.Direction.NORTH
, Direction.NORTH
. I agree that any ambiguity would be disastrous.
I also note that this proposal preserves the fact that the last cel.Container
option wins, when passed to cel.NewEnv
. I don’t know if that’s what you meant by “multiple containers”, but changing this would make it impossible to replace the container, and would also be backwards incompatible.
We considered using cel.Globals
to implement an alias map but discovered that this is quite clunky. We found that doing this required reflecting across a lot of our protobuf definitions. Are you suggesting some other mechanism? It seems unfortunate that we can’t use the existing packages.Packager
machinery to do this.
Are you worried that people will add too many package names to packages.Packager
, which would slow down name resolution? We have hacked together a prototype multiPackage
implementation that we’ve monkeypatched in. This worked well, although we abandoned it in favor of working with you upstream.
@JimLarson thoughts? The proposed feature would augment the proto namespace resolution logic used in CEL today.
Container
option. Container
overrides more than one namespace during Environment extension.In the analysis, the Pros definitely outweigh the Cons with respect to the ease of use improvement, and I vastly prefer this to the package aliasing approach used in certain products. I think we'd just need to get serious about fixing the namespace resolution behavior in C++ before we consider enabling it for go. Maybe add a 'CompatMode' that checks for things that are not cross-language compatible if we choose to do it in Go first?
Curious to hear your thoughts.
@sfllaw I just discussed this with the CEL language council and we're generally in favor, but want to take some time to figure out a plan of attack as this could affect other evaluator implementations and we want to make sure we've covered all our bases. I should have an answer for you by May 1.
Thanks for the FR and for your patience. :)
Let's call the feature "container paths", analogous to the shell's search path for binaries, compiler library paths, etc.
We already have container paths, in a limited way, in that specifying container "a.b.c" gives us a search path of "a.b.c", "a.b", and "a". If we allow the container path of "a.b.c:x.y.z" (to borrow the shell notation), then we need to decide on the relative order of the expansions. The simplest would be a.b.c, a.b, a, x.y.z, x.y, x. However in the case where we specify "a.b.c:a.d.e", and we have messages "a.Foo" and "a.d.e.Foo", it might be surprising that "a.Foo" gets chosen.
Perhaps with container paths available we wouldn't need the implicit paths of shortened prefixes - you just list them explicitly as you desire them to avoid surprises.
I'm not too concerned about the added parse-time overhead - those who are sensitive to the performance delta should be using the checker anyhow.
Package aliasing vs search path: I agree that aliasing has some benefits for ease of implementation. But I can see cases where I'd really want a "using" directive so that I can write "Foo" instead of "a.Foo".
Question: do containers and container paths affect qualified function name resolution? Should they? Since it's a different namespace (in the Lisp-1 vs Lisp-2 sense), it's not clear that it would be used the same way. Of course, you can say the same about variable names vs proto message names.
We already have container paths, in a limited way, in that specifying container "a.b.c" gives us a search path of "a.b.c", "a.b", and "a". If we allow the container path of "a.b.c:x.y.z" (to borrow the shell notation), then we need to decide on the relative order of the expansions. The simplest would be a.b.c, a.b, a, x.y.z, x.y, x. However in the case where we specify "a.b.c:a.d.e", and we have messages "a.Foo" and "a.d.e.Foo", it might be surprising that "a.Foo" gets chosen.
The two alternatives are:
The depth-first search that I proposed is probably easiest to explain. The breadth-first search is probably more intuitive. But this seems like it would be a matter of perspective: some people will get confused either way.
Perhaps with container paths available we wouldn't need the implicit paths of shortened prefixes - you just list them explicitly as you desire them to avoid surprises.
This suggestion is an interesting one. It would mean that cel.Container("a.b.c")
would actually return cel.ContainerPath("a.b.c", "a.b", "a")
to preserve backwards compatibility.
Question: do containers and container paths affect qualified function name resolution? Should they? Since it's a different namespace (in the Lisp-1 vs Lisp-2 sense), it's not clear that it would be used the same way. Of course, you can say the same about variable names vs proto message names.
It appears that containers already affect function resolution already. If we implement this using the packages.Packager
interface, the existing behavior will be preserved and the container path behavior will be an obvious extension: https://github.com/google/cel-go/blob/c4c3df541e541a8cc2fa0b5760fc88cf29f5e58d/checker/env.go#L129-L138
@JimLarson I'm not really sure how I feel about ContainerPath
since it could be used to subvert the typical understanding of a namespace and make programs harder to reason about.
Granted, if we were to try to remain compatible with C++ namespace resolution, we would permit a using
directive:
env, err := cel.NewEnv(
cel.Container('co.acme.apps.hr',
// Everything after the first container is a 'using' directive.
// Alternative, make a cel.Using EnvOption.
'co.example.acquisition.salaries',
'org.math.stats.avg'),
),
// Compiles to:
// org.math.stats.avg(
// co.acme.apps.hr.readAll(
// co.example.acquisition.salaries.Query{title: "manager"}))
env.Compile(`avg(readAll(salaries.Query{title:"manager"}))`)
In the above example, Container
resolution still behaves as before, but avg
and salaries
have now been added to the co.acme.apps
namespace. This mirrors how C++ namespacing works, but without all of the complex rules around using
directive scoping. Following in the foosteps of C++ would add more complexity in that the 'using' style declarations would also need to support resolution of the following qualified names to their original location:
// Must compile to the same result as the previous example.
co.acme.apps.hr.avg(readAll(co.acme.apps.hr.salaries.Query{title: "manager"}))
Pro:
Con:
My suggestion is to extend cel.Container so that it also has a well-defined order. Given cel.Container("a.b.c", "d"), a call to ResolveCandidateNames("Direction.NORTH") should result in: a.b.c.Direction.NORTH, a.b.Direction.NORTH, a.Direction.NORTH, d.Direction.NORTH, Direction.NORTH
With C++ semantics, the original search space would be preserved and the symbol Direction
would not be found unless specifically added as a using
directive: cel.Container('a.b.c', 'd.Direction')
Thoughts?
In the above example,
Container
resolution still behaves as before, butavg
andsalaries
have now been added to theco.acme.apps
namespace. This mirrors how C++ namespacing works, but without all of the complex rules aroundusing
directive scoping. Following in the foosteps of C++ would add more complexity in that the 'using' style declarations would also need to support resolution of the following qualified names to their original location:// Must compile to the same result as the previous example. co.acme.apps.hr.avg(readAll(co.acme.apps.hr.salaries.Query{title: "manager"}))
Injecting new symbols into an existing namespace seems like it would be prone to error. It adds org.math.stats.avg
as an alias to co.acme.apps.hr.avg
, which is not how I would expect this to work. Also, I cannot think of a reasonable way to deal with a namespace conflict if co.acme.apps.hr
actually added an avg
implementation.
@sfllaw my framing here is somewhat as a devil's advocate. In practice, we shouldn't be innovating where there are existing standards, especially since CEL's already tried to align with C++ here so I've tried to reason in the manner most consistent with the prior art CEL followed.
Namespaces are tricky. I don't really want to say "CEL containers are like C++ namespaces, except when they're not". Because all identifiers for variables, types, and functions are declared explicitly, it should be trivial to detect and raise collisions. @JimLarson @sfllaw thoughts?
In practice, we shouldn't be innovating where there are existing standards, especially since CEL's already tried to align with C++ here so I've tried to reason in the manner most consistent with the prior art CEL followed.
I agree with this in principle, but in this instance, I don’t think that containers look like C++ namespaces with using
because the following CEL:
// cel.NewEnv(
// cel.Container("co.acme.apps.hr",
// "co.example.acquisition.salaries",
// "org.math.stats.avg"))
hr.avg(hr.readAll(hr.salaries.Query{title: "manager"}))
does not translate to:
using co::acme::apps::hr {
using co::example::acquisition::salaries;
using org::math::stats::avg;
hr::salaries::Query query;
query.set_title("manager");
hr::avg(hr::readAll(query));
}
since C++ won’t walk up the parent namespaces, while CEL does. I think it’s more important that CEL is consistent with itself, over being consistent with C++. It would be weird if the container did name resolution but a cel.Using
directive did not.
Namespaces are tricky. I don't really want to say "CEL containers are like C++ namespaces, except when they're not".
I suspect that “CEL containers are like C++ namespaces” is inaccurate, even with the current spec? @TristonianJones, are you arguing that CEL changes so that it this statement is true? In which case, this might be the backwards-compatible way to do this, although the name resolution behavior will still be weird.
func Container(pkg string) EnvOption {
return func(e *Env) (*Env, error) {
e.pkg = packages.NewPackage(pkg)
fields := strings.Split(pkg, ".")
for i := len(fields) - 2; i != 0; i-- {
Using(strings.Join(fields[:i], "."), nil)
return nil
}
}
func Using(name, alias string) EnvOption {
return func(e *Env) (*Env, error) {
e.pkg.Using(name, alias)
return nil
}
}
func (p defaultPackage) Using(name, alias string) {
if alias == nil {
alias = name[strings.LastIndex(name, "."):]
}
p.using[alias] = name
}
Hi @sfllaw, the cel.Container
is specified to match and implemented in terms of C++ namespace resolution rules. I was suggesting that as an alternative to having multiple containers, we could consider introducing a using
directive as it would solve similar usability concerns and has shared design ancestry with CEL containers.
My expectation is that the following example ...
// cel.NewEnv(
// cel.Container("co.acme.hr",
// // using directives
// "co.example.acquisition.salaries",
// "org.math.stats.avg"))
hr.avg(hr.readAll(hr.salaries.Query{title: "manager"}))
... Maps to C++ that looks like this:
namespace co {
namespace acme {
namespace hr {
using co::example::acquisition::salaries;
using org::math::stats::avg;
salaries::Query query;
query.set_title("manager");
avg(readAll(query));
} // namepace hr
} // namespace acme
} // namespace co
In C++, the variable resolution would resolve the following paths for avg
:
co::acme::hr::avg
co::acme::avg
co::avg
::avg
Since co::acme::hr::avg
is an alias to org::math::stats::avg
, the type-checker should emit the non-aliased name. The fun part is when functions have different overloads, say org.math.stats.avg(list(int))
is defined and co.acme.avg(vector(double))
is defined, it's totally possible to refer to each overload with the using
style directive, so we'd have to consider how this impacts overloading ... same is true for multiple namespaces.
As a counterexample, it is possible in CEL to make a container that resolves names deeper than a C++ namespace? For example, given the following protobuf definition:
package d;
message Compass {
Direction direction = 1;
enum Direction {
UNKNOWN = 0;
NORTH = 1;
EAST = 2;
SOUTH = 3;
WEST = 4;
}
}
we can ask the container to resolve names within the enum:
// cel.NewEnv(
// cel.Container("d.Compass.Direction"),
// cel.Types(&pb.Compass{}),
// )
Compass{direction: NORTH}
You might suggest that this is a bad way to use a container, but this is the current behavior. Given this is the case, does it really make sense to inject using
directives into the enum’s “namespace”? Do enums even have a namespace? :thinking:
The mental model that I’ve been using for cel.Container
is that it expands out into the following Python, if Python let you import classes and their members:
import d
import d.Compass
import d.Compass.Direction
@sfllaw Hopefully, you feel like you're getting a deep look at how decisions are debated for CEL and appreciate that you've raised an interesting idea that we really want to pursue. It's raised some interesting points about how containers are spec'd even now:
Does it make sense to use an enum
in a using
declaration. Sure! Especially if you're trying to provide an upgrade path to use a new enum since proto supports an open enum set. Does it make sense to permit enum names as containers? No, but it probably doesn't really hurt much either. The feature is intended to improve the ease of use for nested proto messages, since nested messages take on the outer proto message name as part of their namespace. e.g.
env, _ := cel.NewEnv(cel.Container(`google.rpc.context.AttributeContext`))
// Resource is a nested message in `google.rpc.context.AttributeContext`
env.Compile(`Resource{name: 'test'}`)
At any rate, container names are more limited since it's difficult to tell a priori whether the container refers to an enum. And, if it does, the worst case is that a single enum value is accessible by its simple name. I feel like this is a problem which developers will self-solve which is why we haven't added more restrictions on its use.
PR #341 ensures that evaluation agrees with the existing behavior of the type-checker: first namespaced function found wins. It's a necessary first step, but I don't think it's sufficient, since a proper fix would also update the spec to discuss function overloading and namespace resolution (either the spec is wrong, or the impl, or both).
What I like about using
directives is that it preserves a single hierarchy walk up the container when searching for identifiers. If we fix the function behavior to support namespaced overloads as part of a function definition, then it means the search cost for the overload set is limited to the number of components c
in a qualified container name as opposed to m
hierarchy walks for m
containers.
What I like about using directives is that it preserves a single hierarchy walk up the container when searching for identifiers. If we fix the function behavior to support namespaced overloads as part of a function definition, then it means the search cost for the overload set is limited to the number of components c in a qualified container name as opposed to m hierarchy walks for m containers.
The performance characteristic of your proposal is unquestionably good. I also see the advantages for the implementation because “worse is better”.
I think my fundamental objection is that the using
directive adds a new name to an existing namespace, which makes things less clear for users: if someone reads to hr.salaries.Query
or co.acme.hr.salaries.Query
in a rule program, they won’t find it in the protobuf definition when they look at package co.acme.hr
because that symbol has been injected into the container’s namespace. In addition, the Env
is locked into this namespace forever.
Correct me if I’m wrong, but currently there is nothing that injects symbols into a container: cel.Declarations
, cel.Macros
, and cel.Types
all add symbols into the global namespace. I think this property is good to preserve, because it avoids the problem above.
Perhaps we should circle back to the alias map idea? If first-class aliases were rooted in the global namespace, then resolving salaries.Query
wouldn’t short-circuit as quickly as your proposal, but it would be exactly as expensive as resolving co.example.acquisition.salaries.Query
:
env, err := cel.NewEnv(
cel.Types(&spb.Query),
cel.Declarations(
decls.NewFunction("org.math.stats.average", ...),
// Automatically derive the alias name:
decls.NewAlias("co.example.acquisition.salaries")
// Explicitly name the alias:
decls.NewAliasAs("avg", "org.math.stats.average"),
// Probably a misfeature: from org.math.stats import *
// decls.NewAliasAll("org.math.stats"),
),
cel.Container("co.acme.hr"),
)
// Compiles to:
// org.math.stats.average(
// co.acme.hr.readAll(
// co.example.acquisition.salaries.Query{title: "manager"}))
env.Compile(`avg(readAll(salaries.Query{title:"manager"}))`)
// Parse error: hr.avg and hr.salaries not found
env.Compile(`hr.avg(hr.readAll(hr.salaries.Query{title:"manager"}))`)
Under this proposal, aliases would be shadowed by cel.Globals
and cel.Program.Eval
, matching the current behaviour.
@sfllaw I'm still pondering the tradeoffs here. At this point I'll probably start creating a public Google Doc to capture the options.
I've had to learn more C++ than I've ever cared to in order to catch up on this conversation, but maybe that will pay off.
What I'm not hearing is a strong request for following two hierarchy paths at once: if "a.b.c" is the "primary" container, and we also want fast access to stuff under "x.y.z", we don't necessarily have to add "x.y" and "x" automatically. It's okay to have to add them explicitly when needed.
I like the idea of keeping the container an arbitrary string of dot-separated identifiers. CEL should only require declarations for what's being actually used. We should be able to specify a container "a.b.might.exist" and declare only the types in "a.b" if those are the only things we want to reference at the moment.
The C++ name resolution rules are quite complex, but it seems like certain using
statements might throw errors based on name collisions. Despite Tristan's hesitation at partial compliance with a standard, I'd much prefer it if we tolerated collisions and resolved them through shadowing, rather than producing errors, so that we can be tolerant to the evolution of proto files.
So if we have the current Container("a.b.c")
option and add a Using("x.y.z"...)
option, then the name resolution can work by the path:
Using
namespace, in order.This way, Container
and Using
can be used separately or together.
@JimLarson, I would personally avoid Using
unless CEL wants to draw an analogy to the C++ using
directive, with all of its implications. Personally, I find that C++ namespaces are subtle compared to other approaches.
Also, should Using
even be an Env
level construct? Since I couldn't distinguish them from variables or functions, I suggested that we make them a NewAlias
in Env.declarations
, which would work much like NewIdent
. I suppose the only wrinkle is that I don't understand how aliasing a type would work, which you might want to do if a type gets renamed and you need to provide backwards compatibility.
Aliases also imply cycle-checking, so this is a disadvantage to this approach.
@sfllaw @JimLarson I think I'd summarize the options under consideration as:
import
: external fully-qualified name is made available by its simple name in the container, conflicts with existing names in the container would be raised as an error.using
: external fully-qualified name is copied into the global namespace, possibly overwriting or augmenting an existing global name.I need to organize the threads here into a standalone doc, but for as varied as the options appears, I think we're pretty close to agreeing on the desired scope and application of what we choose to introduce. I will try to follow up later tonight with a document link.
@TristonianJones I would point out that the import
could also import the symbol into the global namespace, which would be accessible via .avg()
. This would avoid the need to error on conflict.
Hi @sfllaw,
After discussing the feature more with the rest of the language council, we're leaning toward aliasing as a post-parse step as a way of capturing the narrowest set of use cases supported by import
and using
without implying the broader functionalities of either. I've got an action item to revise a doc I wrote to this effect and I'll share the revised proposal with you later this week.
Thanks for your patience and for spurring such a wonderful conversation here.
Regards,
-Tristan
I've shared the discussion doc with the cel-go-discuss@googlegroups.com list. If anyone is interested in carrying on the conversation, all members of the group have comment privileges: https://docs.google.com/document/d/1F8BuG4SNFUXYPHi7nXaQmDxUsWCRDr1PfF789KkcxMY/edit#heading=h.z94vuodaszr
Feature request checklist
Change
Currently,
cel.Container
can be used to specify a namespace for symbol resolution. For example:In order to write programs that reference multiple protobuf packages, we are forced to use fully qualified names for types outside the container:
Example
We should provide the ability to resolve multiple namespaces. There are two options:
Container
could take multiple package names:Env
could be initialized with a custompackages.Packager
:Alternatives considered
It’s possible that both these methods would be useful. Here is a sample implementation: