Documentation for @activate-if

vpax commented 1 year ago

Per the Discussion in https://github.com/zeek/zeek/discussions/2966 (which unfortunately didn't garner much in the way of feedback, save a single thumbs-up), I've implemented a new conditional directive, @activate-if. This is the documentation for it, which is hopefully sufficiently self-contained that I don't need to add anything here. I'll be putting in an associated implementation PR shortly.

This also fixes a micro-buglet elsewhere in the documentation.

vpax commented 1 year ago

Ping again to @ckreibich @rsmmr .

rsmmr commented 1 year ago

I like this feature but I share the concern of users not really grasping the distinction of two different @if-style constructs, and/or the trouble of teaching them the right one to use. And none of the names so far seem quite intuitive to me.

Hard to come up with alternatives, but here's a crazy thought: Could we change the syntax to use the style of the normal if statement? Like:

if ( running_a_cluster() ) {
        event zeek_init()
            {
        do_my_cluster_setup_stuff();
            }
}

Now we'd have both if and @if, but the former mimics what people are used to from "normal" code, and the latter remains a preprocessing thing marked by the @. Our guideline would be to prefer if wherever possible. It's not perfect, and I don't know if the parser could support this easily, but I believe it would align with the already existing restrictions on where @activate-if can be used (like not inside a function).

vpax commented 1 year ago

I don't think we can do it with a vanilla if, as that already has (different) semantics when seen at the global level. In addition, the scanner needs to recognize it, follow its nesting, and the like.

vpax commented 1 year ago

A few more thoughts on this. Given there's no way we want to break @if in a direct way, the options appear to be:

Come up for a name with @activate-if that is both clear regarding what it means, and a "natural" for script writers to gravitate to. This is hard to do (per the above comments), especially given how @if lines up with C++ conditionals both name-wise and semantics-wise.
One thought would be @IF. Here the casing is all that separates it from @if. Pluses: more eye-catching than any alternatives we've discussed (and more suggestive that there are two ways of thinking about if-conditionals). Minuses: potentially confusing, given the slim lexical distinction. Clunky-looking, especially when paired with existing lowercase @else and @endif, though I supposed we could introduce uppercase versions of them that only pair with uppercase @IF.
Deprecate @if, so that script writers/maintainers have to choose whether to migrate existing @ifs to whatever its new name is, or to @activate-if (or whatever we call it). Pluses: helps to ultimately prioritize @activate-if and remove confusion between it and @if. Puts on the radar of maintainers that they should give thought about which form of conditional they actually need (though they can do this just by searching their code for @if). Minuses: deprecations of commonly used features are always a headache, and this one may strike some maintainers as fairly gratuitous.

Does that cover the design range, or are there other basic approaches?

awelzel commented 1 year ago

Minuses: potentially confusing, given the slim lexical distinction. Clunky-looking, especially when paired with existing lowercase @else

The current relevant upper-case directives (@DIR, @FILENAME) yield a value, too, so @IF would be a bit of an odd-one out here.

If we go for "slim lexical distinctions", there's also postfix special characters that leave the interpretation a bit open ("+" for "better", "next", "adding both branches").

@if+ ( Cluster::enabled() )

Maybe@if<<, @if# or @if*, too. There could be @if- as an alias for @if.

There are a few more characters, but many end up being related to bool logic and probably should stay clear of those.

Still somewhat clunky, but thought worth showing.

rsmmr commented 1 year ago

Maybe@if<<, @if# or @if*, too. There could be @if- as an alias for @if.

One more idea: how about leaving the @ prefix to the current meaning (i.e., pre-processor style without any interpretation) and introducing a different prefix for our new semantics, like .if.

vpax commented 1 year ago

Neat, I think .if could work, if we also add .else and .endif so it's clear it's a completely different construct.

rsmmr commented 1 year ago

Neat, I think .if could work, if we also add .else and .endif so it's clear it's a completely different construct.

Exactly, they'd all use that then.

timwoj commented 1 year ago

I like using a different prefix character vs a new keyword, but I worry that . is a bit hard to notice when reading through code. I guess I'd have to see it in practice though.

ckreibich commented 1 year ago

I agree with the others that this is really subtle stuff, for the reasons already given. I have a couple of questions about the basics and some suggestions:

Vern, over in the discussion post you said:

-O gen-C++ currently has to skip functions (or in some cases whole files) that are influenced by @if conditionals, because if it compiles them in one context and they execute in another (where the @if would resolve differently), the results would be incorrect.

Could it theoretically compile for the situation at hand, and fall back to interpretation for the others?
Regarding the constraints in @activate-if's use, you say:

To ensure consistency, an @activate-if must not appear inside a function body, and cannot redef a record or enum type.

Why? What's different, consistency-wise, from a redef happening in an @if (which always feels adventurous, but legit)? I'm trying to understand if that's a concern mainly for script analysis, for script compilation, or for other reasons I'm not getting.
In your branch there remain 29 @if's in the scripts tree vs 40 @activate-ifs. For the remaining @ifs, did you conclude that they have to remain @if, or are the existing conversions merely a first batch? (Also, I really like how you're getting rid of a bunch of @if altogether, replacing with straight if!)
It sounds like code can make the decision whether @activate-if is feasible (because if not, how is the user going to find out that they cannot use it in a given setting?). So, could the implementations of @if and @activate-if become one, always trying the latter first and falling back to the former automatically?
If we know that there's only a known set of must-use cases for @if remaining, I am wondering if these could be covered by other new constructs. Perhaps an @zeek-version-range(...) specifically to replace @if (Version::at_least("3.2.0")), etc.

So, in essence: I am trying hard to find ways that allow @activate-if to become @if, because I think having both is never going to be nice. That said, if it really can't be helped, I conceptually like the .-prefix idea for constructs that focus explicitly on script optimization / ZAM — particularly if you might well have other constructs relating to these.

ckreibich commented 1 year ago

Another suggestion — add a pragma that switches @if into the always-analyzed mode:

@pragma push analyze-if
@if ...
@endif
@pragma pop analyze-if

Might work well particularly if @if and @activate-if cannot nest anyway, and could potentially apply transparently around @load-ed code.

timwoj commented 1 year ago

Another suggestion — add a pragma that switches @if into the always-analyzed mode:

Ooo. I think of all of the suggestions, I like this one the most.

vpax commented 1 year ago

Could it theoretically compile for the situation at hand, and fall back to interpretation for the others?

I don't see any way to do this. @ifs can alter the semantics of compiled code, with the simplest example being when the @if occurs inside the code itself (but there are other examples too, see below).

Regarding the restrictions on where the yet-to-be-named-@if-variant can appear: if it's inside a function, then either it should instead be a regular if or, if it can't, it's because taken/not-taken branches are semantically incompatible. If it redef's a record or enum, then the compiler will miscompute field offsets/enum values for these for the not-taken-branch code.

For the remaining @ifs, did you conclude that they have to remain @if

Yes. For example, what do you do about:

@if ( Cluster::is_enabled() )
@load ./cluster
@else
@load ./non-cluster
@endif

(By the way, this form accounts for the majority of the @ifs I didn't change.)

It sounds like code can make the decision whether @activate-if is feasible (because if not, how is the user going to find out that they cannot use it in a given setting?).

The code can't figure it out. The notion is that the user should consider "is this a bunch of definitions & event handlers that are stand-alone for either branch". Ideally, with time they instead are just using @new-type-of-if, get compile-time errors, and then think oh maybe this needs to be a scanner-level if.

If we know that there's only a known set of must-use cases for @if remaining, I am wondering if these could be covered by other new constructs

Yeah, and I'm game to go down that path, but it would be a breaking change that requires deprecation/inducing grumpiness. There are a couple of patterns used by the ones I didn't convert:

The @load-changes-based-on-cluster per my example above
Some need-a-fake-@if-to-force-call-to-BiF instances in places like init-bare.zeek that we could replace with something else (in fact, just initializing a global to the BiF value might do the trick).
Complex cluster logic inside a script that I found hard to confidently untangle. This is only in scripts/policy/frameworks/cluster/experimental.zeek.

Another suggestion — add a pragma that switches @if into the always-analyzed mode

Can't say I'm a fan of that. It changes the semantics of a language construct, which is not how I think about @pragma, and also adds quite a bit of verbiage to what we want to be the "usual" way of doing conditional code.

vpax commented 1 year ago

I think from the above there are two front-runner options:

Use alternative syntax like .if
Deprecate current @if, provide @load-if to take care of the main use case as mentioned. If we do this, we'll still need something like .if at least temporarily if we want the functionality available during the deprecation cycle.

I'd really like to try to resolve this ASAP so we have a shot of having this in 6.0 and thus starting the deprecation cycle, or so we can start migrating code to .if based on a stable release ...

rsmmr commented 1 year ago

The pragma approach is neat on the one hand, but on the other hand I'm also worried that it's too verbose/cumbersome for something that should arguably become the new default choice.

And the more I think about it, the less I'd want to change semantics of the existing @if at all; not even over time through a deprecation cycle. It's a mechanism for scripts to do "weird stuff" and changing how that works (incl. giving deprecation warnings) seems quite painful.

So my vote goes to .if at this point. That wouldn't preclude adding other new mechanisms as well to cover some current @if use cases better so that eventually, at least in the default scripts, we might end up not using that anymore.

vpax commented 1 year ago

One more thought on syntax. @timwoj flagged the concern that .if is not visually very distinct. This has me thinking about using *if or, perhaps (for reasons sketched below), if*. That stands out better. It has the slight problem that endif is not a reserved word, so *endif is ambiguous whether it's closing a *if or instead two separate tokens. Interpreting it as the former will catch any (presumably very rare) instances of the latter, and for them the fix of inserting a space after the * is an easy one.

Here's an approach that doesn't have that problem, and maybe is a bit cleaner too. We use

if* ( cond ) {
code-activated-if-true...
} else { 
code-activated-if-false...
}

where the else is optional. The advantage of this approach is that it's visually close enough to regular if that it's less surprising that the two branches must compile. The use of the * makes it distinct, as does (to a lesser degree) the use of {/} layout that differs from our usual. It has a minor disadvantage of making one-line conditionals a tad more verbose, but these aren't that common, at least in the base scripts.

vpax commented 1 year ago

To clarify the last suggestion above: the { and } delimiters would be required. Their layout, however, would be convention, not required.

ckreibich commented 1 year ago

Let me throw out one last suggestion: we could attribute the @if via something like:

@if (...) &analyze

or perhaps &compilable. I realize this goes against the idea of not changing @if semantics, which I sympathize with, though I'd argue the above makes that explicit — so I wanted to at least mention it.

What would we call the .if/if*/... constructs? They are not directives, assuming directives are anything starting with @. I'm getting the sense that we don't currently anticipate additional such constructs. If so, I think they are simply too odd to stand on their own. This has me going back to favoring @activate-if and the deprecation path. (Naming alternatives that come to mind for me are @checked-if, @parsed-if, @safe-if, none of which wow me.)

I think the key benefit of this isn't the parsing of code that won't run — no doubt handy, but not critical. It's script compilation. The crux with script compilation is that it currently has a user base of ~1 with (fine) documentation living in the Zeek sources, not the manual, yet we're changing the language to accommodate it. So I think the hesitation is warranted, and once this goes in, we should also commit to elevating script compilation from an experimental feature.

Don't fret about 6.0 Vern, we'll figure it out. The plan is to fork this upcoming Monday, but we can push back. There's other troubleshooting work underway still that we'd ideally resolve pre-fork too.

vpax commented 1 year ago

I could live with the somewhat clunky @analyze form you sketch (once we figure out the right name for the pseudo-attribute). The thought of just sprinkling an attribute on existing @ifs is fairly appealing. I also think that documentation-wise it'll be a bit easier to explain.

While I'm supportive of elevating script optimization (at least ZAM) beyond "experimental", that's not to my mind what motivates this change, in fact ZAM doesn't even benefit from this. Rather, it's enabling script code coverage to identify that there's conditional code that lacks test coverage. Getting bitten by that was what got me thinking down this path in the first place, and is the main important benefit. That -O gen-C++ potentially also benefits is nice too, but I'd want this even if it didn't.

ckreibich commented 1 year ago

it's enabling script code coverage

Ah right — you did point that out.

timwoj commented 1 year ago

I like the idea of the &analyze attribute. That way if we end up just fixing @if to do the right thing, we can remove support for it later without a lot of effort. That said, do we think this needs to land in 6.0? I don't mind holding the release for it, but I also think the deprecation cycle won't be too bad for it since the primary use is in the main code base.

vpax commented 1 year ago

Regarding landing in 6.0, it's not about deprecation (which isn't needed if we go with adding an attribute). It would be nice in that I think the test-coverage enhancement is quite valuable, and if it's not in 6.0 then users who only use LTS releases won't get the benefit until 7.0. That's why I'd like to get it in. But I can appreciate the counter-considerations.

rsmmr commented 1 year ago

Just for the record: I like the attribute approach as well, means we'll continue to have just one if-construct and it's definitely easier to explain.

timwoj commented 1 year ago

Closing this in favor of #190

zeek / zeek-docs

Documentation for @activate-if #181