Open ckreibich opened 3 years ago
I went through our standard package source and currently a bit under half of the Zeek packages with plugins in the standard package source have these problems. (For the Amazon ones I've just submitted PRs to get these fixed, but it's not a sustainable approach...) It affects our own too --
zeek/spicy-tftp
has this problem, for example, so it's clearly something that's easily overlooked.
That's interesting to find that many have potential issues, thanks for looking. Any guesses/commonalities about how they came to be that way?
I recall the initial init-plugin
for some time did not output any zkg.meta
(bro-pkg.meta
), that may have been around 2.6, and the first plugin-based packages I created even suffered this pitfall and I learned the hard way. An interesting thing to know could be how many of the existing packages with this issue were created before init-plugin
offered a guiding zkg.meta
? If it's a high number, it at least helps reduce worry that it's a rampant mistake people make and more of a historical relic that might phase-out/minimize over time even if we did nothing further.
- Alter the script layout in the package template so the collision is less likely. For example, you can move all of the plugin functionality into its own directory, so you end up with this kind of tree that came up over in #93:
This seems like a "simplest thing that works" for dealing with original confusion that likely led to picking wrong script_dir
paths: just put the right scripts/
dir that people need to use at the top-level, making it the obvious choise. Maybe you'll still end up installing (and tar
ing) things into a similar layout as before though? i.e. since people can use these plugins standalone and technically a load like @load Corelight/CommunityID
works that way, then that should still work after any changes we do?
Also means there could still be the double-loading problem when installing via zkg
and trying to use both @load Corelight/CommunityID
and @load packages
/@load zeek-community-id
since there's two separate copies of those scripts. For that, I think zkg
may be able to detect when script_dir
is nested within plugin_dir
and, if it is, then instead of creating a copy when installing script_dir
, it makes a symlink that points into the location where plugin_dir
gets installed.
Doing both those seems like relatively straightforward approach that should address the problem.
Any guesses/commonalities about how they came to be that way?
It's hard to generalize, but from digging through git commit histories it usually looks like the packages didn't start out from the init-plugin
template. Some started out as "pure" plugins and had the zkg packaging added later.
This seems like a "simplest thing that works" for dealing with original confusion
Cool.
Maybe you'll still end up installing (and taring) things into a similar layout as before though?
I couldn't quite follow you here — do you mean that despite the adjusted directory structure, people might still (accidentally) put a tree in place that could cause loads like @load Corelight/CommunityID
to work even though they shouldn't?
I think zkg may be able to detect when script_dir is nested [...]
That's nice. It'd restore Zeek's existing ability to detect that a script has already been sourced, which is great.
Two related thoughts here, while we're on it:
I do wonder whether the plugin-level script need to be as open-ended and flexible as we currently make them. For example, do there really need to be scripts in there that the plugin manager's entry points don't cover? Do any of those scripts/folders need to become part of the script search path? I've not seen examples yet of packages that actually require this. Not critical — I'm just wondering whether we can simplify this a bit.
Do you think the existing namespaced load-paths (like Corelight/CommunityID
) are worth supporting? I couldn't tell from the code whether they were more or less a side-effect of the nested folder structure, or a design goal. Personally, I really like that people can load packages via their package name (e.g. @load zeek-community-id
), i.e. via zkg
's folder management as opposed to the plugin's.
Maybe you'll still end up installing (and taring) things into a similar layout as before though?
I couldn't quite follow you here — do you mean that despite the adjusted directory structure, people might still (accidentally) put a tree in place that could cause loads like
@load Corelight/CommunityID
to work even though they shouldn't?
It was mostly unfinished thinking about expectations around @load Corelight/CommunityID
like you mentioned:
Do you think the existing namespaced load-paths (like
Corelight/CommunityID
) are worth supporting? I couldn't tell from the code whether they were more or less a side-effect of the nested folder structure, or a design goal. Personally, I really like that people can load packages via their package name (e.g.@load zeek-community-id
), i.e. viazkg
's folder management as opposed to the plugin's.
Here's some questions/reasoning I go through:
Don't think so since whatever change is made doesn't impact how existing plugins work and easy enough to explain how new plugins created from latest init-plugin
start to work.
@load Corelight/CommunityID
an explicit requirement/design ?I don't remember anything specific, maybe Robin has something. That scheme could help reduce chance of naming conflict/collision, but ultimately seems arbitrary and not something that has to be standardized -- hypothetically, if a user had ability to better customize installation-path of the scripts, it's maybe more useful if they were free to choose for themselves how to refer to a plugin's scripts in @load
s rather than restrict it.
zkg
) end up installing the scripts ?I didn't understand this yet: the standalone plugin installation will have to install the pure-scripts somewhere and zkg
will still rely on that for installation of plugin-specific parts, but also additionally still do its own separate install of the pure-script parts even though the standalone install logic does that, too?
So when using zkg
, if there's two distinct copies of the pure-scripts that get installed, is the "double-loading" problem still a concern or what's done to minimize it?
I do wonder whether the plugin-level script need to be as open-ended and flexible as we currently make them. For example, do there really need to be scripts in there that the plugin manager's entry points don't cover? Do any of those scripts/folders need to become part of the script search path? I've not seen examples yet of packages that actually require this. Not critical — I'm just wondering whether we can simplify this a bit.
I'm not sure, but if changes there are not directly related/needed to address the particular problems in the original discussion, it may be easier to leave that for separate treatment.
No strong opinions here (other than that it'll be good to address this!), just a couple of notes:
I'd much prefer a solution that doesn't change Zeek's semantics of plugin loading; not because the current scheme is the best way to do it (probably not), but because I'd like to avoid ending up with different semantics for different Zeek versions. That line of thinking also suggests something similar for zkg: if we can address thist hrough better templating/guidance/warnings, rather than changing how zkg operates, that would be preferable.
Re/ the namespaced loading (@load Corelight/CommunityID
) I believe the original motivation was indeed to avoid conflicts. I'd say retaining some kind of name spacing is valuable here. For example, what if I wanted to fork the community ID package? If I put that into rsmmr/zeek-community-id
, but keep loading it through zeek-community-id
, that would be confusing. Maybe a way in between would be to keep the namespace but combine it with the local part of the package name: @load corelight/zeek-community-id
. Could also be something to discuss in the broader context of overhauling Zeek 'sscript/package namespacing.
Great feedback folks, thank you. Sorry for taking a bit to follow up here.
To where does a default, standalone plugin installation (without zkg) end up installing the scripts ?
Yeah, my thinking here was as you say Jon — nothing needs to change for pure plugins because they don't have a problem. I think this is purely about finding something acceptable that avoids the dual-loads for zkg
packages. This then also pretty naturally leads to not messing with plugin semantics, since they don't need to change. Agreed.
So it sounds to me like the path forward can be as you proposed, @jsiwek: combine plugin-tree relocation in the package source with nesting detection to enable symlinks.
I think this leaves your point about installing multiple versions, @rsmmr. This gets a bit hairy for me. A few thoughts:
zkg
) — you need to figure things out yourself if you want multiple installations (via PYTHONPATH, virtualenvs, etc)zkg
currently has a notion of namespacing via Package.name_with_source_directory()
, which gives you e.g. corelight/zeek-community-id
when dealing with a package source, because the sources impose that structure. This is nice because it's automatic — you don't have to configure it in zkg.meta or some such. But there's no immediate equivalent when installing from local folders or an explicit git URL. For the latter relying on the directory name would normally work (that'd lead to rsmmr/zeek-community-id
for you), and one could use something like local/zeek-community-id
for the former. Maybe... not sure about this. zkg
complains about the package name already existing. (I'm not saying that's a bug, I think it's just not something supported atm.)Looking at the above, I'm inclined to leave this as-is and require explicitly separate configurations, Python-style, for those who need it.
The
scripts
folder in Zeek packages that have plugins has subtle properties that can trip up package developers in unintuitive ways. We've touched this in the past in various contexts but not really discussed solutions, hence this ticket. It could also live in https://github.com/zeek/zeek-aux/ given that's whereinit-plugin
lives atm — I wasn't sure.Context: Zeek packages with plugins include two sets of Zeek scripts:
one set that resides with the plugin, to provide initialization that the plugin code requires (e.g. a script-level type that the native plugin code requires). Zeek's plugin manager explicitly loads these when hooking up a plugin. Zeek's cmake code for managing plugins assumes such scripts reside in the
scripts
folder.The "usual" package-level scripts that any Zeek package provides.
zkg
installs these via its package management, and loads them via thepackages.zeek
file it maintains for capturing package load state. By convention, many packages also usescripts
as the location for such scripts, viascript_dir = scripts
inzkg.meta
.If
$PREFIX
is the Zeek installation folder, the former set ends up in$PREFIX/lib/zeek/plugins/packages/<package>/scripts
(which Zeek also adds to its script search path), and the latter in$PREFIX/share/zeek/site/packages/<package>
.In the package layout established via
init-plugin
, a nested scripts hierarchy addresses this, as follows: there's a nested folder structurescripts/<namespace>/<package>
, where plugin-related scripts should go intoscripts/
, and the package-level ones into the package subdirectory. The whole tree gets installed into the plugin location, and since the template specifiesscript_dir = scripts/<namespace>/<package>
, only the innermost folder gets installed as part ofzkg
's own script management.The
script_dir
line is crucial. If you're not careful about specifying the right folder, two problems can arise:Double-loading of Zeek scripts
If you say
script_dir = scripts
inzkg.meta
, the set of scripts in that folder gets installed twice, once with the plugin, and once as managed byzkg
. When you don't rely onzkg
's package mangement, you don't notice this. In the following, I haven't touched local.zeek, so I'm not usingzkg
'spackages.zeek
. An example of a package that has this problem isamzn/zeek-plugin-bacnet
:But when you now add
zkg
's package-loading, things break:This happens because the files live in separate paths, and Zeek has no way of knowing that the loaded files are actually identical.
Missing logs
If you still use
script_dir = scripts
in zkg.meta despite the nested-folder structure mentioned above, then in many situations things seem fine, since the package-level scripts bootstrap via__load__.zeek
, which for many plugins does nothing (it might not exist at all for the plugin). But odds are the plugin also defines additional functionality (often logging) via the__load__.zeek
in the nested package directory, which now never gets loaded.mitrecnd/bro-http2
is an example that has this problem. Unless you explicitly load the nested folder, no http2.log appears.Weird other problems
Since the entire script tree gets copied recursively into the plugin tree and Zeek adds that folder to the search path, you can end up accidentally sourcing scripts from the plugin space even though you're thinking of package-level scripts only. The
init-plugin
template has this concept of namespacing (you need to provide a namespace and a package name to instantiate), but at the script layer that namespacing doesn't really apply as you'd expect.zkg
installs the scripts under the git-repo-derived name, not<namespace>/<package>
. For example, with Community ID, the scripts ends up inzeek-community-id
, notCorelight/CommunityID
. But, confusingly, you can actually load them via@load Corelight/CommunityID
, because that structure happens to exist in the plugin tree. All this can be hard to track down if you're not deeply familiar with this stuff.I went through our standard package source and currently a bit under half of the Zeek packages with plugins in the standard package source have these problems. (For the Amazon ones I've just submitted PRs to get these fixed, but it's not a sustainable approach...) It affects our own too --
zeek/spicy-tftp
has this problem, for example, so it's clearly something that's easily overlooked.So the question is what we should do about this. Several approaches come to mind:
Adapt the cmake logic that manages plugin content, so it no longer automatically uses
scripts
, or perhaps warns if the scripts folders seem to overlap/collide.Change plugin-level script loading more broadly. I'm not sure plugins really need to source
__load__.zeek
or add the folder to the generic search path? The plugin manager simply first loadsscripts/__preload__.zeek
, then a bif subdirectory, thenscripts/__load__.zeek
. Perhaps we could simplify?Let me stop here. Do you guys have thoughts on how best to improve this? (Sorry for the length here, argh...)