zeek / package-manager

A package manager for Zeek
https://docs.zeek.org/projects/package-manager
Other
43 stars 27 forks source link

The scripts folder in plugin-packages is prone to causing trouble #94

Open ckreibich opened 3 years ago

ckreibich commented 3 years ago

The scripts folder in Zeek packages that have plugins has subtle properties that can trip up package developers in unintuitive ways. We've touched this in the past in various contexts but not really discussed solutions, hence this ticket. It could also live in https://github.com/zeek/zeek-aux/ given that's where init-plugin lives atm — I wasn't sure.

Context: Zeek packages with plugins include two sets of Zeek scripts:

If $PREFIX is the Zeek installation folder, the former set ends up in $PREFIX/lib/zeek/plugins/packages/<package>/scripts (which Zeek also adds to its script search path), and the latter in $PREFIX/share/zeek/site/packages/<package>.

In the package layout established via init-plugin, a nested scripts hierarchy addresses this, as follows: there's a nested folder structure scripts/<namespace>/<package>, where plugin-related scripts should go into scripts/, and the package-level ones into the package subdirectory. The whole tree gets installed into the plugin location, and since the template specifies script_dir = scripts/<namespace>/<package>, only the innermost folder gets installed as part of zkg's own script management.

The script_dir line is crucial. If you're not careful about specifying the right folder, two problems can arise:

Double-loading of Zeek scripts

If you say script_dir = scripts in zkg.meta, the set of scripts in that folder gets installed twice, once with the plugin, and once as managed by zkg. When you don't rely on zkg's package mangement, you don't notice this. In the following, I haven't touched local.zeek, so I'm not using zkg's packages.zeek. An example of a package that has this problem is amzn/zeek-plugin-bacnet:

$ zkg list
/home/christian/devel/zeek/packages/amzn/zeek-plugin-bacnet (installed: master) - Plugin that enables parsing of the BACnet standard building controls protocol
$ zeek -NN
...
Zeek::BACnet - BACnet protocol analyzer (dynamic, no version information)
    [Analyzer] BACnet (ANALYZER_BACNET, enabled)
    [Event] bacnet
$ zeek -r some.pcap
$

But when you now add zkg's package-loading, things break:

$ zeek -r some.pcap packages
warning in /home/christian/inst/opt/zeek/lib/zeek/plugins/packages/zeek-plugin-bacnet/scripts/./consts.zeek, line 8 and /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 9: redefinition requires "redef" (BACnet::bvlc_functions)
warning in /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 23: Duplicate identifier documentation: BACnet::bvlc_functions
warning in /home/christian/inst/opt/zeek/lib/zeek/plugins/packages/zeek-plugin-bacnet/scripts/./consts.zeek, line 26 and /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 27: redefinition requires "redef" (BACnet::ipv6_bvlc_functions)
warning in /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 41: Duplicate identifier documentation: BACnet::ipv6_bvlc_functions
warning in /home/christian/inst/opt/zeek/lib/zeek/plugins/packages/zeek-plugin-bacnet/scripts/./consts.zeek, line 44 and /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 45: redefinition requires "redef" (BACnet::apdu_types)
warning in /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 53: Duplicate identifier documentation: BACnet::apdu_types
warning in /home/christian/inst/opt/zeek/lib/zeek/plugins/packages/zeek-plugin-bacnet/scripts/./consts.zeek, line 56 and /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./consts.zeek, line 57: redefinition requires "redef" (BACnet::confirmed_services)
...
error in /home/christian/inst/opt/zeek/lib/zeek/plugins/packages/zeek-plugin-bacnet/scripts/./main.zeek, line 36 and /home/christian/inst/opt/zeek/share/zeek/site/packages/./zeek-plugin-bacnet/./main.zeek, line 37: already defined (BACnet::ports)
error in /home/christian/inst/opt/zeek/lib/zeek/plugins/packages/zeek-plugin-bacnet/scripts/./main.zeek, line 46: already defined (BACnet::bytes_to_count)
fatal error in error: Type::AsFuncType (error/func) (error)
Aborted (core dumped)

This happens because the files live in separate paths, and Zeek has no way of knowing that the loaded files are actually identical.

Missing logs

If you still use script_dir = scripts in zkg.meta despite the nested-folder structure mentioned above, then in many situations things seem fine, since the package-level scripts bootstrap via __load__.zeek, which for many plugins does nothing (it might not exist at all for the plugin). But odds are the plugin also defines additional functionality (often logging) via the __load__.zeek in the nested package directory, which now never gets loaded. mitrecnd/bro-http2 is an example that has this problem. Unless you explicitly load the nested folder, no http2.log appears.

Weird other problems

Since the entire script tree gets copied recursively into the plugin tree and Zeek adds that folder to the search path, you can end up accidentally sourcing scripts from the plugin space even though you're thinking of package-level scripts only. The init-plugin template has this concept of namespacing (you need to provide a namespace and a package name to instantiate), but at the script layer that namespacing doesn't really apply as you'd expect. zkg installs the scripts under the git-repo-derived name, not <namespace>/<package>. For example, with Community ID, the scripts ends up in zeek-community-id, not Corelight/CommunityID. But, confusingly, you can actually load them via @load Corelight/CommunityID, because that structure happens to exist in the plugin tree. All this can be hard to track down if you're not deeply familiar with this stuff.

I went through our standard package source and currently a bit under half of the Zeek packages with plugins in the standard package source have these problems. (For the Amazon ones I've just submitted PRs to get these fixed, but it's not a sustainable approach...) It affects our own too -- zeek/spicy-tftp has this problem, for example, so it's clearly something that's easily overlooked.

So the question is what we should do about this. Several approaches come to mind:

$ tree
.
├── plugin
│   ├── scripts
│   │   ├── __load__.zeek
│   │   ├── __preload__.zeek
│   │   └── types.zeek
│   └── src
│       ├── config.h.in
│       ├── foobar.bif
│       ├── Plugin.cc
│       └── Plugin.h
├── scripts
│   ├── __load__.zeek
│   └── main.zeek

Let me stop here. Do you guys have thoughts on how best to improve this? (Sorry for the length here, argh...)

jsiwek commented 3 years ago

I went through our standard package source and currently a bit under half of the Zeek packages with plugins in the standard package source have these problems. (For the Amazon ones I've just submitted PRs to get these fixed, but it's not a sustainable approach...) It affects our own too -- zeek/spicy-tftp has this problem, for example, so it's clearly something that's easily overlooked.

That's interesting to find that many have potential issues, thanks for looking. Any guesses/commonalities about how they came to be that way?

I recall the initial init-plugin for some time did not output any zkg.meta (bro-pkg.meta), that may have been around 2.6, and the first plugin-based packages I created even suffered this pitfall and I learned the hard way. An interesting thing to know could be how many of the existing packages with this issue were created before init-plugin offered a guiding zkg.meta ? If it's a high number, it at least helps reduce worry that it's a rampant mistake people make and more of a historical relic that might phase-out/minimize over time even if we did nothing further.

  • Alter the script layout in the package template so the collision is less likely. For example, you can move all of the plugin functionality into its own directory, so you end up with this kind of tree that came up over in #93:

This seems like a "simplest thing that works" for dealing with original confusion that likely led to picking wrong script_dir paths: just put the right scripts/ dir that people need to use at the top-level, making it the obvious choise. Maybe you'll still end up installing (and taring) things into a similar layout as before though? i.e. since people can use these plugins standalone and technically a load like @load Corelight/CommunityID works that way, then that should still work after any changes we do?

Also means there could still be the double-loading problem when installing via zkg and trying to use both @load Corelight/CommunityID and @load packages/@load zeek-community-id since there's two separate copies of those scripts. For that, I think zkg may be able to detect when script_dir is nested within plugin_dir and, if it is, then instead of creating a copy when installing script_dir, it makes a symlink that points into the location where plugin_dir gets installed.

Doing both those seems like relatively straightforward approach that should address the problem.

ckreibich commented 3 years ago

Any guesses/commonalities about how they came to be that way?

It's hard to generalize, but from digging through git commit histories it usually looks like the packages didn't start out from the init-plugin template. Some started out as "pure" plugins and had the zkg packaging added later.

This seems like a "simplest thing that works" for dealing with original confusion

Cool.

Maybe you'll still end up installing (and taring) things into a similar layout as before though?

I couldn't quite follow you here — do you mean that despite the adjusted directory structure, people might still (accidentally) put a tree in place that could cause loads like @load Corelight/CommunityID to work even though they shouldn't?

I think zkg may be able to detect when script_dir is nested [...]

That's nice. It'd restore Zeek's existing ability to detect that a script has already been sourced, which is great.

Two related thoughts here, while we're on it:

I do wonder whether the plugin-level script need to be as open-ended and flexible as we currently make them. For example, do there really need to be scripts in there that the plugin manager's entry points don't cover? Do any of those scripts/folders need to become part of the script search path? I've not seen examples yet of packages that actually require this. Not critical — I'm just wondering whether we can simplify this a bit.

Do you think the existing namespaced load-paths (like Corelight/CommunityID) are worth supporting? I couldn't tell from the code whether they were more or less a side-effect of the nested folder structure, or a design goal. Personally, I really like that people can load packages via their package name (e.g. @load zeek-community-id), i.e. via zkg's folder management as opposed to the plugin's.

jsiwek commented 3 years ago

Maybe you'll still end up installing (and taring) things into a similar layout as before though?

I couldn't quite follow you here — do you mean that despite the adjusted directory structure, people might still (accidentally) put a tree in place that could cause loads like @load Corelight/CommunityID to work even though they shouldn't?

It was mostly unfinished thinking about expectations around @load Corelight/CommunityID like you mentioned:

Do you think the existing namespaced load-paths (like Corelight/CommunityID) are worth supporting? I couldn't tell from the code whether they were more or less a side-effect of the nested folder structure, or a design goal. Personally, I really like that people can load packages via their package name (e.g. @load zeek-community-id), i.e. via zkg's folder management as opposed to the plugin's.

Here's some questions/reasoning I go through:

Don't think so since whatever change is made doesn't impact how existing plugins work and easy enough to explain how new plugins created from latest init-plugin start to work.

I don't remember anything specific, maybe Robin has something. That scheme could help reduce chance of naming conflict/collision, but ultimately seems arbitrary and not something that has to be standardized -- hypothetically, if a user had ability to better customize installation-path of the scripts, it's maybe more useful if they were free to choose for themselves how to refer to a plugin's scripts in @loads rather than restrict it.

I didn't understand this yet: the standalone plugin installation will have to install the pure-scripts somewhere and zkg will still rely on that for installation of plugin-specific parts, but also additionally still do its own separate install of the pure-script parts even though the standalone install logic does that, too?

So when using zkg, if there's two distinct copies of the pure-scripts that get installed, is the "double-loading" problem still a concern or what's done to minimize it?

I do wonder whether the plugin-level script need to be as open-ended and flexible as we currently make them. For example, do there really need to be scripts in there that the plugin manager's entry points don't cover? Do any of those scripts/folders need to become part of the script search path? I've not seen examples yet of packages that actually require this. Not critical — I'm just wondering whether we can simplify this a bit.

I'm not sure, but if changes there are not directly related/needed to address the particular problems in the original discussion, it may be easier to leave that for separate treatment.

rsmmr commented 3 years ago

No strong opinions here (other than that it'll be good to address this!), just a couple of notes:

ckreibich commented 3 years ago

Great feedback folks, thank you. Sorry for taking a bit to follow up here.

To where does a default, standalone plugin installation (without zkg) end up installing the scripts ?

Yeah, my thinking here was as you say Jon — nothing needs to change for pure plugins because they don't have a problem. I think this is purely about finding something acceptable that avoids the dual-loads for zkg packages. This then also pretty naturally leads to not messing with plugin semantics, since they don't need to change. Agreed.

So it sounds to me like the path forward can be as you proposed, @jsiwek: combine plugin-tree relocation in the package source with nesting detection to enable symlinks.

I think this leaves your point about installing multiple versions, @rsmmr. This gets a bit hairy for me. A few thoughts:

Looking at the above, I'm inclined to leave this as-is and require explicitly separate configurations, Python-style, for those who need it.