sandstorm-io / vagrant-spk

Packaging tool for Sandstorm, a self-hosting platform for web apps!
Apache License 2.0

Include list of installed Debian packages/versions in .sandstorm #249

Open ocdtrekkie opened 4 years ago

ocdtrekkie commented 4 years ago

22:31 Maybe we could add a feature that dumps a list of debian packages that are installed into the spk, so you can inspect it after the fact to figure out what changed.

For transparency and so that we can diagnose issues while more aggressively allowing updates inside vagrant-spk VMs, we should dump the list of packages and their versions out to a file in .sandstorm that ends up committed to GitHub.

An analogue to this is the "stack" file, which notes what stack was used to build a package. It is largely vestigial after the stack is set up, but useful for reference.

ocdtrekkie commented 4 years ago

We think there's probably a good command that will produce this output, but don't know off hand what it is.

paulproteus commented 4 years ago

You might find these useful:

dpkg -l -- lists all packages, including versions iirc

dpkg -S {filename} -- asks dpkg what package provided some file that is installed (dpkg -L {package} goes the other way and lists the files a package installed)

Happy to say more if needed. Cheers!

ocdtrekkie commented 4 years ago

Thanks Asheesh!

I will do some testing. Assuming the output is what we need, I'll need to decide when to generate this file. If I can generate it when vagrant-spk dev closes, so it updates at the same point sandstorm-files.list is updated, there should be a strong relationship between the list of files that are included and the list and versions of Debian packages that were installed when those files were selected.

zenhack commented 4 years ago

I think dumping it in .sandstorm is a good idea, but what I actually had in mind was putting it in the spk, so you can work out how a package was built from the package itself.

ocdtrekkie commented 4 years ago

@zenhack That is probably a trivial adjustment, though it would be a departure from the general behavior of only including what the SPK needs to work in the package itself.

I am also not positive about the best way to do this.

As of right now, all public Sandstorm packages are open source, and generally we do verify that the repository has been updated for the latest release during the app review process. While I imagine security folks may wish to unpack an SPK to analyze or study it, and verify that it matches what is in the source repo, I am unsure how likely people are to look at unpacking an SPK for the package info versus looking at the list in the source repo.

Perhaps I am curious how you'd see this file in the SPK being utilized. Are there programmatic uses for this file you'd imagine?

zenhack commented 4 years ago

Yeah, I'm thinking in terms of automatic scanning tools. We could do some of that against the repos too, to some extent. But this is somewhat still in the brainstorming phase. I think we should also put the list in the source repo, so maybe start with that and go from there.

ocdtrekkie commented 4 years ago

@zenhack Do we want to trust that the list is accurate for the purposes of security scanning? If, say, we updated it on closing out spk dev as described above, it could be user-modified prior to spk pack. So presumably the right way to do it for that purpose would be to generate it as part of spk pack instead. (Though I'd still want to do it on dev in case someone didn't commit to GitHub after running pack.)

Would it be possible to determine from the files themselves what version they are? The package list of what was in the VM might contain packages not actually included in the published SPK, leading to false positives. Presumably we just want to know if there are old/insecure binaries that actually make it into sandstorm-files.list (+ the alwaysInclude folders).

And yeah, adding it to .sandstorm so it ends up in GitHub is trivial and costs us nothing, and if we also determine we want it included in the SPK, presumably we are going to still be pulling it from/storing it in .sandstorm.

zenhack commented 4 years ago

We certainly can't trust it's accurate in the face of a developer actively messing with it. We could query the package manager to work out which package each file we actually include belongs to. But we don't necessarily want to exclude a package just because its files didn't land in the spk. That could happen because an executable is statically linked (not standard for distro packages, but for some languages, e.g. Go, the tooling kind of imposes it), or because the package otherwise had some influence on something that is included. Hopefully there won't be a huge number of things that get pulled in as a dependency but aren't actually relevant to the final package at all.
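As a rough sketch of the "query the package manager" idea, the Python below parses the output of `dpkg -S` into a mapping from file path to owning packages. The function names are hypothetical and not part of vagrant-spk. It assumes entries in sandstorm-files.list are paths relative to the filesystem root, and that the lookup runs inside the build VM where dpkg's database is available.

```python
import subprocess


def parse_dpkg_search(text):
    """Parse `dpkg -S` output into {path: [package, ...]}."""
    owners = {}
    for line in text.splitlines():
        # Diversion notices are not "package: path" rows, so skip them.
        if line.startswith("diversion by"):
            continue
        # Rows look like "pkg: /path" or "pkg1, pkg2: /path". An architecture
        # qualifier such as "libc6:amd64" has no space after its colon, so the
        # first ": " is the separator between package names and path.
        names, sep, path = line.partition(": ")
        if not sep:
            continue
        owners.setdefault(path, []).extend(names.split(", "))
    return owners


def owning_packages(relative_paths):
    """Hypothetical helper: ask dpkg which packages own the given files."""
    absolute = ["/" + p.lstrip("/") for p in relative_paths]
    # No check=True here: files that no package owns are reported on stderr
    # and can make dpkg exit non-zero, while stdout still lists the matches.
    result = subprocess.run(
        ["dpkg", "-S", *absolute], capture_output=True, text=True
    )
    return parse_dpkg_search(result.stdout)
```

Files with no owner simply do not appear in the result, which is the statically linked or locally built case described above. A long file list may need to be passed to dpkg in batches.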

ocdtrekkie commented 4 years ago

Perhaps the flow there would be to use this file as a quick filter for vulnerable packages, but then follow it up by directly evaluating what is in sandstorm-files.list on whether or not the vulnerable package is included.
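A minimal sketch of that two-step flow, in Python. Everything here is hypothetical: the `triage` name, the shape of the advisory data, and the exact-match version test are all assumptions for illustration. A real scanner would need Debian version ordering rather than string equality.

```python
def base_name(package):
    """Drop an architecture qualifier, e.g. "libc6:amd64" -> "libc6"."""
    return package.split(":", 1)[0]


def triage(pkglist, advisories, file_owners):
    """Flag vulnerable packages, then list which shipped files they own.

    pkglist:     {package name: installed version} from the package list file
    advisories:  {package name: set of vulnerable versions} (assumed format)
    file_owners: {path: [package names]} for files in sandstorm-files.list

    Returns {package: [paths]}. An empty list means the package was flagged
    by the quick filter but none of its files were found in the SPK.
    """
    # Step 1: quick filter against the package list.
    flagged = {
        base_name(name)
        for name, version in pkglist.items()
        if version in advisories.get(base_name(name), ())
    }
    # Step 2: check which flagged packages actually own included files.
    shipped = {name: [] for name in flagged}
    for path, owners in sorted(file_owners.items()):
        for owner in owners:
            if base_name(owner) in shipped:
                shipped[base_name(owner)].append(path)
    return shipped
```

Packages that come back with an empty list are the ones that need a human look, since they may still have influenced the build (static linking, for example) without contributing files.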

Right now, this solution is vagrant-spk specific, but we may want to brainstorm the matter of an SPK vulnerability scanner ideally working regardless of dev tool. For docker-spk, I imagine a similar solution could be implemented, but I would be extremely wary of including this list for an spk-built package, as we cannot assume the developer isn't using their personal machine for dev, in which case the package list might leak a lot of information about their machine.

ocdtrekkie commented 4 years ago

dpkg -l output on a nearly fresh PineBook Pro is 224 kB. And has descriptions for each and every package. So we should drop a few columns of information from it to try to get it down to size.

zenhack commented 4 years ago

I think it would make things more readable to strip it down to just package name and version, but wrt the size constraint: by comparison, the sandstorm-http-bridge executable is 51MiB, so that's basically rounding error.

-Ian

ocdtrekkie commented 4 years ago

The command I have now is dpkg -l | tail -n +6 | awk '{print $2, $3}' > pkglist and that gives me a nice clean output like:

apt 1.4.9
apt-transport-https 1.4.9
apt-utils 1.4.9
etc.
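Since vagrant-spk itself is written in Python, here is a hedged sketch of the same pipeline as Python functions. The names are hypothetical. Like `tail -n +6`, it assumes `dpkg -l` prints five header lines before the package rows, and like the awk step it keeps every row regardless of its status code.

```python
import subprocess

HEADER_LINES = 5  # assumed number of header lines printed by `dpkg -l`


def parse_dpkg_list(text):
    """Return (name, version) pairs from the text printed by `dpkg -l`."""
    pairs = []
    for line in text.splitlines()[HEADER_LINES:]:
        fields = line.split()
        if len(fields) >= 3:
            # fields[0] is the status code (e.g. "ii"); name and version follow.
            pairs.append((fields[1], fields[2]))
    return pairs


def format_pkglist(pairs):
    """Render the pairs one per line, as "name version"."""
    return "".join(f"{name} {version}\n" for name, version in pairs)


def installed_packages():
    """Run `dpkg -l` on the current machine and parse its output."""
    result = subprocess.run(
        ["dpkg", "-l"], check=True, capture_output=True, text=True
    )
    return parse_dpkg_list(result.stdout)
```

`dpkg-query -W -f='${Package} ${Version}\n'` prints name and version directly without headers, which would avoid the line-skipping assumption altogether.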

I still haven't tested it inside the Vagrant box yet. (Such are the downsides of playing with an ARM laptop.) But my 224 kB file went down to 43 kB solely by omitting information I don't want. :)

ocdtrekkie commented 12 months ago

I feel like since this doesn't have a lot of impact on the package itself we're producing, and it just needs to write the command's output to a file in the right folder, this is probably a relatively simple project for someone familiar with Python who tests out the vagrant-spk packaging flow.
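For anyone picking this up, the file-writing half might look roughly like the sketch below. The `write_pkglist` function and the `pkglist` file name are assumptions for illustration, not existing vagrant-spk code, and where in the dev or pack flow it gets called is still the open question from earlier in this thread.

```python
from pathlib import Path


def write_pkglist(project_dir, pairs, filename="pkglist"):
    """Write "name version" lines to <project_dir>/.sandstorm/<filename>.

    pairs is an iterable of (package name, version) tuples. Sorting keeps
    the file stable between runs so that diffs in the repo stay readable.
    """
    target = Path(project_dir) / ".sandstorm" / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{name} {version}" for name, version in sorted(pairs)]
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```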