Add dependency hashes to Gemfile.lock

haydentherapper commented 2 years ago

Introduction

Lockfiles provide a snapshot of the exact versions of dependencies used to build a gem. Lockfiles allow developers to build a gem using the same dependencies as the maintainer used.

Some package managers use the lockfile to provide security assurances that the dependencies have not been tampered. For example, in addition to Python's Pipfile to specify dependencies, pipenv will generate a Pipfile.lock which includes SHA256 hashes of all dependencies. NPM supports dependency integrity using package-lock.json. Comparing dependency hashes on installation securely guarantees that a package will be built using the exact same dependencies as the maintainer specified.

Lockfiles with hashes mitigate various attacks:

An attacker in a compromised network delivers malicious dependencies
A repository of packages is compromised and packages are replaced with malicious instances
Local packages on a machine are replaced with malicious instances

With hashes in lockfiles, when bundle install tries to install a dependency for a gem, a malicious dependency will not match the hash of the dependency in the lockfile, and the install will not succeed.

Proposal

I propose that Gemfile.lock include the SHA256 hash of each dependency. SHA256 hashes will be included after each versioned dependency. An example Gemfile and Gemfile.lock (Note the example hashes are random):"

Gemfile:

# frozen_string_literal: true
source "https://rubygems.org"

gem "nokogiri", ">= 1.4.2"

Gemfile.lock:

GEM
  remote: https://rubygems.org/
  specs:
    mini_portile2 (2.1.0) (d1bc8d3ba4afc7e109612cb73acbdddac052c93025aa1f82942edabb7deb82a1)
    nokogiri (1.6.8) (dc460da4ad72c482231e28e688e01f2778a88ce31a08826899d54ef7183998b5)
      mini_portile2 (~> 2.1.0)
      pkg-config (~> 1.1.7)
    pkg-config (1.1.7) (8b97bca79750847558d488e2ea4de79903c9c71c9af27ecf1b3dff5dba2abdd9)

PLATFORMS
  ruby

DEPENDENCIES
  nokogiri (>= 1.4.2)

BUNDLED WITH
   1.12.5

Implementation details

Hashes will be parsed in the lockfile parser. Hashes should be compared to downloaded dependencies before installation, such as when checking specs compatibility. I'm not familiar with the codebase, so please let me know if there's a simpler way to verify hashes.

For backwards compatibility, hashes will need to be optional initially, as previously generated lockfiles will not contain them. When hashes are present, they should be enforced. Hash enforcement should be all-or-nothing - All dependencies must include hashes, since any dependency missing a hash becomes a target for compromise. In addition, a flag can be added to bundle, such as --hashes, to require that hashes are present in the lockfile. In a future version, Bundler could mandate that hashes are present in the lockfile, but this would require maintainers to release gems with updated lockfiles or users would need to use an older version of Bundler.

Next steps

After a discussion on this issue, I'd like to propose this as an RFC.

indirect commented 2 years ago

I believe you would need to put the hashes in a new section of the lockfile (perhaps HASHES) to maintain backwards compatibility. Or is the plan for hashes to only be present in a new version of Bundler with breaking changes?

hsbt commented 2 years ago

How about the slow down installation time like simple Rails application?

simi commented 2 years ago

:thinking: my initial notes

Bundler could mandate that hashes are present in the lockfile, but this would require maintainers to release gems with updated lockfiles or users would need to use an older version of Bundler.

Gems doesn't benefit from lockfile during installation. Only gemspec info is used. Gemspec can declare open version constraint for dependency and that way it is not locked to one individual version of dependency. This whole feature is impossible to be build on gem level.

But this could be interesting for bundler users maintaining own lockfiles in custom apps. Then they can ensure based on those hashes exactly same versions from same sources were installed (once this is implemented). I'm not 100% sure now this would be possible with platform gems, but using latest lockfile format I think all gems for all platforms are being locked in there. That way all deps are in there locked to particular version (and platform) and hashes could be included.

Other question is when to check this integrity. On each bundle install? Only on initial gem download? Do we trust already downloaded gems in cache and also installed (extracted) ones?

We can start with opt-in phase when you need to do something like bundle check --integrity to do this step manually. Later on we can switch the flip.

haydentherapper commented 2 years ago

I believe you would need to put the hashes in a new section of the lockfile (perhaps HASHES) to maintain backwards compatibility. Or is the plan for hashes to only be present in a new version of Bundler with breaking changes?

I'm happy to put the hashes in a new section in the lockfile if this is preferred to a breaking change. I wasn't considering older versions of Bundler being able to successfully validate new lockfiles. Would this type of breaking change be acceptable? I can dig more into the code, but do you know if the lockfile parser currently ignores unknown sections?

How about the slow down installation time like simple Rails application?

It's tricky to estimate the impact, since hash calculation time will be dependent on the processor. For example, on a laptop with a quad-core 2.3 GHz processor, taking the SHA256 hash of the bundler gem (time shasum -a 256 bundler-2.3.8.gem takes 0.04 seconds. I don't expect a significant increase.

Gems doesn't benefit from lockfile during installation. Only gemspec info is used. Gemspec can declare open version constraint for dependency and that way it is not locked to one individual version of dependency. This whole feature is impossible to be build on gem level.

Could you clarify this @simi? I thought that if a lockfile is present, bundle install will use the exact versions specified in the lockfile. If so, when bundle inspects each spec version in the lockfile, it can also compare the hash of the cached or downloaded gem.

Other question is when to check this integrity. On each bundle install? Only on initial gem download? Do we trust already downloaded gems in cache and also installed (extracted) ones?

On each bundle install. We shouldn't assume cached gems haven't been modified. Checking on each bundle install mitigates the risk where an attacker that has access to a machine replaces cached gems with malicious instances.

deivid-rodriguez commented 2 years ago

Linking to the upstream issue and discussion: https://github.com/rubygems/rubygems/issues/3379.

haydentherapper commented 2 years ago

Thanks for the link @deivid-rodriguez. Seems like the consensus is we want this feature to be added.

I see some discussion around when to calculate hashes. My proposal avoids TOFU, as long as the Gem maintainer bundles a lockfile. However in the event where a lockfile is not bundled, the client would have to rely on TOFU.

I also see some discussion around the impact of a compromise of RubyGems. When building from source where the repository bundles a lockfile, a compromise of RubyGems would be detected, since a gem's hash would not match the hash from a lockfile.

simi commented 2 years ago

@haydentherapper as I mentioned - Gem maintainers do not bundle lockfile. In theory they can, but it is not used. All info for gem regarding installation process is in gemspec. Bundling lockfile to gem doesn't make any sense. Lockfile locks dependencies to given versions during dependency resolution. Dependencies in gemspec are usually open (like ~> 1.2.0) allowing bundle during resolution to decide which version (starting from 1.2.0 to 1.2.x) to pick.

I can explain more on call if needed with examples. Feel free to ping me at Bundler Slack.

haydentherapper commented 2 years ago

Ah, sorry, I see where my confusion is. I was conflating libraries/gems with applications. I was referring to maintainers checking the lockfile into a source repository for applications. Users who build an application from source can use that lockfile to guarantee integrity of the dependencies. Re-reading what you wrote in https://github.com/rubygems/rfcs/issues/39#issuecomment-1056105079, you had mentioned this benefit for applications.

simi commented 2 years ago

@haydentherapper yes, that's exactly where I see the benefit. I'm all in for this feature btw. We just need to set achievable goals. Given there is application using Gemfile and Gemfile.lock, during deployment (for example docker build on CI), it would be beneficial to check same dependencies (checked by integrity of hashes in Gemfile.lock) are used to create the bundle for the app runtime.

jsierles commented 2 years ago

FWIW, just having the hashes in the lockfile with no other functionality would be very useful for 3rd party tools like Nix. In that case, it would be useful to be able to fetch and write hashes using only bundle lock.

DavHau commented 2 years ago

We can start with opt-in phase when you need to do something like bundle check --integrity to do this step manually. Later on we can switch the flip.

I'd suggest to be careful with introducing such a feature as optional. There is simply no reason to do that, if the hashes are stored in a way so that previous bundler versions can just ignore them.

If the current bundler version supports checking hashes, then validation should be mandatory/automatic and produce a critical error if it fails.

If storing and validating of hashes is implemented correctly, there is no reason for it to fail. If it fails, then there is a problem with the implementation. Better fail early and fix it early.

An example where this went wrong is NPM. Hash validation is still not enforced there until this day. There are corner cases where hashes are not stored for some reason and not verified. As this doesn't lead to an error, nobody notices and nobody fixes it. The security/reproducibility guarantee isn't there, despite the users thinking it is. (see https://github.com/npm/cli/issues/4460)

I think it's best to make a hard cut. Once this feature is completed, bundler should always add checksums during installation, even on existing lockfiles which have been created by previous versions. It won't break anything for older clients, as they will simply ignore it.

NPM for example seems to have the policy to never modify existing lock file entries except a change for exactly that package is requested (like an update for example). This doesn't make any sense for me. Some nodejs projects carry lock files with them with missing hashes as those entries were created by previous or buggy versions of NPM and users have no chance of noticing that. To make sure lock files are reliable, users would have to delete their lock files every once in a while and re-generate them from ground up. Of course nobody does that.

TL;RD; After any package related operation like installing / or updating, all checksum entries should be complete. If the desired state differs from the stored state, either update the lock file immediately (if the change is expected, non-critical), or raise a critical error.

kuahyeow commented 2 years ago

For reference, bundler does something similar since https://github.com/rubygems/bundler/pull/4851

rubygems / rfcs