orca-app / orca

A Wasm environment for cross-platform, sandboxed graphical applications.
https://orca-app.dev

Proposal: Automatic, tamper-resistant version management #43

Open bvisness opened 11 months ago

bvisness commented 11 months ago

It would be very useful for the Orca CLI tooling to act as a version manager for the Orca runtime. Many developers use tools like n for node, or rvm for Ruby, to manage multiple installations of the same tool on one system, allowing them to easily switch between installations without trashing their system.

Orca users would benefit from the same capabilities. For example, you may have two Orca projects targeting different versions of the Orca runtime. Making the Orca tooling version-aware would allow orca bundle --version=1.0.0 and orca bundle --version=1.2.0 to work simultaneously. The tooling could even automatically download the correct version of the Orca runtime if not already present on the user's computer, similar to package managers automatically downloading missing dependencies.

However, this means the tooling can now both learn about and download new versions. That could allow a malicious actor to serve compromised versions of the Orca runtime without the user's knowledge. The client cannot verify a download when the same server returns both the data and the checksum used to verify it. Malicious actions cannot be prevented outright, but they can be made easier to detect and harder to hide permanently.

Examples of malicious activity we would like to prevent, in order of sophistication:

  1. When a user downloads the Orca runtime, an attacker serves a malicious version instead.
  2. When a user downloads the Orca runtime, an attacker serves both a malicious version and checksum, to evade verification.
  3. An attacker briefly inserts a fake, malicious "patch" version of the runtime into the list of versions, tricking some users into downloading it. They then remove it from the list, making their tampering invisible to any future audits, even after stopping their attack. No legitimate checksums are modified.

These kinds of problems can be avoided by publishing a log of all Orca versions with their checksums, and making that log append-only and tamper-resistant. The go command-line tool does this for its package manager. Go modules can be downloaded directly from the VCS hosting them (e.g. GitHub), rather than from a central repository like npm. However, their contents are then verified against the Go checksum database, which is an append-only log of module checksums (a Merkle tree), hosted by Google. Attackers cannot serve malicious Go modules without also modifying this log, which is easily detectable. See Russ Cox's article for more information.

Given our very small scale, we don't need a full Merkle tree. However, I think we can learn from Go's example and achieve the same properties in our version log to prevent tampering with Orca downloads.

Dead-simple solution

Suppose the Orca website has an endpoint /versions that returns the following:

{
    "versions": [
        { "v": "v0.0.1", "sum": "abcd1234" },
        { "v": "v0.0.2", "sum": "efgh5678" },
        { "v": "v0.1.0", "sum": "ijkl9012" }
    ]
}

The Orca client will download and cache this full list of versions. Whenever it later refreshes the list, it will compare the new list against its cached copy to ensure that no existing entry has been deleted, modified, or reordered, and that nothing has been inserted before the end of the cached list. Only entries appended to the end are allowed. Any other change will be reported loudly to indicate a possible security problem.
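As a rough sketch of that comparison (not the actual Orca tooling): the check amounts to confirming that the cached list is a prefix of the freshly downloaded one. The function name check_append_only and the in-memory list-of-dicts representation are assumptions made for illustration; the entries mirror the example /versions response above.

```python
from __future__ import annotations

from typing import Optional


def check_append_only(cached: list[dict], fresh: list[dict]) -> Optional[str]:
    """Return None if `fresh` only appends entries to `cached`.

    Otherwise return a human-readable description of the first discrepancy,
    so the tool can report exactly what changed.
    (Hypothetical helper; not part of any existing Orca API.)
    """
    for i, old in enumerate(cached):
        if i >= len(fresh):
            # The server's list is shorter than what we already saw.
            return f"entry {i} ({old!r}) was deleted from the log"
        if fresh[i] != old:
            # Covers modification, reordering, and mid-list insertion.
            return f"entry {i} changed: cached {old!r}, server sent {fresh[i]!r}"
    return None


cached = [
    {"v": "v0.0.1", "sum": "abcd1234"},
    {"v": "v0.0.2", "sum": "efgh5678"},
]
appended = cached + [{"v": "v0.1.0", "sum": "ijkl9012"}]
tampered = [cached[0], {"v": "v0.0.2", "sum": "deadbeef"}]

ok = check_append_only(cached, appended)       # None: only a new entry at the end
problem = check_append_only(cached, tampered)  # describes the changed entry 1
```

Because entries are compared position by position, a version inserted in the middle of the list shows up as a change at the first shifted index.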

When downloading a release of Orca, the download must simply be verified against the already-saved checksum.
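A minimal sketch of that verification step follows. The proposal does not name a hash algorithm or an encoding for "sum"; SHA-256 with a hex digest is assumed here, and verify_download is a hypothetical name.

```python
import hashlib


def verify_download(data: bytes, expected_hex: str) -> bool:
    """Check downloaded bytes against the checksum saved in the cached log.

    Assumes the log stores a hex-encoded SHA-256 digest (an assumption;
    the proposal leaves the algorithm open).
    """
    actual = hashlib.sha256(data).hexdigest()
    return actual == expected_hex.strip().lower()


# Illustrative payload standing in for a downloaded runtime archive.
payload = b"pretend this is an Orca runtime archive"
saved_sum = hashlib.sha256(payload).hexdigest()

good = verify_download(payload, saved_sum)
bad = verify_download(payload + b" plus malware", saved_sum)
```

The important detail is that the expected checksum comes from the locally cached log, not from the same response that delivered the download.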

This is the dumbest, most straightforward way of making the version history of Orca append-only. Any discrepancies due to attacks will be caught if the attacker ever makes a mistake, and discrepancies can be trivially discovered by comparing the caches of two different clients. If the version history grows very large, the diff may start to become costly, but I think this is a faraway concern - the data itself should remain quite small for the foreseeable future, and no hashing is necessary when performing the diff.

This can actually provide a better user experience than a Merkle tree by reporting the exact discrepancy. This could make it easier to track down the origin of an attack.

Can we retract versions?

If we publish a compromised version of Orca and wish to retract it, we cannot do so by simply removing it from the list of versions. Allowing this would open the door to addition and removal of malicious versions without the tool complaining. Instead, we can simply append retractions to the list of versions, e.g.

{
    "versions": [
        // ...
        { "retract": "v0.0.2", "reason": "overflowed literally every buffer at once" }
    ]
}

The published version and its retraction both remain visible in the log forever, but the tooling can hide retracted versions by default and print information about them on request.
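One way the tooling might interpret such a log, as a sketch only: walk the entries in order, collect published versions, and record retractions with their reasons. The function name split_retracted is hypothetical; the "v", "sum", "retract", and "reason" keys follow the examples above.

```python
from __future__ import annotations


def split_retracted(log: list[dict]) -> tuple[list[str], dict[str, str]]:
    """Return (usable versions, {retracted version: reason}).

    Nothing is removed from the log itself; retractions only affect
    what the tool shows or offers to install.
    """
    published: list[str] = []
    retracted: dict[str, str] = {}
    for entry in log:
        if "retract" in entry:
            retracted[entry["retract"]] = entry.get("reason", "")
        else:
            published.append(entry["v"])
    usable = [v for v in published if v not in retracted]
    return usable, retracted


log = [
    {"v": "v0.0.1", "sum": "abcd1234"},
    {"v": "v0.0.2", "sum": "efgh5678"},
    {"v": "v0.1.0", "sum": "ijkl9012"},
    {"retract": "v0.0.2", "reason": "overflowed literally every buffer at once"},
]

usable, retracted = split_retracted(log)
```

Since a retraction is just another appended entry, it passes the same append-only check as a new release.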

Thoughts? Concerns? This would presumably require us to build some extra stuff into the Orca website, but at least it would be easy to cache.