Issues with fast-forward attack recovery (5.3.11) / Potential Solution

Summary

In this document we illustrate a series of issues regarding the fast-forward attack recovery step. This recovery is outlined in the latest specification as step 5.3.11.

We also propose an alternative solution to the fast-forward problem that does not exhibit these issues.

Details

We have identified three main issues with the current specification.

Issue 1: Vagueness

The description of step 5.3.11 is very vague; in particular, it does not describe how key rotation should be detected. This may lead to several different interpretations by implementers. It also makes security analysis of the specification difficult or infeasible.

We have analyzed the implementation in go-tuf and python-tuf:

go-tuf: Compares root versions; if more than threshold keys are removed from a role, it deletes the locally stored metadata files.
python-tuf: Does not follow the specification. Instead, the signatures of locally stored metadata are checked upon loading; if a file is no longer validly signed according to the current root, version checks are skipped.

Issue 2: Robustness

If programmers decide to implement key rotation detection based on root differences (as go-tuf does), there is a lack of robustness against crashes.

Details: Assume there is a root update which should trigger step 5.3.11 and that a crash occurs somewhere between the root metadata persistence (5.3.7) and the completion of step 5.3.11. Due to this crash, step 5.3.11 was either not executed at all or not executed completely. The next time the update is performed there is no longer a root update (the new root has already been persisted). The update completes successfully, but step 5.3.11 is never executed properly.
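The crash scenario above can be reproduced with a toy model. This is only a sketch: DiffBasedClient, its fields, and the crash_after_persist flag are hypothetical names for illustration and are not taken from go-tuf or any other implementation. It assumes every root update in the model is one that should trigger step 5.3.11.

```python
from typing import Optional


class DiffBasedClient:
    """Toy client that runs 5.3.11 only while it is applying a root update."""

    def __init__(self) -> None:
        self.trusted_root = 1  # persisted root version
        # Version left in local storage, e.g. by a fast-forward attack.
        self.stored_timestamp: Optional[int] = 10**9

    def update(self, remote_root: int, crash_after_persist: bool = False) -> None:
        if remote_root > self.trusted_root:
            # Step 5.3.7: persist the new root.
            self.trusted_root = remote_root
            if crash_after_persist:
                raise RuntimeError("simulated crash before 5.3.11")
            # Step 5.3.11: drop the old metadata (assumed to be required here).
            self.stored_timestamp = None
        # No root difference on this run, so 5.3.11 is never considered.
```

If update(2, crash_after_persist=True) is interrupted and update(2) is then run again, the second run sees no root difference, so stored_timestamp keeps its old value even though the new root is trusted.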

Issue 3: Enables rollback attacks

Implementations that follow the specification are susceptible to rollback attacks if the conditions of 5.3.11 are met. In certain cases this allows an attacker to execute rollback attacks even without any keys having ever been compromised.

This example shows a downgrade attack against the go-tuf implementation. No keys have been compromised by the attacker; the attacker only requires write access to the repository publish location. Certain repository operations must have occurred for this attack to take place. The exact conditions required to perform such an attack may occur only very rarely in practice.

$ tuf-client init "http://localhost:8000/00_Init" "public/00_Init/1.root.json"
$ tuf-client get "http://localhost:8000/00_Init" DEMO.txt
> This is version 1 of DEMO.txt
$ tuf-client get "http://localhost:8000/01_Update" DEMO.txt
> This is version 2 of DEMO.txt
$ tuf-client get "http://localhost:8000/02a_AttackerMix" DEMO.txt
> This is version 1 of DEMO.txt

Notice: The python-tuf implementation would reject this particular downgrade attack, which further demonstrates the consistency issues between implementations caused by the vagueness of the specification. For exploitation against the python-tuf implementation a different setup is needed. Arguably the python-tuf implementation is more robust and more difficult to exploit than the go-tuf one.

Conclusion

It would seem that any implementation of a fast-forward attack recovery that is based on revoking trust (e.g. deleting metadata files) is likely in conflict with the rollback attack protection goals. We therefore propose a simple, efficient method which is not based on revoking trust.

Proposed solution

Introduce a new integer attribute called root_version that is present in all metadata files (except the root metadata itself).
When creating metadata files, the root_version field should be set to the latest root version.
Clients reject metadata files and abort the update process if the value of root_version is bigger than the version of the trusted root metadata file.
Verifying/comparing version numbers is changed as follows: Instead of comparing only the version field, the integer pair (root_version,version) is compared. Comparison is done lexicographically:
- (a1,b1) > (a2,b2) iff (a1 > a2) OR (a1 == a2 AND b1 > b2)
- (a1,b1) == (a2,b2) iff (a1 == a2) AND (b1 == b2)
- (a1,b1) < (a2,b2) iff (a1 < a2) OR (a1 == a2 AND b1 < b2)
The step 5.3.11 is removed from the specification.
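The client-side rules above can be sketched as follows. This is a minimal illustration, not a proposed reference implementation: CompositeVersion, UpdateRejected, and check_new_metadata are hypothetical names, and the sketch assumes that an equal composite version is accepted while a smaller one is treated as a rollback.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class CompositeVersion:
    # Field order matters: the generated ordering compares root_version
    # first and version second, i.e. the lexicographic rule from the text.
    root_version: int
    version: int


class UpdateRejected(Exception):
    """Raised when new metadata must not be accepted."""


def check_new_metadata(trusted_root_version: int,
                       stored: CompositeVersion,
                       new: CompositeVersion) -> CompositeVersion:
    # Metadata may not claim a root newer than the trusted root.
    if new.root_version > trusted_root_version:
        raise UpdateRejected("root_version is ahead of the trusted root")
    # The composite version must never decrease (assumption: equal is fine).
    if new < stored:
        raise UpdateRejected("rollback: composite version decreased")
    return new
```

For example, a client that stored (3, 2**63 - 1) after a fast-forward attack under root version 3 accepts (4, 1) once it trusts root version 4, because the pair compares greater. The same pair is rejected while the client still trusts only root version 3.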

Some details were skipped for brevity's sake, but I think the idea should be clear.

In case of a key compromise, the attacker can no longer arbitrarily increase the composite version number. All metadata published after the root update which implicitly revoked the compromised key will have composite version numbers that are greater than anything any client could have potentially stored.

Further issues

There are further issues regarding the local storage of metadata files. Some implementations have decided to load local metadata only if it has a valid signature. This enables further possibilities for rollback attacks.

For version comparisons, the client must not validate local signatures. It might be even better if clients stored only composite version numbers per metadata file and not entire metadata files.

Storing entire non-root metadata files is not relevant for security; it is only relevant for efficiency (i.e. reducing network requests). Cleanly separating security concerns from efficiency in the specification will likely result in fewer mistakes by implementers.

Thanks for writing this up. It's a thoughtful solution to a problem we have and isn't one I think we considered.

A few questions:

1) How should this work for clients that update the same metadata on different repositories? Currently they can upload the same targets file everywhere, but I think this could break now with this change.

2) Do clients need to know what the root's version is? I think the answer to this is almost always "no" because this is only relevant to clients who have had a fast forward attack performed.

3) Does this really need to be added to all metadata files? For example, what purpose does this serve for timestamp?

Thank you for the response. I apologize in advance for the lengthy answers.

You are correct, this change would break those workflows that involve using the same signed files in multiple repositories. This change would essentially introduce a dependency between non-root metadata files and the root metadata file. Is it a goal of TUF to support these workflows?

There are some more flexible variations of the proposed solution. These variations might be more appropriate for usage in multiple root scenarios. However, I currently don't have an easy solution that works without any coordination between different repositories.

One variation of the solution is as follows:

Add a non-negative integer field version_max to the root metadata file.
Add a non-negative integer field previous_version_max to the root metadata file (not strictly necessary, but makes implementation easier).
The first root file has previous_version_max set to 0 and version_max set to an implementation-defined constant VERSION_INCREMENT. This value may for example be 1e10. If a new version is published every second, this would provide enough version numbers for approx. 317 years.
Clients MUST NOT accept non-root metadata files with versions greater than version_max.
When updating root metadata file, version_max is incremented by VERSION_INCREMENT. previous_version_max is set to the pre-incremented value.
Systems creating non-root metadata SHOULD use versions that are greater than the previous_version_max of the latest root file. In case multiple roots are used, the maximum of all previous_version_max fields may be used instead. Coordination between roots is required to make sure that min(version_max) - max(previous_version_max) is sufficiently large.
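A rough sketch of this variation for a single root follows. The Root class and the helper functions are hypothetical names for illustration; only version_max, previous_version_max, and VERSION_INCREMENT come from the description above. The sketch assumes that an equal version is accepted and that multi-root coordination is out of scope.

```python
from dataclasses import dataclass

VERSION_INCREMENT = 10**10  # implementation-defined constant (1e10 in the text)


@dataclass(frozen=True)
class Root:
    version: int
    previous_version_max: int
    version_max: int


def first_root() -> Root:
    return Root(version=1, previous_version_max=0, version_max=VERSION_INCREMENT)


def next_root(root: Root) -> Root:
    # version_max grows by VERSION_INCREMENT; the old bound is remembered.
    return Root(version=root.version + 1,
                previous_version_max=root.version_max,
                version_max=root.version_max + VERSION_INCREMENT)


def client_accepts(root: Root, stored_version: int, new_version: int) -> bool:
    # Versions above version_max are refused, and versions never decrease.
    return stored_version <= new_version <= root.version_max


def next_publish_version(root: Root, last_published: int) -> int:
    # Publish above previous_version_max so that a client stuck at the old
    # bound can move forward again after the root update.
    return max(last_published, root.previous_version_max) + 1
```

Under the first root, an attacker holding a compromised key can push a client at most to version 10**10. After next_root, the repository publishes 10**10 + 1, which that client accepts. The 317-year figure follows from 10**10 seconds divided by roughly 3.16e7 seconds per year.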

This variation is easier to implement on the side of the update client. It no longer strongly links to a specific version of the root. It does technically allow multi-root scenarios, but due to the coordination that is required between roots some may consider it as impractical.

By clients, do you mean the one performing the updates or the one performing the repository manipulation operations? We should probably have some terminology for differentiating between these.

A normal client doing the update workflow (step 5) will always have access to root metadata, so there should not be an issue.

The system doing the repository mutations (creating new metadata) does not need to know anything about the root metadata. Such systems however do not provide automatic fast-forward recovery.

To provide automatic fast-forward recovery, such systems should somehow have access to the latest root metadata. For example, they could also automatically do the root-update step of the client workflow to retrieve the latest root versions themselves.

Another approach would be to perform fast-forward recovery manually, and only if an attack has actually been detected by the repository maintainers or a key compromise has occurred. Fast-forward attacks are generally loud on the client side, so they don't really go unnoticed.

I haven't yet thoroughly analysed each scenario. I kind of wanted to know what other people would think of this kind of solution. I didn't want to spend too much time writing a detailed analysis in case the overall approach is considered incompatible with TUF goals.

Having a monotonically increasing logical timestamp (which is what the "version" field is) is an easy way to prevent rollbacks. In practice we actually define what a rollback attack is based on these logical timestamps.

Limiting the allowed range of said logical timestamp is an easy way to eliminate fast-forward attacks without any compromise to rollback prevention. This is what these proposed solutions do. All metadata accepted by the client has monotonic version numbers - there are no exceptions. New root versions extend the allowed logical timestamp range, which gives clients a natural recovery option.

The other solutions which are currently implemented are more difficult to reason about. They allow rollback of versions, meaning there is no longer a monotonic chain of versions. I haven't seen any detailed security analysis of the currently implemented approaches.

Back to your original question on timestamp metadata: 1) This provides fast-forward recovery for timestamp metadata files. It serves as a replacement for 5.3.11. 2) The same logic can be used for all non-root metadata files, which is easier to implement.

Possible attacks if the attacker is able to sign timestamp metadata and has control over the metadata transport:

Silent freeze attack: The attacker can freeze the repository until root expiration occurs. The client will report no error. This attack is not possible to avoid.
Fast forward: The attacker can issue a large timestamp version (for example 2**63-1 if the attacker knows that the client uses 64-bit signed integers). Affected clients will now be unable to update (Step 5.4.3.1). This condition persists even after the attacker loses control over the transport mechanism. Without some kind of fast-forward recovery, even a root update will not be sufficient for clients to recover. This attack is loud (clients following the spec will falsely report this condition as a rollback attack).

Thanks for the detailed answers. I'm also a bit confused about some aspects of the attacker model here. I think your solution is assuming that the legitimate repository owner has found a way to recover the repository and can communicate normally with legitimate users. Their root files, etc. will revoke access to any keys compromised by the attacker. Please correct me if I'm wrong here.

One thing to note is that the attacker can generate a bunch of (root_version, version) tuples for root_versions that are not valid yet, and then try to use them in the future. At first this concerned me, but I believe your threat model is that the legitimate parties will revoke the key used to sign the metadata which indicates trust in that key for that case.

If so, then we're really relying on the repository's / delegator's change of that key as the true signal that some metadata should no longer be trusted as I see it. Please correct me if I'm wrong about this...

I agree with most of your conclusions. Attacker model is that attacker can do anything except forge root signature.

For recovery to take place, clients do need attacker-free access to the repository at some point. This is also true for 5.3.11, so this is not a new requirement.

Note that we do use the term "recovery" - however there isn't actually a recovery step the client has to do (unlike 5.3.11). There is no special behaviour required, the only change to the client is that accepted versions are limited by its current root.

A root update is required for recovery (same is true for 5.3.11). However, a key rotation/revocation is not strictly necessary for recovery to occur. Maintainers should remove compromised keys and should regularly rotate keys. The main requirement for "recovery" is that the repository publishes versions that are greater than anything the client has in its trusted storage. Therefore maintainers should use the newly allowed versions in the latest root.

In a key compromise, an attacker can sign anything they want. However, the client only accepts versions in a certain range.

No, change of keys must not be considered as a signal to revoke trust. This kind of reasoning is what makes it possible to execute rollback attacks against current TUF implementations. Consider this: If I sign something, and later on rotate my key - does that really mean that anything I have previously signed should be considered invalid? Should all learned information be deleted? Here is an illustration on why this is dangerous:

Assume:
- j, m can both make signed statements (threshold = 1).
- Attacker can only replay messages.
- Initial trusted version by client is 1.

Goal: Version never decreases

Repo: "The latest version is 12" signed m.
Client: OK, latest version is 12.

Repo: "The latest version is 13" signed j.
Client: OK, latest version is now 13.

Repo (Replay): "The latest version is 12" signed m.
Client: NOT OK, rollback is not allowed.

Repo: "Hey I changed my signature, my new Signature is now ..." signed j
Client: OK, but the last statement I received now no longer has a valid signature. So I'm going to forget everything now.

Repo (Replay): "The latest version is 12" signed m.
Client: OK, latest version is 12.
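The exchange above can be replayed against a toy client. This is only a sketch of the illustration: the Client class, the run_replay helper, the replacement key name "j2", and the reset to the initial version 1 are all assumptions made for the example and do not describe any real TUF implementation.

```python
class Client:
    """Toy client; optionally forgets its state when a key change makes the
    last stored statement unverifiable."""

    def __init__(self, forget_on_key_change: bool) -> None:
        self.forget_on_key_change = forget_on_key_change
        self.trusted_keys = {"j", "m"}  # threshold = 1
        self.version = 1                # initial trusted version
        self.last_signer = None

    def on_version(self, version: int, signer: str) -> bool:
        if signer not in self.trusted_keys:
            return False
        if version < self.version:
            return False  # rollback is not allowed
        self.version, self.last_signer = version, signer
        return True

    def on_key_change(self, old_key: str, new_key: str, signer: str) -> bool:
        if signer not in self.trusted_keys:
            return False
        self.trusted_keys.discard(old_key)
        self.trusted_keys.add(new_key)
        if self.forget_on_key_change and self.last_signer not in self.trusted_keys:
            # The stored statement no longer verifies, so forget it.
            self.version, self.last_signer = 1, None
        return True


def run_replay(client: Client) -> list:
    """Run the message sequence from the illustration; return accept flags."""
    return [
        client.on_version(12, "m"),
        client.on_version(13, "j"),
        client.on_version(12, "m"),           # replay
        client.on_key_change("j", "j2", "j"),
        client.on_version(12, "m"),           # replay again
    ]
```

With forget_on_key_change=True the final replay is accepted and the client ends at version 12. With forget_on_key_change=False the final replay is rejected and the client stays at version 13.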

Okay, so I've thought some more about this and it seems to me this boils down to a choice:

1) We can fix the current implementations and make the spec clearer so that this is less likely to recur. This places more complexity on the TUF implementer, but less on the repo admin and none on the end user. 2) We can add some complexity to the specification with this design, which seems like it will make the implementation easier and the specification clearer. This places more of a burden on the repo admin and a slight burden on the end user.

In general, we prefer complexity for the TUF implementers over that for repo admins and certainly don't want any for end users. My sense is that we should prefer the first option over the design proposed in this issue.

Note, that I do like the design here and welcome your further thoughts or contributions on this or other issues. I also welcome other community members with divergent thoughts to chime in and re-open this if you think it is appropriate.
