
Discussion of TAP 7: Conformance testing #36

Closed vladimir-v-diaz closed 7 years ago

vladimir-v-diaz commented 7 years ago

TAP 7 is available at https://github.com/theupdateframework/taps/blob/master/tap7.md

Pull requests related to TAP 7:

#19, #30, #40, #41, #42, #43, #44, #55

Additional discussion of TAP 7 on the TUF mailing list: https://groups.google.com/d/msg/theupdateframework/6XHRPTAQdbI/oFW3Ox8YBQAJ

heartsucker commented 7 years ago

Ciao, y'all. Responding to your email here to aid in public discussion.

I think the idea of a conformance tester is appropriate; however, I think there is a problem of near-limitless configuration issues. Some of this was touched upon with the idea of a .tuf-tester.yml to configure the conformance tester. Some of the options there make complete sense, but with the possibility of moving away from strictly using JSON as the data interchange format, I think it would be infeasible to support all possible options.
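
For concreteness, such a .tuf-tester.yml might declare choices along these lines - a purely hypothetical sketch, since the actual schema is exactly what's at issue:

```yaml
# Hypothetical .tuf-tester.yml; every key here is illustrative, not a
# defined option of the conformance tester.
data_interchange: json
signing_schemes: [ed25519, rsassa-pss-sha256]
field_names:
  type: "_type"      # which spelling the implementation uses
  signature: "sig"
```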

Imagine these options that the conformance tester would have to support.

  • Data interchange:
    • JSON
    • ASN.1
    • Protobuf
  • Signing Key Types (not to mention how they might be named: rsa-ssa-pss or rsa-ssa-pss-sha256 or RSASSA_PSS_SHA256)
    • RSA
      • PKCS#1
        • over SHA256
        • over SHA512
        • etc.
      • SSA-PSS
        • over SHA256
        • over SHA512
        • etc.
    • Ed25519
    • ECDSA
  • Field names
    • type vs. _type (I really think the spec shouldn't use an underscore here)
    • sig vs. signature (an implementation may choose to be verbose for clarity)
  • Field values
    • Root vs. root (again, I think capitalization here is unnecessary)
  • Hash maps vs. lists
    • The signatures value is a list with unique values, so it might as well be a hash map of keyid to sig
  • Usage of the mirrors role (TAP 4, #33: https://github.com/theupdateframework/taps/issues/33)
  • Possible different definitions of "canonical" for the given data interchange format
  • Multi-role delegations (TAP 3, #32: https://github.com/theupdateframework/taps/issues/32)
  • Signature / hash encoding
    • base64
    • base32
    • hex
  • Key encoding
    • RSA
      • PEM/PKCS1
      • PEM/PKCS8
      • PEM/SPKI
    • Ed25519
      • PEM/PKCS8
      • hex
  • Communication protocol
    • HTTP
      • As a note, our Uptane implementation at work will only communicate over mutual TLS, which means we could never use the conformance tester without gutting the code just to support it
    • MQTT (might be needed for IoT / embedded / Uptane)
    • Custom TCP

I could probably keep going, but that would just be unnecessary. There will be a lot of options, and maintaining them all is going to be a pain; or, if something is considered out of scope (e.g., Cap'n Proto), then an implementation that uses that feature will never be able to claim conformance.

What I propose would be to make the spec more RFC-like (https://github.com/theupdateframework/tuf/pull/455), where each individual requirement has a number, and there needs to be an integration test demonstrating that each requirement is robust against all known attacks and failure modes.

It should be reasonably easy for someone from the TUF team to skim code in any language to verify that the integration test meets the criteria. If all line items in the spec are met with a sufficient number of tests, then the implementation can be labeled as "TUF Compliant."

This would actually be better, in my opinion, because implementers could then neatly drop the TUF spec into an ISO 26262 (or similar) workflow, whereas the conformance tester would likely not be sufficient for regulators.

For example, my Rust implementation (https://github.com/heartsucker/rust-tuf) likely won't work with the conformance tester, but I rarely let a spec feature make it into develop without very pedantic integration tests. I'd say any feature that is in develop is rock solid, but using the conformance tester, I couldn't prove it.

So all in all, I'm against TAP 7 being mandatory, and I think the TUF team's efforts could be better spent on other things.

JustinCappos commented 7 years ago

So, in my view, the conformance tester need not include most of the options you mention. We provide a tester that leverages data in one format. You are free to use that format or to convert it to whatever format(s) your tool can handle. If your code supports BLAKE2 and our reference implementation supports SHA256, that's fine: you will re-hash / re-sign / convert files before exposing them to your client. You are required to return the appropriate error / success code to our tester, but how you do things like this is also up to you.
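
As a minimal sketch of that conversion step - assuming a hypothetical client that supports only BLAKE2b, re-hashing the tester's targets metadata before re-signing it - something like the following would do; rehash_targets is illustrative, not part of TAP 7:

```python
import hashlib
import json
from pathlib import Path

def rehash_targets(metadata_path: Path, targets_dir: Path) -> dict:
    """Rewrite the tester's targets metadata so file hashes use BLAKE2b
    instead of SHA256; the caller must re-sign "signed" afterwards."""
    metadata = json.loads(metadata_path.read_text())
    for name, info in metadata["signed"]["targets"].items():
        data = (targets_dir / name).read_bytes()
        info["hashes"] = {"blake2b": hashlib.blake2b(data).hexdigest()}
    return metadata
```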

Does this clarification help? We will try to revise the TAP so it is clearer about this.

Justin

heartsucker commented 7 years ago

I think this would also run into the issue of the conformance tester and the spec diverging, or of the conformance tester becoming the de facto spec, since it may only be able to test a subset of what the spec allows. This itself might inhibit adoption of TUF, because an implementation may choose to just go its own way, knowing it may never be able to be "TUF Compliant."

heartsucker commented 7 years ago

You are free to use that format or to convert it to whatever format(s) your tool can handle.

Wouldn't this inherently break all signatures?

Edit: Oops, spoke too soon. I think that would be overly burdensome, since TUF really is a client-side spec plus something like Nginx serving static files. It would also force all TUF implementations to build the signing / data-generation side of things just to pass the spec.

JustinCappos commented 7 years ago

Yes, but we give you key info as part of the repo, in case you want to re-sign things yourself. If the key types are different, it's up to you to ensure you are generating and using keys to achieve the same outcome as the original tests.
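
A minimal re-signing sketch using the pyca/cryptography Ed25519 API; the json.dumps call only approximates TUF's canonical JSON, and the keyid is a placeholder:

```python
import json
from cryptography.hazmat.primitives.asymmetric import ed25519

def resign(metadata: dict, key: ed25519.Ed25519PrivateKey) -> dict:
    """Replace the tester's signatures with our own over the converted
    metadata. The canonicalization here is approximate, not TUF's."""
    canonical = json.dumps(metadata["signed"], sort_keys=True,
                           separators=(",", ":")).encode()
    metadata["signatures"] = [{"keyid": "placeholder-keyid",
                               "sig": key.sign(canonical).hex()}]
    return metadata

# e.g.: resign(converted_metadata, ed25519.Ed25519PrivateKey.generate())
```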

heartsucker commented 7 years ago

I think this would also put a fair bit of overhead on the side of implementers, in that they would have to support mangling all the data instead of just doing what works best for them. The spec is already moving fast, and having to keep up with the conformance tester too could be a pain.

JustinCappos commented 7 years ago

So our hope / thinking is that the 'mangling' is mostly what you already do to write metadata. Other mangling (messing with bytes, etc.) should be minimal.

However, I think we all appreciate that until we have other people try something like this out, we really do not know how well it will work. We will certainly be open to changes, especially for the first implementer or two.

ecordell commented 7 years ago

I started drafting a reply yesterday but other things got in the way, sorry.

I share a lot of @heartsucker's concerns. In particular:

  1. TAP7 is written such that it will immediately become out-of-date (because it's full of implementation detail). If there needs to be a standard shared tool for conformance testing, I don't think its implementation needs to be described in TAP7. TAP7 can just be "must pass conformance tool, see this other git repo"

  2. A standard tool doesn't seem like it could adequately cover the various configurations (because TUF leaves many decisions up to the implementation).

But what I would prefer is a set of test vectors to use in implementation-specific tests, so that there's a shared set of things we agree on (metadata signed by an untrusted key -> fail). There are cases where clients may interact with the same set of data and make different decisions, because the behavior is either out of the scope of the spec or intentionally left up to the implementation.

For example, Notary chooses to continue the update process with cached local metadata if remote metadata fetching/validating fails for some reason (and the local cache is still valid). Supporting the conformance tool as described would be difficult in this case because we don't exit with an error code. We could write a separate interface which exits as described, but then are we really conformance testing the client?

heartsucker commented 7 years ago

So our hope / thinking is that the 'mangling' is mostly what you already do to write metadata. Other mangling (messing with bytes, etc.) should be minimal.

Yeah, I looked at the code, and this is still a lot of mangling to be done. If you look at the test vectors I've been working on, each vector is super pedantic and tests exactly one thing (the labels are not great, but look at the "DSL" in generator.py). Generating these vectors on my own in a DRY way, without the need to integrate into any other tool, was already a bit of a pain. I imagine things like "flip a bit in the hash of targets.json" or "duplicate signatures" would become a tedious exercise in config parsing, which could itself be a source of bugs where some conformance test isn't translated correctly and a client falsely passes.
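
To make "flip a bit in the hash of targets.json" concrete, a vector generator might do something like the following sketch - assuming snapshot metadata that records a SHA256 hash for targets.json; the function is illustrative only:

```python
def flip_bit_in_targets_hash(snapshot: dict) -> dict:
    """Corrupt one bit of the hash that snapshot metadata records for
    targets.json, so a conforming client must reject that file."""
    meta = snapshot["signed"]["meta"]["targets.json"]
    digest = bytearray.fromhex(meta["hashes"]["sha256"])
    digest[0] ^= 0x01  # flip the lowest bit of the first byte
    meta["hashes"]["sha256"] = digest.hex()
    return snapshot
```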

What would be more useful would be a list of tests that map back to a spec number, with explicit, human-readable descriptions of what they do.

For example, let's say the following part of the spec exists.

3.2.1 - The client must reject metadata with an unmet threshold number of signatures.

The test document might have:

Tests for 3.2.1

Test 1

Root metadata defines role R with threshold 1 and key A
Repo serves role R with 0 signatures
Client does not trust metadata R

Test 2

Root metadata defines role R with threshold 1 and key A, and role R2 with key B
Repo serves role R with 1 signature from key B
Client does not trust metadata R

Etc.


Checking that all tests are implemented would guarantee correct behavior, and this should actually be easier to do than the complex wrappers and conformance tester tool. If these tests exist and are tagged well, they'd fit into other test frameworks and reporting/coverage tools more easily.
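
Reduced to runnable form, the two vectors above amount to something like this sketch; only the threshold-counting rule is shown (actual signature verification is elided), and 3.2.1 is the hypothetical numbering from the example:

```python
def signatures_meet_threshold(signatures, trusted_keyids, threshold):
    """Spec item 3.2.1 (hypothetical numbering): count only signatures
    from keys trusted for the role; the threshold must be met."""
    valid = {s["keyid"] for s in signatures if s["keyid"] in trusted_keyids}
    return len(valid) >= threshold

# Test 1: role R, threshold 1, trusted key A; served with zero signatures.
assert not signatures_meet_threshold([], {"A"}, 1)
# Test 2: role R trusts only key A; served with one signature from key B.
assert not signatures_meet_threshold([{"keyid": "B"}], {"A"}, 1)
```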

Also, as @ecordell mentions, there is undefined behavior in the spec. The concept of aborting is a bit odd, in the sense that if a client has valid targets metadata and valid delegations, but a new targets metadata is bad, should the client refuse to verify a target? Or should it use the old metadata to verify the target? (This might actually have been clarified at some point.)

awwad commented 7 years ago

If I may summarize, there are a few issues that you two (@heartsucker and @ecordell) raised here:

  1. Skip Tester and provide vectors instead (high-level proposal)
  2. Tester dev difficulty: due to the need to support many different configurations
  3. Updater/Wrapper dev difficulty: due to the need to hack many elements of the Updater's operation
  4. TAP 7 Upkeep
  5. Undefined portions of the spec

I'll address each.

Test Vectors Instead of Tester

We thought about this a bit. In the process of implementing the conformance tester, we should end up with a solid set of test vectors (and I think the ones you linked are a good set to work from). Should it prove too onerous for an implementer (though I'm still working toward understanding why it would be), I think the test vectors would be a decent alternative, provided the implementer intends to develop testing that handles them all anyway.

I see this as a way to keep implementations in sync on the base guarantees of TUF, so that implementations can profit from each other and some level of testing comes for free (or at least cheap), possibly even in a way that allows continuous integration testing. So it's worth pursuing the idea as long as it proves to be as workable as we suspect it is.

Tester dev difficulty

As Justin said, I don't think the conformance tester is required to support most of the options you're concerned about. Here's how I see each item in your (@heartsucker) list:

Items from the list for which support with/without is provided in the Tester:

Items from the list handled - to the extent the implementer requires, however the implementer requires - in the implementer's Wrapper:

Updater/Wrapper dev difficulty

First, in particular, I expect "instructions" (the dictionary to parse to decide how to re-sign metadata when that's necessary) to have very few possible values. It's even possible that we can do without it entirely, but I won't know until we're done with the first iteration of the Tester, or at least the first iteration of the test vectors. (For example, we can probably provide the wrong key or set of keys for a signature rather than trying to corrupt the signature, in which case there would be no "instructions" for that case.)
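
For concreteness, an "instructions" dictionary in this spirit might be as small as the following; both keys are hypothetical, since the final set of values is exactly what's still undetermined:

```python
# Hypothetical "instructions" handed to the Wrapper alongside a test's
# metadata; neither key is defined by TAP 7.
instructions = {
    "re_sign": ["timestamp.json"],         # re-sign with your own keys
    "omit_signatures": ["snapshot.json"],  # serve with signatures stripped
}
```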

@heartsucker: Maybe what's necessary for me to grok whether or not it's hard to support TAP 7 is to try to tinker with your implementation to see if I can cook up a Wrapper module for it? It's something I worried about a lot, but in the end, it didn't seem too bad in the abstract. On the "communication protocol" item in the Tester dev difficulty list: I'm not sure why it requires gutting client code. Could you just convert metadata in code in or called by Wrapper.set_up_repositories(), and send it to the server to be hosted normally for a client? Would that require much work?

@ecordell: I assume by "local cache", you mean previously validated metadata. We've started to address this in a new issue about the behavior to expect when metadata validation fails. In brief, if I understand you correctly: if new metadata fails to validate (to some level - timestamp, timestamp+snapshot, ...), and (presumably) old metadata is still unexpired, you want to validate targets that match the existing, previously validated, unexpired metadata.

The reference implementation itself does not allow this, I think, and you can sort of infer from the spec that this is not the desired behavior - but I'm not sure that that requirement is clear or desirable. The spec should absolutely be clear about what the desired behavior is, and about whether we should allow an option for it to be one way or another in certain situations (details in that future issue).

In any event, I originally specifically did not intend to test this class of scenarios: I pointed out that new metadata for the test cases should always specify some new, independent target, so we wouldn't have run into the problem of this ambiguous expectation. Pretending for a moment that this is a problem we are going to run into, that your behavior is what you're sticking with, and that we won't be making it a configurable option: you could, in test mode, when failing over to the local cache occurs, spit out a note to some test log, and, when responding to the Tester in Wrapper.attempt_client_update(), check the end of that log (or something like that), which is probably easy.

Anyway, we need to decide how this should work regardless of the existence of TAP 7, and such hacks for TAP 7 in particular are undesirable. The behavior you describe seems sane, and if it is important to you, I suspect we'll at least have an option. (:
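
That log-check hack could be as small as the following sketch; attempt_client_update is the TAP 7 Wrapper hook (with a simplified signature here), while the log filename, the sentinel string, and the run_client_update callable are all hypothetical:

```python
def attempt_client_update(target_filepath, run_client_update):
    """TAP 7 Wrapper hook (simplified signature). run_client_update is
    the implementer's own routine returning the client's exit code."""
    exit_code = run_client_update(target_filepath)
    try:
        with open("test.log") as log:
            fell_back = log.read().splitlines()[-1:] == ["FELL_BACK_TO_CACHE"]
    except FileNotFoundError:
        fell_back = False
    return 1 if fell_back else exit_code  # 1 = update rejected, per the Tester
```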

TAP 7 Upkeep

I probably have to think about this a bit more, but the intention is for TAP 7 to be general enough that it won't ever require much modification. You may, however, be referring to changes made to the tests / test vectors over time. I look at those as changes to the spec, so it would be important for us to get them right, and then they'd adapt in tandem with the spec if new vulnerabilities are discovered, etc.

Undefined Portions of the Spec

In general, to the extent that they matter, undefined portions of the spec are bugs in the spec to be resolved, and running into these would be good. See the note at the end of Updater/Wrapper dev difficulty, as you both seem to be asking the same question.

heartsucker commented 7 years ago

My main issue isn't that integrating with the conformance tester can't be done, but that it forces an implementer to write Python, which, while a simple language, isn't the best idea. Python itself has gotchas and edge cases. Even though these are simple wrapper functions, a dev has to know about all the edge cases, since this is testing security-critical code. This means every project requires someone who knows Python and knows how to do crypto right in Python, or a lot of FFI binding or shelling out. In any case, this means every project has a hard Python dependency. I think this is overly burdensome.

It also turns the testing into a black box, which I think isn't great from a security standpoint. I think it's better to have the spec say "what" to do and use a list of test vectors as a bit of a "why." Explicitly stating what each vector needs to be gives the dev more knowledge of how to code to the spec and what sorts of edge cases need their own tests.

As an analogous case, if the TLS spec required every client/server pair to have a Python wrapper that did things like tell it how to generate bad handshakes or inject bad packets, nothing would officially meet the spec because I'm pretty sure most devs would say "we wrote the equivalent tests in our native language, and that's sufficient."

Could you just convert metadata in code in or called by Wrapper.set_up_repositories(), and send it to the server to be hosted normally for a client? Would that require much work?

That would be infeasible for our Uptane / TUF internal implementation. Given our architecture, we'd have to give a couple of services the ability to do Bad Things to allow this to happen. Since I don't think the conformance tester would give any benefits on top of the testing we already do, I don't think the PM would allocate time to implement it. I would also recommend that the team not do so, since I think there are other tests and security features that are more pressing. I think the only two test cases we don't yet cover are slow data and metadata rollbacks. The former (while annoying) doesn't impact anything, because the updates are async, and if there's a MITM, there's so much other badness they can do anyway (DNS cache poisoning, NTP spoofing).

Going back to what I said above, I'm the only person who has ever done any Python professionally on the engineering team, so we'd (probably?) end up having to do FFI bindings to Haskell integration tests which is just reeeaaaaally not something that sounds like it would be fun. Or ever get done.

Additionally, tap7_wrapper_skeleton.py has ~300 lines of text explaining how to use it correctly, on top of the TAP itself. Given that we already see mistakes in how the spec is implemented, I think it's likely we'd see as many mistakes in implementing the wrapper. I would much rather see the spec made stable and very clear, and even a cursory pass at a list of suggested vectors, before there is a mandatory conformance tester.

JustinCappos commented 7 years ago

Okay, so you'd like us to just provide the tests.

Do you feel it is worth having a well-defined format in which we specify the tests, instead of a more textual description? The reason would be that an automated tool of yours could then run the tests programmatically, without you needing to implement something for each test.

heartsucker commented 7 years ago

Do you feel it is worth having a well defined format for us to specify the tests instead of a more textual description?

I think it should be possible to have a YAML file (or another human-friendly format) that could be parsed to auto-generate the tests. The number of permutations of things I've wanted to do in Python for my integration tests got unwieldy fast, and that was with the benefit of being able to do hacks in the code. Making a DSL that covers all possible cases might not be easy.

If the conformance tester kept only to the obvious cases, thus simplifying the markup, it should be straightforward. However, I think the main benefit of an automated tester would be in catching all the bizarre edge cases.
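
A vector entry in such a format might look roughly like the following; every field name here is hypothetical, not a proposed standard:

```yaml
# Hypothetical YAML test vector; the schema is illustrative only.
- name: unmet-threshold
  spec_item: "3.2.1"
  repo:
    roles:
      R: { threshold: 1, keys: [key_a] }
    serve:
      R: { signatures: [] }   # publish role R with zero signatures
  expect: reject              # the client must not trust metadata for R
```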

JustinCappos commented 7 years ago

Okay. One way to view your argument is that standardizing on a Python interface, instead of on the metadata format, was a mistake on our part.

I don't think your proposal stops us from making our reference tester available, or from making code available to others that parses the metadata, etc. However, TAP 7 should perhaps focus on the metadata instead.

I would imagine there may be a small number of tests we cannot specify in the appropriate format / manner. Perhaps we can indicate those need to be coded manually, as we already do for slow retrieval...

heartsucker commented 7 years ago

I had to start rewriting my test suite to support non-static repos (rollback testing, etc.), and in doing so I incorporated test-metadata generation.

Example. Edit: Example for Uptane.

In short, each test would be a repository going through a series of "steps" which would be:

The metadata format seems to be generic enough to allow for simple integration into existing TUF setups, and future-proof enough to be independent of hashing, data format, key type, signing scheme, etc. Let me know what you guys think.

(Sorry, no docs yet, still actively working on this.)

awwad commented 7 years ago

TAP 7 is being updated to Version 2 after feedback: #55

vladimir-v-diaz commented 7 years ago

Note: Each pending TAP has a single corresponding issue tracker. Once a TAP is resolved, its issue tracker is closed.

I am closing this issue because TAP 7 has been resolved/rejected.