ukhsa-collaboration / COVID-19-app-Android-BETA

Source code of the Beta of the NHS COVID-19 Android app
https://covid19.nhs.uk/
MIT License

Secret keys are generated externally #14

Closed mcb30 closed 4 years ago

mcb30 commented 4 years ago

Describe the bug

From examining https://github.com/nhsx/COVID-19-app-Android-BETA/blob/master/app/src/main/java/uk/nhs/nhsx/sonar/android/app/registration/ResidentApi.kt#L46, it appears that the app relies on an external HTTP server to generate a key pair, which it then stores:

fun confirmDevice(deviceConfirmation: DeviceConfirmation): Promise<Registration> {
    // ...
    val request = HttpRequest(POST, "$baseUrl/api/devices", requestJson)

    return httpClient
        .send(request)
        .map { json: JSONObject ->
            val key = json.getString("secretKey")
            val publicKey = json.getString("publicKey")
            // ...
            keyStorage.storeServerPublicKey(publicKey)
            keyStorage.storeSecretKey(key)
            // ...
        }
}

Since this key pair was generated by an external service, the "private" key is, by definition, not private.

This implementation flaw is separate from the basic design flaw of any centralised approach to contact tracing. The basic design flaw allows a government to trace the movements and meetings of its citizens. This implementation flaw additionally allows the government to forge records of such movements and meetings, and to create valid digital signatures for the forged records.

mcb30 commented 4 years ago

The app should instead generate a key pair locally. It should then create a signature over its own public key using the corresponding private key, and communicate only the public key and the signature (to prove ownership of the corresponding private key). The private key itself should never leave the device. See, for example, the way that SubjectPublicKeyInfo is signed in an X.509 certificate, or the way that a SeedReport is signed in the decentralised contact-tracing CX protocol (https://github.com/ipxe/cx/releases/download/v0.10.1/cx.pdf, section 5.3).
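As a rough Kotlin sketch of that flow (the RegistrationPayload type, the choice of EC/ECDSA, and the exact signing arrangement are illustrative assumptions, not taken from the app or the CX spec):

import java.security.KeyPairGenerator
import java.security.Signature

// Illustrative container for what the device would send at registration:
// only public material, never the private key.
data class RegistrationPayload(val publicKey: ByteArray, val proofOfPossession: ByteArray)

fun registerLocally(): RegistrationPayload {
    // Generate the key pair on-device; the private key never leaves it.
    val keyPair = KeyPairGenerator.getInstance("EC")
        .apply { initialize(256) }
        .generateKeyPair()

    // Sign the encoded public key with the corresponding private key,
    // analogous to the signature over SubjectPublicKeyInfo in X.509.
    val proof = Signature.getInstance("SHA256withECDSA").run {
        initSign(keyPair.private)
        update(keyPair.public.encoded)
        sign()
    }
    return RegistrationPayload(keyPair.public.encoded, proof)
}

The server can check the signature against the received public key and thereafter authenticate the device without ever having been able to sign on its behalf.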

r3m0t commented 4 years ago

This matches the design here - https://www.ncsc.gov.uk/files/NHS-app-security-paper%20V0.1.pdf

"Device contacts the registration service with the activation code, which returns the InstallationID and symmetric key."

The symmetric key is called the secret key in the code, so please review the PDF to understand how it's used and why it is server-generated.

calfr commented 4 years ago

Reading through what documentation we have, and through the application's codebase itself, it seems like this "Secret Key" is the key used to create the HMAC signature sent as part of the Bluetooth transmissions, not the private half of a public/private keypair. This is most likely the "Symmetric Key" detailed in https://www.ncsc.gov.uk/files/NHS-app-security-paper%20V0.1.pdf. The following are the handful of uses I could find for the key - do let me know if there are any I've missed:

https://github.com/nhsx/COVID-19-app-Android-BETA/blob/43a167f8dba422fd9001b64f9c4fd82275abb1c8/app/src/main/java/uk/nhs/nhsx/sonar/android/app/crypto/BluetoothIdSigner.kt#L11-L24
https://github.com/nhsx/COVID-19-app-Android-BETA/blob/43a167f8dba422fd9001b64f9c4fd82275abb1c8/app/src/main/java/uk/nhs/nhsx/sonar/android/app/referencecode/ReferenceCodeApi.kt#L15-L31
https://github.com/nhsx/COVID-19-app-Android-BETA/blob/43a167f8dba422fd9001b64f9c4fd82275abb1c8/app/src/main/java/uk/nhs/nhsx/sonar/android/app/diagnose/review/CoLocationApi.kt#L16-L33

An HMAC key needs to be known by both parties in order to verify messages, and I assume they decided that assigning a valid one to each user would be less risky than having users provide their own initially. There may be other issues here, but the unprotected transfer of the private half of a keypair doesn't seem to be one of them.
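For concreteness, a small Kotlin sketch of the shared-key scheme under discussion, assuming HMAC-SHA256 (the function names are mine, not the app's):

import java.security.MessageDigest
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// The same key both creates and checks the tag: device and server are
// cryptographically interchangeable under this scheme.
fun hmacSha256(key: ByteArray, payload: ByteArray): ByteArray =
    Mac.getInstance("HmacSHA256").run {
        init(SecretKeySpec(key, "HmacSHA256"))
        doFinal(payload)
    }

fun verifyTag(key: ByteArray, payload: ByteArray, tag: ByteArray): Boolean =
    MessageDigest.isEqual(hmacSha256(key, payload), tag)  // constant-time compare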

mcb30 commented 4 years ago

Reading through what documentation we have, and through the application's codebase itself, it seems like this "Secret Key" is the key used to create the HMAC signature sent as part of the Bluetooth transmissions, not the private half of a public/private keypair. This is most likely the "Symmetric Key" detailed in https://www.ncsc.gov.uk/files/NHS-app-security-paper%20V0.1.pdf

You're right, but this makes the issue worse than initially reported. There seems to be nothing within the protocol that allows for verification that the Bluetooth message was originally generated by the device itself rather than by something with access to the central server's database and hence access to the "Symmetric Key".

The message content proves ownership of the key pair Priv_Device,t and Pub_Device,t used to generate the message, but the only guarantee that this key pair was actually generated by the device corresponding to InstallationID comes from the HMAC signature by the "Symmetric Key". Since the central server knows the value of "Symmetric Key", it can produce validly signed messages for a forged key pair Priv_Device,forged and Pub_Device,forged.

The ability for the central authority to forge validly signed Bluetooth messages remains. You have shown that fixing this issue will require a wire-level change to the Bluetooth message format, rather than requiring only a change to the registration HTTP API.
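To make the forgery concrete, here is a sketch reusing the hmacSha256 helper from the snippet above; the installationId-plus-key payload layout is invented for illustration and is not the real wire format:

// Anyone holding the stored key can mint a "genuine" broadcast, including
// one containing an ephemeral public key the device never generated.
fun forgeBroadcast(
    symmetricKeyFromDb: ByteArray,
    installationId: ByteArray,
    forgedEphemeralPublicKey: ByteArray
): ByteArray {
    val payload = installationId + forgedEphemeralPublicKey
    val tag = hmacSha256(symmetricKeyFromDb, payload)
    return payload + tag  // verifies as authentic against the stored key
}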

mcb30 commented 4 years ago

This assumes that there are no constraints on the ephemeral key pairs chosen by the device. The code that generates them in EphemeralKeyProvider accepts no outside parameters, so it seems safe to assume that it is generating completely arbitrary keys.

marksteward commented 4 years ago

The messages are authenticated for the benefit of the central server. What would be the use of that server faking its own data?

mcb30 commented 4 years ago

The messages are authenticated for the benefit of the central server. What would be the use of that server faking its own data?

The threat model is not the central server faking data for its own consumption. The threat model is a malicious attacker (e.g. a criminal or state actor) who gains read access to the central server's database.

As a relatively benign example: suppose that I compromise the central server and thereby gain access to your Installation ID and Symmetric Key. I can then impersonate you. I can wander around transmitting Bluetooth messages that will be recorded by other app users. The central server will treat every single one of my forged messages as legitimately having been sent by you. There is no way for it to know otherwise.

As a slightly less benign example: suppose that I am a state actor wishing to forge an evidence trail for a suspect I wish to frame for a crime. I access the database and then do exactly the same thing: I transmit Bluetooth messages in the location of my choosing. I later use the digitally signed messages recorded by other app users as evidence of the suspect's location at trial.

A general principle is that a compromise of an authentication token database should not allow for impersonation using the compromised tokens. This is why, for example, password databases store hashed passwords rather than plaintext passwords. Knowledge of the hashed password allows you to verify the password, but does not (directly) help you to discover the plaintext password itself.

It's the same principle at work here. The central server requires only the knowledge of how to verify the received messages. In the current design (which chooses a shared symmetric key), it also has the knowledge of how to sign messages. This is the design flaw that leads to the attack.
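To make the hashed-password analogy concrete, a deliberately simplified Kotlin sketch (real password stores use a slow KDF such as bcrypt, scrypt, or Argon2, not a bare salted SHA-256): the stored record lets the server verify a password, but not produce one.

import java.security.MessageDigest
import java.security.SecureRandom

fun saltedHash(password: ByteArray, salt: ByteArray): ByteArray =
    MessageDigest.getInstance("SHA-256").digest(salt + password)

// Server side, at registration: store only (salt, digest), never the password.
fun newRecord(password: ByteArray): Pair<ByteArray, ByteArray> {
    val salt = ByteArray(16).also { SecureRandom().nextBytes(it) }
    return salt to saltedHash(password, salt)
}

// Server side, at login: verification needs only the stored record.
fun checkPassword(password: ByteArray, salt: ByteArray, digest: ByteArray): Boolean =
    MessageDigest.isEqual(saltedHash(password, salt), digest)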

marksteward commented 4 years ago

You're right that this makes it unsuitable as a forensic audit log.

mcb30 commented 4 years ago

You're right that this makes it unsuitable as a forensic audit log.

A tool such as this, if it ever gets sufficiently wide adoption, will be treated as though the data captured is trustworthy, and will inevitably see its data (ab)used in scenarios other than the original use case.

There are two options to solve this issue:

  1. Fix the protocol so that the central server never possesses the knowledge required to sign messages, as outlined in my original post.
  2. Fix the whole of society, the government, and the justice system so that they will understand the precise implications of a particular digital signature scheme, and will not treat the signatures as conveying any significance.

Option 1 is an irritating fix to have to make but is entirely possible, assuming that the protocol was designed in a way that allows for protocol versioning. (If it doesn't, then that's an even bigger problem.)

Option 2 is never going to happen.

kuro68k commented 4 years ago

I have to agree with mcb30 here: this creates an unacceptable long-term risk to the public, and it is entirely and easily avoidable.

adarrel753 commented 4 years ago

Using a public/private key pair to transmit a symmetric key is fine in the sense that the transmission of the symmetric key is secure; what matters is how long the symmetric key is allowed to be used. If the permitted session length is very short, this limits the damage that could be done by the key being discovered. That being said, from reading the documentation, the symmetric key in question is intended to be used to authenticate the device.

Another approach could be for the device, during app registration, to generate a random series of characters (i.e. a strong password) and submit it to the server in plaintext, which stores only a hash of it. When the device sends future submissions to the server, it includes its plaintext password along with them. The server then hashes the received plaintext and compares it with the hash it has stored. This process could be described as the device performing a silent login. Since (unlike a cipher) a hash cannot be reversed back to its plaintext, knowledge of the hash could not be used to manufacture or impersonate a user submission in the way described in the threat model stated in the earlier post.
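A rough Kotlin sketch of this "silent login" idea, with all names illustrative; it is the same store-only-a-digest principle described above for passwords, applied to a device token:

import java.security.MessageDigest
import java.security.SecureRandom
import java.util.Base64

// Registration: the device invents a high-entropy token and sends it once.
fun newDeviceToken(): String {
    val bytes = ByteArray(32).also { SecureRandom().nextBytes(it) }
    return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes)
}

// Server side, at registration: keep only the digest of the token.
fun digestOf(token: String): ByteArray =
    MessageDigest.getInstance("SHA-256").digest(token.toByteArray())

// Server side, on each later submission: re-hash the presented token and
// compare against the stored digest.
fun authenticateSubmission(presentedToken: String, storedDigest: ByteArray): Boolean =
    MessageDigest.isEqual(digestOf(presentedToken), storedDigest)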

marksteward commented 4 years ago

@mcb30 it's explicitly described in the documentation as a symmetric key. Someone would have to read particular lines of code out of context to believe the server doesn't know it. In other protocols, a shared HMAC is used to claim plausible deniability.

It's also not as simple a change as it sounds. The server still needs to authenticate messages, so a signature will add 512 bits to the 1024-bit payload. To prevent an attacker reusing a signature, BV would need to be re-signed more frequently than every 24 hours, which consumes battery. The new public key must be protected to prevent tracking, but must also be revealed to dispute the forged signatures. And this still doesn't cover what happens when the app is uninstalled, or the more likely attack of rebroadcasting messages in real time.

The risk to the public from abusing data should be tackled by legislation, as demanded by JCHR https://committees.parliament.uk/committee/93/human-rights-joint-committee/news/146351/report-on-the-contact-tracing-app-published/

Xenoamor commented 4 years ago

Shouldn't the device generate the private/public key locally and issue the public key back to the server on registration? I'm not sure I follow why the server must have access to that private key.

We can't even confirm whether the key being supplied by the server is cryptographically secure, let alone physically secure.

marksteward commented 4 years ago

@Xenoamor it's not a keypair but a shared secret, effectively being used by the server to send itself messages via the Bluetooth broadcasts received on people's phones. As a result, the server is trusting devices, not the other way around.

adarrel753 commented 4 years ago

The threat model is not the central server faking data for its own consumption. The threat model is a malicious attacker (e.g. a criminal or state actor) who gains read access to the central server's database.

As a relatively benign example: suppose that I compromise the central server and thereby gain access to your Installation ID and Symmetric Key. I can then impersonate you. I can wander around transmitting Bluetooth messages that will be recorded by other app users. The central server will treat every single one of my forged messages as legitimately having been sent by you. There is no way for it to know otherwise.

I am going to raise this point at https://hackerone.com/nhscovid19app. Would you mind contributing there too? I ask because although I understood your point generally, to be honest you illustrated the technical details better than I can.

mcb30 commented 4 years ago

@mcb30 it's explicitly described in the documentation as a symmetric key. Someone would have to read particular lines of code out of context to believe the server doesn't know it. In other protocols, a shared HMAC is used to claim plausible deniability.

@marksteward Then let's see an official government statement acknowledging in writing that the protocol is explicitly designed to allow message forgery (which is precisely the same thing as "plausible deniability"), and that any recorded messages have zero evidentiary value and are inadmissible in court.

It's also not as simple a change as it sounds. The server still needs to authenticate messages, so a signature will add 512 bits to the 1024-bit payload. To prevent an attacker reusing a signature, BV would need to be re-signed more frequently than every 24 hours, which consumes battery. The new public key must be protected to prevent tracking, but must also be revealed to dispute the forged signatures.

Then choose a signature scheme that doesn't require the ability to verify the message signature before it becomes necessary to do so!

CX solves the equivalent problem by having the transmitting device retain its secret until an explicit disclosure event takes place (e.g. an infected person choosing to disclose the RNG seed from which their messages were generated), by deriving the initial secret in a way that allows for proof of ownership, and by ceasing transmission from that secret as soon as disclosure takes place. Read sections 4 and 5 of the specification document for the implementation details.

There is simply no need for the server to be able to verify messages prior to the disclosure event. You don't need to (and should not) choose a signature scheme that requires this unnecessary ability.
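As a loose Kotlin illustration of the seed-disclosure idea (this deliberately simplifies the actual derivation in CX sections 4 and 5 down to a bare hash of seed plus counter):

import java.security.MessageDigest

// Before disclosure only the owner knows the seed, so identifiers look
// like random noise; after disclosure anyone can recompute and check them.
fun contactIdentifier(seed: ByteArray, counter: Int): ByteArray =
    MessageDigest.getInstance("SHA-256")
        .digest(seed + counter.toString().toByteArray())

// A verifier, given the revealed seed, re-derives identifiers and matches
// them against what it recorded over the air.
fun matchesDisclosedSeed(seed: ByteArray, counter: Int, recorded: ByteArray): Boolean =
    MessageDigest.isEqual(contactIdentifier(seed, counter), recorded)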

The risk to the public from abusing data should be tackled by legislation, as demanded by JCHR https://committees.parliament.uk/committee/93/human-rights-joint-committee/news/146351/report-on-the-contact-tracing-app-published/

I don't disagree that legislative protection is important, but that doesn't remove the need for technical protection. We have the Computer Misuse Act that makes unauthorised access a criminal offence and therefore gives us legislative protection, but we still use passwords.

mcb30 commented 4 years ago

I am going to raise this point at https://hackerone.com/nhscovid19app. Would you mind contributing there too? I ask because although I understood your point generally, to be honest you illustrated the technical details better than I can.

@adarrel753 Sure; just CC me on whatever you submit. You can find my e-mail address in many git commit logs 🙂

adarrel753 commented 4 years ago

I am going to raise this point at https://hackerone.com/nhscovid19app. Would you mind contributing there too? I ask because although I understood your point generally, to be honest you illustrated the technical details better than I can.

@adarrel753 Sure; just CC me on whatever you submit. You can find my e-mail address in many git commit logs

Nice one. Here you go: https://hackerone.com/reports/870145

mcb30 commented 4 years ago

Nice one. Here you go: https://hackerone.com/reports/870145

Thank you. I get an "Oops! You can't access this report because it isn't public yet" error. Could you add me to the authorised user list (or whatever action is required by HackerOne)? Thanks.

kuro68k commented 4 years ago

Please keep this thread updated. You can't read the other site without logging in.

mcb30 commented 4 years ago

Haven't heard anything yet to suggest that NHSX is planning to fix this security vulnerability.

kuro68k commented 4 years ago

Sounds like most of it is being canned in favour of moving to the Google/Apple system; that's probably why.

Xenoamor commented 4 years ago

I've not heard news of that. Was this in the roadmap leak from yesterday?

kuro68k commented 4 years ago

It's been reported, e.g. today: https://www.theguardian.com/world/2020/may/14/nhs-coronavirus-advisory-board-split-over-ditching-government-app

Basic0 commented 4 years ago

Lots of silence around this fairly fundamental design flaw, especially considering the timescales in question...

adarrel753 commented 4 years ago

I don't think it's a big problem; most database systems I have worked with have several layers of security in place anyway, so that one part can be taken down temporarily for maintenance without jeopardising the whole system.

Basic0 commented 4 years ago

By that argument, why not store everything in plaintext?

adarrel753 commented 4 years ago

That would be a much bigger problem. Lol

mcb30 commented 4 years ago

I don't think it's a big problem; most database systems I have worked with have several layers of security in place anyway, so that one part can be taken down temporarily for maintenance without jeopardising the whole system.

This is the only layer of security in terms of asserting authenticity of the messages, and it's broken by design.

mcb30 commented 4 years ago

I just don't see how this security flaw represents a serious risk to its end-users

I already described the threat in my comment above: https://github.com/nhsx/COVID-19-app-Android-BETA/issues/14#issuecomment-626193757

marksteward commented 4 years ago

Lots of silence around this fairly fundamental design flaw, especially considering the timescales in question...

It's not a flaw, @mcb30 is conflating the design of a decentralised protocol (like CX) with a centralised one. In particular:

There is simply no need for the server to be able to verify messages prior to the disclosure event. You don't need to (and should not) choose a signature scheme that requires this unnecessary ability.

In a centralised model, proximity events are uploaded on disclosure. If the server can't verify these on its own, it must reach out to each contact, stalling the notification cascade that's a central goal of this design.

As an aside, the CX protocol allows anyone (including authorities) to forge proximity messages after disclosure. It could fix this by including a signature in every broadcast, but I suspect it's also by design.

mcb30 commented 4 years ago

As an aside, the CX protocol allows anyone (including authorities) to forge proximity messages after disclosure. It could fix this by including a signature in every broadcast, but I suspect it's also by design.

@marksteward I think you've misunderstood what "disclosure" means. It's not the broadcast of the Bluetooth packet: it's the decision to disclose some information that was previously kept secret (the Seed Value, in the case of CX). Section 5 of the CX spec (https://github.com/ipxe/cx/releases/download/v0.10.1/cx.pdf) clearly states:

The act of sending a Seed Report is a deliberate disclosure of a set of Preseed Values and Seed Values. Once the Seed Report has been sent, the included Preseed Values (and the derived Seed Values) are no longer private to the Advertiser. An Advertiser MUST NOT transmit any Contact Identifiers generated by a Generator instance once the Seed Report corresponding to that Generator’s Seed Value has been sent. An Advertiser SHOULD immediately construct a new Seed Value as described in section 4 and continue by transmitting Contact Identifiers generated from the new Seed Value.

Once this disclosure has taken place (i.e. once anyone other than the device owner knows the Seed Value) then the Seed Value is exhausted: any further transmissions derived from that Seed Value will be ignored by all devices.

Xenoamor commented 4 years ago

I like the idea of the seed value; it would mean you're basically unknown to the decentralised system until you reveal the value. Doesn't the current system issue you a value every day or something, so you're identifiable in other people's data before reporting (to the centralised server)?

I don't see why this can't be used with a decentralised system. The user still has to self-report symptoms/test results, and the seed can be shared at that point. It seems near identical to how it currently works.

edent commented 4 years ago

For those that haven't already seen it, there's an NCSC blog post about the security questions raised.

https://www.ncsc.gov.uk/blog-post/nhs-covid-19-app-security-two-weeks-on

mcb30 commented 4 years ago

@edent Thanks for the link. Good to see that this problem is at least recognised in the report at https://github.com/vteague/contactTracing/blob/master/blog/2020-05-19UKContactTracing.md#InstallationID-HMAC

The report then says that "NCSC have confirmed that there were existing plans to refine the registration process". Do you have any information about the new registration process that will be used?

sam-jaques commented 4 years ago

The general logic that MACs are plausibly deniable is sound, but I still don't think the server needs to know the signing key.

Unless I've misunderstood: the purpose of signing the Bluetooth transmissions is to prevent replay attacks. The harms of a replay attack seem to fall mostly on the user: they may falsely believe they have been in proximity to someone with COVID-19. Hence, only the user needs to know the HMAC signing key. The server can send them any proximity events that match their ID, along with the associated MACs; the user's app then checks the MACs, and ignores them if they don't match their MAC signing key. If the app generates a fresh MAC signing key every 24 hours, and keeps 2 weeks of these keys, then it provides authenticity to the user while also maintaining complete deniability: since no one but the user's app knows the MAC key, the MAC itself is meaningless noise once the user deletes the key.

Why does the server need to authenticate proximity events at all? The only attack I can see is a DDoS on the user: flooding the server with fake proximity events, which it will forward to the user's app, which must then process and discard them.

This also means that a contact event is deniable, even to the server, unless the user decides to admit that a contact event is genuine. This adds another type of privacy.
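A Kotlin sketch of this scheme, with illustrative names and a 14-day window: the MAC keys live only on the device, broadcasts are tagged with the current key, forwarded events are checked against whatever keys remain in the window, and deleting a key renders its MACs permanently unverifiable.

import java.security.MessageDigest
import javax.crypto.KeyGenerator
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

class RollingMacKeys(private val windowDays: Int = 14) {
    private val keys = ArrayDeque<ByteArray>()

    // Call once at startup, then every 24 hours. Old keys age out, so any
    // MAC made with them becomes meaningless noise.
    fun rotateDaily() {
        keys.addLast(KeyGenerator.getInstance("HmacSHA256").generateKey().encoded)
        while (keys.size > windowDays) keys.removeFirst()
    }

    // Tag an outgoing broadcast with the current day's key.
    fun tag(payload: ByteArray): ByteArray =
        mac(keys.last(), payload)

    // Check a proximity event forwarded by the server against every key
    // still inside the retention window.
    fun verifyForwarded(payload: ByteArray, tag: ByteArray): Boolean =
        keys.any { key -> MessageDigest.isEqual(mac(key, payload), tag) }

    private fun mac(key: ByteArray, payload: ByteArray): ByteArray =
        Mac.getInstance("HmacSHA256").run {
            init(SecretKeySpec(key, "HmacSHA256"))
            doFinal(payload)
        }
}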

edent commented 4 years ago

I'm pasting this message in every active GitHub issue, so you may receive duplicate notifications.

Today, I'm happy to announce that NHSX has released the full git commit history for the Isle of Wight Beta apps.

As discussed, we have redacted API keys, sensitive domain names, and some of the developers' personal details. I am still waiting on final approval to publish the server-side code.

I would like to personally thank the community for your comments, bug reports, and vulnerability disclosures. They all went into helping the development process.

The beta trial of this app has now ended and we've moved to the next phase of app development. It is our intention to publish the source code of future apps as the binaries are released to the public.

Once again, thank you for being part of this.

Terence Eden
Head of Open Technology, NHSX