Awaiting threat model - Githubissues

bwbroersma commented 4 years ago

I'm missing an explicit threat model in the solution architecture.

Was a threat model created before writing the solution architecture?
If so, can this threat model be shared?

bwbroersma commented 4 years ago

@ijansch as I and others have asked you repeatedly in a Slack thread [1, 2, 3, 4, 5], who has worked / is working on the threat model? I expect you can answer this in your role as Technical architect in the team of external experts.

ijansch commented 4 years ago

Great question. I'll see if we can get our threat model online in a digestible form. (Good to know: we've only just started publishing documents, so if something is not yet online, that doesn't mean it's not yet addressed.)

dirkx commented 4 years ago

By the way - if you are in a great hurry - for us, https://eprint.iacr.org/2020/428.pdf and https://github.com/DP-3T/documents/blob/master/DP3T%20-%20Data%20Protection%20and%20Security.pdf are (just as in the other countries around us) the most important starting points. Our own version, which is currently going through the internal processes, only adds things like the BIO (https://www.informatiebeveiligingsdienst.nl/project/baseline-informatiebeveiliging-overheid/) and similar Netherlands-specific elaborations. The actor analysis and threat analysis are not substantially different.

bwbroersma commented 4 years ago

in a digestible form

Preferably in markdown :slightly_smiling_face: see this markdown wiki from a Google project for example. I also like the NIST approach in for example 800-63B Digital Identity Guidelines - Section 8, specifically their use of tables.

bwbroersma commented 4 years ago

@dirkx:

Both papers are mainly about the decentralized BLE protocol, and lack focus on the centralized backend. Both put trust in the honest stripping of IPs on the backend (or firewall), the DP^3T boldly claims:

The backend server that the data is transmitted to cannot link any of the infected EphIDs to natural persons.

Which is utter nonsense if the backend server does not behave properly and does store the IP address. This matters especially because users will likely submit this data from home: once a user is tested, they should self-quarantine in the intermediate period before knowing their result. An IP address is considered personally identifiable information, especially on a fixed-line internet connection, where the 'dynamic' IP addresses handed out hardly ever really change. So if the user is connected via the home WiFi, the user's IP address is bound to a household and a location.

We know our government agencies sometimes do not follow the law, for example in the case of the license plate scanners of the Rotterdam Police. The first violation was storing the license plates of no-hits; the second was storing them for too long. See the Dutch DPA article. Maybe even more frightening was the political response: change the law, because the data was convenient anyway. That is feature creep: now that the data or app is there, just change the law to make legal something that was never intended in the first place. We cannot trust a single party to just delete information, because there is an endless list of 'not really deleted' stuff; take this example, abused by the Dutch Tax and Customs Administration (Belastingdienst). The same Belastingdienst didn't delete license plate data (2017). Again, we cannot always trust that our government agencies follow the law. The same Belastingdienst carried out more unlawful activities (2018) and undesired lawful activities (2018), all feature creep made possible because the data is stored, even if the data processor never intended nor wanted this use of the data.

dirkx commented 4 years ago

Yup; we (and the Germans, etc) clocked that one too.

You'll have to hold your breath for a few days as the papers on security, the actor/threat model and infra work their way through. But keep that thought. It is one of those things that must be fixed. Also, to a lesser extent, on the distribution/CDN end. And regardless of opinion - the GDPR (AVG) is crystal clear & sets the bar high.

Not surprisingly - an IP stripping measure is needed & added, with the required controls and SOPs / organisational measures.
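
To make the idea of an IP stripping measure concrete, here is a minimal sketch, not the project's actual implementation. It assumes the upload arrives as a plain dictionary with hypothetical `remote_addr`, `headers` and `body` fields; the header list is also an assumption and would need review against the real infrastructure.

```python
from typing import Any, Mapping

# Assumed list of headers that can reveal the client's network address.
ADDRESS_HEADERS = frozenset({"x-forwarded-for", "x-real-ip", "forwarded", "via"})


def strip_network_identifiers(request: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of the request that no longer carries a client address."""
    headers = {
        name: value
        for name, value in request.get("headers", {}).items()
        if name.lower() not in ADDRESS_HEADERS
    }
    # Only the filtered headers and the payload survive; "remote_addr" is dropped.
    return {"headers": headers, "body": request["body"]}


incoming = {
    "remote_addr": "203.0.113.7",  # documentation-range address, not real data
    "headers": {
        "Content-Type": "application/json",
        "X-Forwarded-For": "203.0.113.7",
    },
    "body": b"<uploaded keys>",
}
sanitized = strip_network_identifiers(incoming)
print(sanitized)
```

Such a filter only helps if it runs before anything is logged or stored, which is why it has to be paired with the controls and organisational measures mentioned above.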

bwbroersma commented 4 years ago

I would like to see the addition of an existing mix network, as I wrote in the @OpenState advice of 2020-04-14 and on Twitter, as a technical measure to prevent possible government misbehavior in the future.

dirkx commented 4 years ago

Aye - also an option (though not that easy at this scale - we're talking significant numbers here).

Let's see what the DPIA assessment is going to look like (which is a key next gate) -- as this is pretty core to that tradeoff. And exactly the sort of concern you need to document, address, mitigate or compensate & put controls in for.

Will add waiting label - and obviously - detailed implementation suggestions with t/a-models, dpia sketches and numbers are always welcome.

bwbroersma commented 4 years ago

not that easy at this scale - we're talking significant numbers here

The mix network is primarily for uploading, of course. Downloading can be done via one or multiple CDNs: since downloading does not imply an infection, it is of lesser concern privacy-wise.

willemdekker commented 4 years ago

Threat model for the corona tracing app.

The corona tracing app raises serious concerns about potential abuse among the public and security professionals.

Mapping all possible risks and scenarios, assessing them accurately, and then finding measures for them is a lot of work. The greatest danger, however, is blind spots: risks that go unnoticed and for which no appropriate measures have been taken.

In hindsight, many security problems are fairly obvious, yet they are not recognised beforehand, or their severity is misjudged, so the right measures are not taken.
The following document is based on the STRIDE model; see https://docs.microsoft.com/en-us/azure/security/develop/threat-modeling-tool-threats and https://insights.sei.cmu.edu/sei_blog/2018/12/threat-modeling-12-available-methods.html.

The examples in this first chapter are only there to illustrate the threats. It should not be assumed that they are all realistic threats that require countermeasures.

The system is split into 6 parts:

1) Determining contact information. Interaction between mobile phones via Bluetooth or comparable technology to determine contacts within a reasonable distance. The current proposal is to use the GAEN (Google/Apple Exposure Notification) framework for this. Because this is generic and not specific to the Dutch app, this document does not focus on this part of the app.
2) Obtaining information about infected contacts. Interaction between the app and a central server to retrieve a list of anonymised infected contacts.
3) Performing a test. Interaction between the user, the app and the health authorities when taking the test.
4) Reporting a positive test result. Interaction between the user, the health authorities, the central server and the app to report a positive test.
5) Installation and onboarding.
6) Other app/user interactions.

The STRIDE model gives the following categories: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege.

| Category | Description |
| --- | --- |
| Spoofing | Involves illegally accessing and then using another user's authentication information, such as username and password |
| Tampering | Involves the malicious modification of data. Examples include unauthorized changes made to persistent data, such as that held in a database, and the alteration of data as it flows between two computers over an open network, such as the Internet |
| Repudiation | Associated with users who deny performing an action without other parties having any way to prove otherwise, for example, a user performs an illegal operation in a system that lacks the ability to trace the prohibited operations. Non-Repudiation refers to the ability of a system to counter repudiation threats. For example, a user who purchases an item might have to sign for the item upon receipt. The vendor can then use the signed receipt as evidence that the user did receive the package |
| Information Disclosure | Involves the exposure of information to individuals who are not supposed to have access to it, for example, the ability of users to read a file that they were not granted access to, or the ability of an intruder to read data in transit between two computers |
| Denial of Service | Denial of service (DoS) attacks deny service to valid users, for example, by making a Web server temporarily unavailable or unusable. You must protect against certain types of DoS threats simply to improve system availability and reliability |
| Elevation of Privilege | An unprivileged user gains privileged access and thereby has sufficient access to compromise or destroy the entire system. Elevation of privilege threats include those situations in which an attacker has effectively penetrated all system defenses and become part of the trusted system itself, a dangerous situation indeed |

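Relevance matrix (rows: STRIDE categories, columns: the 6 parts of the system)

| | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| S | n | n | y | y | n | n |
| T | y | y | y | y | y | n |
| R | n | n | y | y | n | n |
| I | y | n | y | y | n | y |
| D | y | y | y | y | n | n |
| E | n | n | y | y | y | n |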

Conclusion: interactions 3 and 4 are the most security-relevant parts. Almost all categories apply there, because these are interactions with authorised users (of the test centres / GGD) and information can be leaked.
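
As a cross-check of that conclusion, the relevance matrix can be encoded and queried. This is only an illustrative sketch: the variable and function names are made up here, and the y/n values are copied from the matrix as posted.

```python
# One string per STRIDE category; position i holds the y/n mark for part i + 1.
COMPONENTS = (1, 2, 3, 4, 5, 6)
RELEVANCE = {
    "S": "nnyynn",
    "T": "yyyyyn",
    "R": "nnyynn",
    "I": "ynyyny",
    "D": "yyyynn",
    "E": "nnyyyn",
}


def categories_per_component(matrix: dict) -> dict:
    """Map each system part to the STRIDE categories marked 'y' for it."""
    return {
        component: [cat for cat, row in matrix.items() if row[index] == "y"]
        for index, component in enumerate(COMPONENTS)
    }


per_component = categories_per_component(RELEVANCE)
# Parts for which every STRIDE category is marked as relevant.
most_relevant = [
    part for part, cats in per_component.items() if len(cats) == len(RELEVANCE)
]

for part, cats in per_component.items():
    print(part, "".join(cats))
print("every category applies to parts:", most_relevant)
```

With the matrix as given, parts 3 and 4 are the only ones where all six categories are marked, which matches the conclusion above.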

Specific dangers beyond the above:

1) Privacy violations

For example, de-anonymisation. Someone tries to uncover the identity of the contacts and thereby obtain additional personal data about them. This can also target the app itself, in which the user's status is determined. With enough additional data, the identity of users can be determined.

Example: cameras record images of users of the corona app while also capturing the Bluetooth signals. Within the time interval in which the signals stay the same, users can then be tracked further and linked to a picture of the user. Another example: extra tracking code is present on the phone (outside the corona app), for instance as malware, which tries to obtain the corona app's contact data and link it to user information such as mobile phone numbers and email addresses.

2) Abuse of the app by the makers / operators of the app and the health authorities. That is, using the app for purposes other than those the users consented to and the law allows. Example 1) In the future the GGD also starts using the corona app for other infectious diseases such as the flu, although users did not consent to this. Example 2) While the app makers claim to use the secure Google/Apple protocol, in reality a different, insecure protocol is used.
Open source can remove part of this concern, but verifying that the source code in use matches the published code requires reproducible builds.

3) Use of the app by other government and non-government bodies to trace people who are not corona patients.

Example 1) The police use the corona app to check an alibi. The other person the suspect claims to have been with at the time of the crime is asked to report as infected with corona. If the suspect then does not receive a 'corona alert', this undermines the alibi. Example 2) An intelligence service discovers a weakness in the Google/Apple protocol, keeps it secret and uses it to trace spies; the technology that makes this possible then leaks and is used on a large scale by private detectives, resulting in a major scandal.

Later on, once more is known about the architecture of the app, a more detailed picture can be given for each of the 6 parts of the system. It is also useful to talk in terms of threat actors: what categories of people exist who want to carry out particular threats? Who are they, what are their motivations and what are their means; do they attack only remotely, or do they also have physical access? Another useful handle is to look for comparable analyses and security problems in other, similar mobile apps.

arianvp commented 4 years ago

Aye - also an option (though not that easy at this scale - we're talking significant numbers here).

I think https://www.torproject.org/ scales well enough for the purpose of uploads (which are rather small in size). I think the risk for downloads (well, let's see what the threat model says? ;)) is significantly smaller; but even 500kb/day is something that Tor can easily handle at scale.

I think a mix network should seriously be considered. It sounds like a great idea to me.

bwbroersma commented 4 years ago

To add on the mix network: it would also be great to do the unwrapping of the signed message with RIVM token in a hidden service run by a non-RIVM TTP, which only checks the validity of the (first time use) token and then sends the inner key data (without the token) via an authenticated channel to the RIVM. This way the government really won't know that you use the app, unless the TTP and RIVM collude. By also client-side encrypting the inner key data with an RIVM public key, the TTP won't be able to see the inner data, so you could literally pick an enemy of the RIVM here, as long as you trust it to cooperate in sending all data. Maybe I should include a diagram on this :slightly_smiling_face:
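
A rough sketch of the TTP step described above, under loudly stated assumptions: the class and field names are hypothetical, the inner key data is treated as an opaque blob that was already encrypted client-side for the RIVM, and an HMAC stands in for the RIVM's signature only so that the example runs on the standard library. A real design would use a public-key signature, so that the TTP can verify tokens without being able to mint them.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Upload:
    token: bytes        # one-time token handed out with a positive test
    token_tag: bytes    # issuer's MAC over the token (stand-in for a signature)
    sealed_keys: bytes  # key data, already encrypted for the RIVM by the client


@dataclass
class TrustedThirdParty:
    verification_key: bytes
    used_tokens: set = field(default_factory=set)

    def unwrap(self, upload: Upload) -> Optional[bytes]:
        """Check the token once, then release only the sealed inner data."""
        expected = hmac.new(
            self.verification_key, upload.token, hashlib.sha256
        ).digest()
        if not hmac.compare_digest(expected, upload.token_tag):
            return None  # token was not issued by the expected party
        if upload.token in self.used_tokens:
            return None  # token was already spent
        self.used_tokens.add(upload.token)
        # The token is deliberately not forwarded; the TTP cannot read sealed_keys.
        return upload.sealed_keys


key = b"demo-only-shared-key"
token = b"token-0001"
tag = hmac.new(key, token, hashlib.sha256).digest()
ttp = TrustedThirdParty(verification_key=key)

first = ttp.unwrap(Upload(token, tag, b"<ciphertext for RIVM>"))
replay = ttp.unwrap(Upload(token, tag, b"<ciphertext for RIVM>"))
forged = ttp.unwrap(Upload(b"token-0002", b"bad-tag", b"<ciphertext for RIVM>"))
print(first, replay, forged)
```

The RIVM then receives only `sealed_keys` over its authenticated channel with the TTP, so it sees neither the token nor the network address the upload came from.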

willemdekker commented 4 years ago

I have created a pull request for the threat model.

jellelicht commented 4 years ago

Before folks hack the good hack in implementing a nice Tor-based solution for the IP stripping: can anybody from the core team please ack/nack whether such a contribution would be 1) acceptable, 2) perhaps already in the works internally?

dirkx commented 4 years ago

Ad. 2 - it is most certainly NOT in the works.

Ad. 1 - the complexity to accepting it would probably center around proper DPIA analysis & the non-functionals.

Who can see what; would any nodes in the Tor network be a processor in the sense of the GDPR (and why not), etc, etc. This is already complex for the lawyers in plain-old-IP space.

And secondly - in the non-functionals: can a sizeable percentage of the NL population use this at scale, with sufficient guaranteed reliability in line with the economic importance of this for society? And without this resulting in a specialist observer in that Tor network that is too much of a single vantage point?

Because the code for this is quite easily added, I would probably focus on tackling the above first, as that is the hard bit.

Finally - I am wondering whether an opt-in for those who want it would be possible, done as a companion app, so as to sidestep a lot of the above.

As the endpoint / CDN details should be sufficiently static and well published (it is literally all open source; including details like that) for a local proxy or similar to narrowly intercept this.

bwbroersma commented 4 years ago

BTW I think we all propose to only use a mix network for the upload of infected keys and decoy messages.

would any nodes in the Tor network be a processor in the sense of the GDPR

All the data would be encrypted, so the only thing is that the guard node could tell that a client IP is participating in the use of the app by seeing a packet in a specific size range, whether it is a decoy message or an upload of infected keys. To combat this, we could have VWS run a few dedicated guard nodes, for these submissions only (e.g. size range). Passing around encrypted traffic without knowing the IP it originated from is not a GDPR processor thing, I think.

companion app

I don't think this would be possible nor wise. These are privacy-enhancing technologies (PETs) that shouldn't be non-default, only available after downloading a 'privacy upgrade' app. To compare it with TLS: we no longer have a TLS opt-in either (since we moved to HSTS and HTTPS-only); we want and expect security and privacy by default.

dirkx commented 4 years ago

Understood - but documenting this well from a DPIA perspective is almost harder than dropping in that little bit of code. Likewise for dedicated guard nodes & their AUPs.

jellelicht commented 4 years ago

Any indication when the first drips of DPIA-goodness will be shared? I understand that the code is not the hardest part, but it still seems like a strict improvement over the current solution (nothing).

willemdekker commented 4 years ago

I don't see a mix or Tor network as the first measure against government misbehavior. Personally I would first implement:
a) Strict minimal data storage / data retention on the server side. No server-side log files, no 'big data', no extra fields in the database, etc.
b) Termination of TLS on the servers themselves, with extra care (no logging of IP addresses at that point).
c) A simple web app and a fallback phone option for entering a positive result, also without logging. No long-lived cookies, no 3rd-party JavaScript.
d) Third-party management of the CDN/servers; no access to the production servers for programmers, government personnel or government contractors.
e) Code reviews of all code in use, and keeping the code open source.

If you do want to use a Tor/mix network, I would first make it an optional feature, since there may be some reliability and capacity concerns.
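
Point a) could look roughly like the sketch below. Everything in it is an assumption for illustration: the field names, the 14-day window and the in-memory list standing in for a database are not taken from the actual backend.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=14)                # assumed retention window
ALLOWED_FIELDS = ("key_data", "received_on")  # assumed minimal schema


def minimise(record: dict) -> dict:
    """Keep only allow-listed fields; anything else is never stored."""
    return {name: record[name] for name in ALLOWED_FIELDS}


def purge_expired(records: list, today: date) -> list:
    """Drop every record that is older than the retention window."""
    return [r for r in records if today - r["received_on"] <= RETENTION]


today = date(2020, 6, 1)
raw = [
    {"key_data": b"old", "received_on": date(2020, 5, 1), "user_agent": "x"},
    {"key_data": b"new", "received_on": date(2020, 5, 30), "remote_addr": "203.0.113.7"},
]
stored = purge_expired([minimise(r) for r in raw], today)
print(stored)
```

An allow-list fails closed: a new field only ends up in storage if someone deliberately adds it, which is the opposite of 'extra fields in the database'.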

jellelicht commented 4 years ago

@willemdekker I agree with all of your points (except perhaps d). It's just that these are all mitigations and factors that lower the chance that someone keeps linkable records of my IP and uploads. Using a mix network + onion routing (forget that I mentioned Tor :wink: ) would make this fundamentally much harder to do. Your points a, b and e do not address the concern that a properly set up mix network + routing at least might address.

bwbroersma commented 4 years ago

About Tor reliability & capacity, it has quite some bandwidth and it's pretty old (17 years), is stable and literally battle tested :wink:

The bulk of the funding for Tor's development has come from the federal government of the United States, initially through the Office of Naval Research and DARPA.

There is also a nice (old) TorFlow visualization showing The Netherlands is pretty well connected to the majority of the Tor nodes.

ryanbnl commented 4 years ago

Don't forget the reporting process itself. The GGD here (Gelderland-Midden) uses the telephone, which provides absolutely zero privacy.
Have we looked at the process for the validation code? I don't see the code online, and it's tricky to get right (if you check the code during the POST you leak timing info; if you do it later you have the problem of typos).
Some level of data has to be stored purely for debugging purposes; if you store nothing, your devs are flying blind.

bwbroersma commented 4 years ago

I proposed a token process that would improve the privacy. Typos should definitely be handled by adding error detection and correction in the provided tokens.
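
As a small illustration of error detection in tokens, the sketch below appends a Luhn check digit to a numeric token. This is a swapped-in, well-known technique chosen for brevity, not the scheme from the proposal; it only detects typos (every single-digit error, most adjacent swaps) and corrects nothing. Error correction would need a proper code such as Reed-Solomon.

```python
def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def add_check_digit(payload: str) -> str:
    """Return the token as it would be read out to the user."""
    return payload + luhn_check_digit(payload)


def is_valid(token: str) -> bool:
    """Check a typed-in token before it is sent to the server."""
    if not token.isdigit() or len(token) < 2:
        return False
    return luhn_check_digit(token[:-1]) == token[-1]


token = add_check_digit("7992739871")
print(token, is_valid(token), is_valid("79927398613"))
```

Because the check runs locally in the app or web form, a typo can be rejected before any request is made, so the server never has to answer 'this token does not exist' for an honest mistake.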

dirkx commented 4 years ago

You may also want to have a good look at https://gitlab.com/PrivateTracer/caregiversportal/-/blob/master/AuthenticationCodes.md -- which has a rather good design for this.

ijansch commented 4 years ago

I just merged https://github.com/minvws/nl-covid19-notification-app-coordination/pull/28 - This issue can now be closed. I see some discussion here about the lab confirmation flows; maybe that should be moved to a separate issue? (or wait a bit, I expect us to publish the expected lab confirmation flow early next week).

minvws / nl-covid19-notification-app-coordination

Awaiting threat model #13