gratipay / gratipay.com

Here lieth a pioneer in open source sustainability. RIP
https://gratipay.news/the-end-cbfba8f50981
MIT License

build a vault #3504

Closed chadwhitacre closed 8 years ago

chadwhitacre commented 9 years ago

We are going to start storing national identification numbers (https://github.com/gratipay/gratipay.com/issues/3289#issuecomment-107100341) as well as bank account numbers (#3377 downstream of #3366). We need a vault, separate from our main application and database, that is more secure than either. We should use the PCI DSS 3.0 standard to self-assess the security of our application (https://github.com/gratipay/inside.gratipay.com/issues/214). This ticket is about building a new vault component of our architecture.


chadwhitacre commented 9 years ago

I think we should host our vault directly on AWS, since they clearly offer a PCI compliant environment, whereas Heroku doesn't advertise as much.

http://aws.amazon.com/compliance/pci-dss-level-1-faqs/

chadwhitacre commented 9 years ago

I'm envisioning a very simple key/value store, an expansion of vault.py to put it on the network. I suppose the thing to do would be to use HTTP so we can post into it from javascript. We don't want to transmit sensitive data through the main web app at all.
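A minimal sketch of what a key/value store over HTTP could look like, written as a bare WSGI app. Everything here is an assumption for illustration: the name `vault_app`, the PUT/GET routing, and the in-memory `STORE` dict are hypothetical, and the sketch says nothing about what the existing vault.py actually does. It omits authentication, TLS, and encryption at rest, all of which a real vault would need.

```python
STORE = {}  # in-memory stand-in for real (encrypted) storage


def vault_app(environ, start_response):
    """Minimal WSGI key/value store: PUT /<key> stores, GET /<key> fetches."""
    key = environ.get("PATH_INFO", "").lstrip("/")
    method = environ.get("REQUEST_METHOD", "GET")

    if not key:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"missing key"]

    if method == "PUT":
        # Read exactly the number of bytes the client declared.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        STORE[key] = environ["wsgi.input"].read(length)
        start_response("204 No Content", [])
        return [b""]

    if method == "GET":
        if key not in STORE:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        start_response("200 OK", [("Content-Type", "application/octet-stream")])
        return [STORE[key]]

    start_response("405 Method Not Allowed", [("Allow", "GET, PUT")])
    return [b""]
```

For local experiments this can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, vault_app).serve_forever()`. Posting to it from JavaScript on a different origin would also require CORS headers, which the sketch leaves out.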

chadwhitacre commented 9 years ago

Let's do some poking around ...

https://hashicorp.com/blog/vault.html

What else?

chadwhitacre commented 9 years ago

http://tokenator.org/

chadwhitacre commented 9 years ago

https://github.com/SimplyTapp/Tokenator

chadwhitacre commented 9 years ago

Data Encryption

In addition to being able to store secrets, Vault can be used to encrypt/decrypt data that is stored elsewhere. The primary use of this is to allow applications to encrypt their data while still storing it in the primary data store.

The benefit of this is that developers do not need to worry about how to properly encrypt data. The responsibility of encryption is on Vault and the security team managing it, and developers just encrypt/decrypt data as needed.

chadwhitacre commented 9 years ago

One key feature of our requirements here is that the web app only needs to write secrets, not read them. It's the payroll process that needs to read secrets, in order to originate ACH credits and populate invoices. My thought is that we should use public key cryptography, with the web app holding the public key (via heroku config:set) and the payroll process having access to the private key.
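A rough sketch of that write-only arrangement, assuming the third-party `cryptography` package and RSA with OAEP padding. The function names and the sample value are hypothetical, and the thread does not settle on a particular algorithm or library. The web app would only ever see `public_pem`; the private key would live with the payroll process.

```python
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# OAEP with SHA-256. With a 2048-bit key this can encrypt up to 190 bytes,
# which is enough for an account or identification number.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def generate_keypair():
    """Return (private_pem, public_pem); run once, outside the web app."""
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    private_pem = private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    )
    public_pem = private_key.public_key().public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    return private_pem, public_pem


def encrypt_for_payroll(public_pem, secret):
    """Web app side: encrypt using the public key only."""
    public_key = serialization.load_pem_public_key(public_pem)
    return public_key.encrypt(secret.encode("utf-8"), OAEP)


def decrypt_in_payroll(private_pem, ciphertext):
    """Payroll side: decrypt using the private key."""
    private_key = serialization.load_pem_private_key(private_pem, password=None)
    return private_key.decrypt(ciphertext, OAEP).decode("utf-8")
```

With this split, someone who can read the web app's configuration can encrypt new values but cannot decrypt stored ones. It does not stop them from writing a different value in place of a legitimate one.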

Introducing a server component, whether Vault or something else (including something DIY), increases our surface area and level of complexity significantly more than integrating encryption-before-storage into our existing application architecture. What are the PCI implications of the latter?

chadwhitacre commented 9 years ago

Another design requirement: I want separate access groups for the main web app and the PCI vault. I want to be able to grant access to Heroku (app + db) as we've been doing, which is carefully, to be sure ... but we need to be even more careful with access to vaulted data.

chadwhitacre commented 9 years ago

Let's distinguish the three pieces of information we're intending to collect, their risk profile, and our immediate application requirements regarding each.

| piece of information | risk | write | read (process, role, purpose) |
| --- | --- | --- | --- |
| bank account number (BAN) | financial theft | web | payroll, Gratipay, generation of NACHA files to submit for ACH origination |
| individual national identification number (NIN) | personal identity theft | web | web, team owners, filling out tax forms |
| business identification number (VAT/EIN) | business identity theft | web | web, supporters and team owners, generation of invoices |

chadwhitacre commented 9 years ago

So the web app does need to read some secrets.
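The table above can be transcribed into a small access matrix to make the point concrete. This is only an illustrative sketch: the dictionary layout and the function names are assumptions, not anything from the Gratipay codebase.

```python
# Access matrix transcribed from the table above: which process may write or
# read each kind of secret. The keys and process names are labels chosen for
# this sketch.
ACCESS = {
    "BAN": {"write": {"web"}, "read": {"payroll"}},
    "NIN": {"write": {"web"}, "read": {"web"}},
    "VAT/EIN": {"write": {"web"}, "read": {"web"}},
}


def is_allowed(process, action, kind):
    """Return True if `process` may perform `action` ("read" or "write") on `kind`."""
    return process in ACCESS[kind][action]


def readable_by(process):
    """List the kinds of secret that a process needs to read."""
    return sorted(kind for kind, rule in ACCESS.items() if process in rule["read"])
```

Here `readable_by("web")` lists the NIN and VAT/EIN entries, which is why the web app cannot be treated as write-only.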

chadwhitacre commented 9 years ago

Meaning it does come under the systems we need to consider in terms of PCI compliance.

chadwhitacre commented 9 years ago

The requirement for invoices is that VAT be available to both supporters (buyers; https://github.com/gratipay/gratipay.com/issues/1199#issuecomment-24576143) and teams (sellers; https://github.com/gratipay/gratipay.com/issues/1199#issuecomment-67562705).

chadwhitacre commented 9 years ago

Hashi Vault supports dynamic secrets. Could we use that to ensure that access to Heroku doesn't entail access to our vault?

chadwhitacre commented 9 years ago

Dynamic Secrets: Vault can generate secrets on-demand for some systems, such as AWS or SQL databases. For example, when an application needs to access an S3 bucket, it asks Vault for credentials, and Vault will generate an AWS keypair with valid permissions on demand. After creating these dynamic secrets, Vault will also automatically revoke them after the lease is up.

http://vaultproject.io/intro/
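To make the lease idea concrete, here is a toy model of credentials that are generated on demand and stop working once their lease is up. It is not Vault's API: the `LeaseIssuer` class and its methods are hypothetical and only illustrate the concept described in the quote.

```python
import secrets
import time


class LeaseIssuer:
    """Toy model of lease-based credentials; not Vault's actual API."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._expiry = {}  # credential -> time at which its lease ends

    def issue(self):
        """Generate a fresh credential that is valid for ttl_seconds."""
        credential = secrets.token_urlsafe(32)
        self._expiry[credential] = self.clock() + self.ttl_seconds
        return credential

    def is_valid(self, credential):
        """Check a credential, revoking it if its lease has run out."""
        expiry = self._expiry.get(credential)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expiry[credential]  # lease is up: revoke
            return False
        return True

    def revoke(self, credential):
        """Revoke a credential before its lease expires."""
        self._expiry.pop(credential, None)
```

The `clock` parameter exists only so that expiry can be exercised without waiting in real time.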

chadwhitacre commented 9 years ago

Like, when the app spins up, it asks our vault for credentials to our vault?

chadwhitacre commented 9 years ago

Looks like that would take some work.

chadwhitacre commented 9 years ago

I'm going through the Vault intro.

chadwhitacre commented 9 years ago

Alright, I am introduced to Vault. It's a nice piece of software. We very well may be able to use it here.

chadwhitacre commented 9 years ago

I want to give people access to a web app (at Heroku, as it happens) that has access to Vault, without giving the people the same access to Vault as the web app has. This could be achieved with a vault secret backend that supported dynamic secrets, yes?

https://github.com/hashicorp/vault/issues/288

chadwhitacre commented 9 years ago

I've registered for an AWS account.

chadwhitacre commented 9 years ago

Can we use the browser as the go-between to avoid leaking vault access to people with Heroku access?

chadwhitacre commented 9 years ago

I don't see how to meet this requirement with Vault. :(

chadwhitacre commented 9 years ago

Or at all, really. If the web app has to be able to write, then whoever has access to the web app could potentially write out their bank account details and collect all of payroll for a week.

chadwhitacre commented 9 years ago

Okay, so let's take it that we don't have a separate access tier that is even tighter than access to our production hosting environment and database.

Then we're back up against the fact that Heroku does not promise a PCI-compliant environment to nearly the extent that Amazon does.

chadwhitacre commented 9 years ago

Gosh. Are we talking about migrating away from Heroku? :mouse:

Are your datacenters certified / PCI compliant?

All of our datacenters have been certified by national and/or international security standards.

Our NYC1 facility is SSAE16 SOC-1 Type II certified. Our NYC2 facility is SSAE16 SOC-2 Type II certified. Our NYC3 facility is SSAE16 SOC-2 and SOC-3 compliant. Our AMS1 and AMS2 facilities are ISO27001:2005 and ISO9001 certified. Our AMS3 facility is ISO9001, ISO27001, and SSAE16 Type II certified. Our SFO1 facility is SSAE16 SOC-1 Type II certified. Our SGP1 facility is ISO27001:2005 certified. Our LON1 facility is ISO9001:2008, ISO27001, and SSAE16 / ISAE 3402 certified. Our FRA1 facility is ISO9001:2008, ISO27001:2005, and ISO22301:2012 certified.

https://www.digitalocean.com/help/policy/

via https://www.digitalocean.com/community/questions/digital-ocean-pci-dss-server-compliance

chadwhitacre commented 9 years ago

Amazon > DO > Heroku (PCI-wise)

chadwhitacre commented 9 years ago

Okay! Reticketed as #3505. :swimmer:

chadwhitacre commented 9 years ago

Well, reopening because it still might make sense to separate out the vault from the main db.

chadwhitacre commented 9 years ago

See https://github.com/gratipay/inside.gratipay.com/issues/223 for overarching discussion about infosec risk management.

chadwhitacre commented 9 years ago

Here's the page listing Vault storage backends. Looks like the only real option for us is Consul. They recommend 3-5 nodes per data center, and common practice with AWS is to run with at least two data centers (availability zones [AZ] in AWS-lingo).

chadwhitacre commented 9 years ago

Could we get away with one node and an EBS volume?

chadwhitacre commented 9 years ago

"AWS Tips I Wish I'd Known Before I Started"

A collection of random tips for Amazon Web Services (AWS) that I wish I'd been told a few years ago, based on what I've learned by building and deploying various applications on AWS.

chadwhitacre commented 9 years ago

How about two AZs, with one EC2 instance each, running both Vault and Consul + one EBS volume?

techtonik commented 9 years ago

I'd really, really like to outsource the task of private information management to trusted parties, who could act as security and privacy guarantors for our clients.

chadwhitacre commented 9 years ago

@techtonik Please make a concrete suggestion for how to do that in our case.

techtonik commented 9 years ago

Ok. The suggestion is that we only have to keep records like this:

Then we need to query the bank requisites through OAuth or give the privacyservice1 a command "do transaction with ... on behalf of our client that is recorded as privacyservice1:id on your system".

I'd really avoid storing all the private info on GP, because then it will become an easy attack vector automatically.

chadwhitacre commented 9 years ago

@techtonik What are some examples of companies we could use for privacyservice1?

techtonik commented 9 years ago

@whit537 I think that lawyers should know about privacy services, so they should be aware of privacy services in the banking industry as well.

chadwhitacre commented 9 years ago

@techtonik That's a cop out. Go find us some privacy services to talk to if you want us to pursue that possibility.

techtonik commented 9 years ago

@whit537 we do have a lawyer that we pay, don't we?

chadwhitacre commented 9 years ago

@techtonik That's a cop out. Go find us some privacy services to talk to if you want us to pursue that possibility.

techtonik commented 9 years ago

@whit537 is it possible to ask him directly first? I don't even know what these services are called in English, let alone the US-specific terms.

chadwhitacre commented 9 years ago

@techtonik Can you link us to one example of such a service? Doesn't have to be in English, I just have no idea what you're talking about right now. :-(

techtonik commented 9 years ago

@whit537 I don't know. Some advanced banks have services that help preserve user privacy. They can issue anonymous or one-time debit cards, for example, and may expose an OAuth management API that can also hide the identity. This decreases the risk that user data will be stolen and reused by a malicious party. Quick search - https://www.privacyworld.com/5mastercard.html

chadwhitacre commented 9 years ago

@techtonik I'm not sure what to do with that site. In any case, we were rejected by Citizens (#3366), so we're no longer trying to build our own direct ACH integration and we don't need to store bank numbers. We're going to try Zipmark instead (#3491). Closing ... for now(?!).

chadwhitacre commented 9 years ago

Zipmark didn't work out, and it turns out we want to do strong idv for employment reasons, not just AML reasons.

chadwhitacre commented 9 years ago

Blog post on Balanced's architecture:

http://blog.balancedpayments.com/balanceds-architecture/

knox, midlr, and js are all on their own Amazon account. Only a subset of our staff has access to this: I personally wouldn’t even know how to get into those servers. precog, api, and router are all on an Amazon account which most of our developers have access to, and that’s where most of the actual work in building new features goes.

chadwhitacre commented 8 years ago

"best practices for storing ssn"

chadwhitacre commented 8 years ago

I would look towards HIPAA de-identification guidelines on protected data from HHS.

https://www.reddit.com/r/AskNetsec/comments/2pswf5/securely_storing_ssn_details/cmzyxno

Haven't considered HIPAA before.

kzisme commented 8 years ago

http://www.heinz.cmu.edu/~acquisti/ssnstudy/

On Nov 28, 2015 12:55 AM, "Chad Whitacre" wrote:

"best practices for storing ssn" (https://www.google.com/search?q=best+practices+for+storing+ssn)

- http://stackoverflow.com/questions/254935/storing-social-security-numbers
- https://community.spiceworks.com/topic/739855-best-practices-storing-personal-data-including-ssns
