konklone / oversight.garden

Bringing together the oversight community's work.
https://oversight.garden
Creative Commons Zero v1.0 Universal

EC2 web tier #139

Closed: divergentdave closed this pull request 8 years ago.

divergentdave commented 8 years ago

Here's a running start at the new web tier. There's one big design question still open: what to do with TLS secrets.

On the one hand, AWS Certificate Manager provides certificates and manages them for free, though we could only use it with Elastic Load Balancing. If we go this route, all related secrets are handled by Amazon, and we could serve some static assets at the ELB layer. This would be particularly convenient for serving the sitemaps, which have to be generated on the scraper instance.

On the other hand, we could use Let's Encrypt certificates and manage them ourselves. Since we're planning for disposable instances, we would need to store certificates, private keys, account keys, configuration, and other state somewhere outside the instance (probably S3), and copy them in and out when starting a new instance or renewing. This would use familiar tools and avoid tying the infrastructure to another Amazon product, but I get the heebie-jeebies thinking of uploading a private key to S3. Using a bucket with server-side encryption may mitigate this. Thoughts?
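
The "copy them in and out" step for the Let's Encrypt option amounts to bundling the ACME client's state directory into one archive and restoring it on a new instance. A minimal sketch, assuming the state lives in a single directory (certbot, for instance, defaults to `/etc/letsencrypt`); the function names are hypothetical, and the encryption and S3 upload/download steps are deliberately left out:

```python
import io
import tarfile
from pathlib import Path


def pack_state(state_dir: Path) -> bytes:
    """Bundle an ACME client's state directory into an in-memory .tar.gz.

    The returned bytes still contain private keys in plaintext; they are what
    would be encrypted and uploaded to S3 (neither step is shown here).
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # Store entries relative to state_dir so the bundle restores anywhere.
        # Symlinks (e.g. certbot's live/ -> archive/ links) are kept as links.
        for child in sorted(state_dir.iterdir()):
            tar.add(str(child), arcname=child.name)
    return buf.getvalue()


def unpack_state(blob: bytes, dest_dir: Path) -> None:
    """Restore a bundle produced by pack_state, e.g. on a freshly launched instance."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths and links escaping dest_dir.
            tar.extractall(str(dest_dir), filter="data")
        else:
            # Older Pythons have no filter: only unpack bundles you produced.
            tar.extractall(str(dest_dir))
```

Keeping everything in one archive means a renewal only has to re-upload a single object, at the cost of re-uploading unchanged files each time.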

TODO:

konklone commented 8 years ago

Storing private keys in an S3 bucket could be okay if they're encrypted client-side first.

But what about generating new LE certs during the instance creation process, and letting keys die when an instance is destroyed?

divergentdave commented 8 years ago

That could work, though I'm nervous about accidentally hitting the issuance rate limit. Plus I don't know what happens if you throw away your account key. I think we wouldn't be able to revoke anything without old private keys.
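
The rate-limit worry is easy to check before each launch: if every new instance requests a fresh certificate, each launch counts against Let's Encrypt's issuance limits, and requesting the exact same set of names again also counts against a separate duplicate-certificate limit. A minimal sliding-window sketch; `can_issue` is a hypothetical helper, the issuance history is assumed to come from somewhere you keep it (for example a log the launch script appends to), and `limit`/`window` are parameters because the real values should be looked up in Let's Encrypt's current documentation:

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def can_issue(
    previous_issuances: Iterable[datetime],
    now: datetime,
    limit: int,
    window: timedelta = timedelta(days=7),
) -> bool:
    """Return True if one more issuance keeps us under `limit` per `window`.

    Only issuances inside the window (cutoff, now] are counted.
    """
    cutoff = now - window
    recent = sum(1 for t in previous_issuances if cutoff < t <= now)
    return recent < limit


if __name__ == "__main__":
    now = datetime(2016, 7, 22, tzinfo=timezone.utc)
    history = [now - timedelta(days=d) for d in (1, 2, 3, 8)]
    # Three issuances fall inside the last 7 days; the one 8 days ago does not.
    print(can_issue(history, now, limit=5))
```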

konklone commented 8 years ago

For reference, this is the last error in the elasticsearch log before its crash:

[2016-07-22 12:56:05,709][DEBUG][action.search.type       ] [Equinox] [oversight-20160428180035][0], node[WIsG-CcER9quFjuk_BNIlQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@7c2014dd]
org.elasticsearch.search.SearchParseException: [oversight-20160428180035][0]: from[2730],size[10]: Parse Failure [Failed to parse source [{"from":2730,"size":10,"query":{"filtered":{"query":{"query_string":{"query":["","","",""],"default_operator":"AND","use_dis_max":true,"fields":["text","title","summary","pdf.title","pdf.keywords","doc.title","docx.title","docx.keywords"]}},"filter":{"terms":{"inspector":["va?page=274","va?page=274","va?page=274","va"],"execution":"or"}}}},"sort":[{"published_on":"desc"}],"highlight":{"encoder":"html","pre_tags":["<b>"],"post_tags":["</b>"],"fields":{"*":{}},"order":"score","fragment_size":500},"_source":["report_id","year","inspector","agency","title","agency_name","url","landing_url","inspector_url","published_on","type","file_type","featured.author","featured.author_link","featured.description","unreleased","missing"]}]]
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:664)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:515)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:487)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:328)
    at org.elasticsearch.search.action.SearchServiceTransportAction$11.call(SearchServiceTransportAction.java:308)
    at org.elasticsearch.search.action.SearchServiceTransportAction$11.call(SearchServiceTransportAction.java:305)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [oversight-20160428180035] [query_string] query does not support [query]
    at org.elasticsearch.index.query.QueryStringQueryParser.parse(QueryStringQueryParser.java:127)
    at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:234)
    at org.elasticsearch.index.query.FilteredQueryParser.parse(FilteredQueryParser.java:71)
    at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:234)
    at org.elasticsearch.index.query.IndexQueryParserService.innerParse(IndexQueryParserService.java:342)
    at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:268)
    at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:263)
    at org.elasticsearch.search.query.QueryParseElement.parse(QueryParseElement.java:33)
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:648)
    ... 9 more
[2016-07-22 12:56:05,821][DEBUG][action.search.type       ] [Equinox] All shards failed for phase: [query_fetch]

divergentdave commented 8 years ago

That error is mostly benign; it comes from query parsing. I reproduced it locally, and it didn't have any permanent effects. I'm opening another issue for follow-up.
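
In the logged request, `query` arrives as an array of four empty strings and `inspector` as a list with repeats, which looks like repeated URL parameters being parsed into arrays; Elasticsearch's `query_string` parser appears to have rejected the array (`query does not support [query]`). One way to guard against this is to collapse parameters into the shapes the search code expects before building the query. A minimal sketch in Python (the web app itself may well be written in something else); the parameter names `query` and `inspector` are taken from the logged request body, and `normalize_search_params` is hypothetical:

```python
from urllib.parse import parse_qs


def normalize_search_params(raw_query: str) -> dict:
    """Collapse a raw URL query string into single-valued search inputs."""
    params = parse_qs(raw_query, keep_blank_values=True)
    # query_string expects one string, never a list: keep only the first value.
    query = params.get("query", [""])[0]
    # Drop duplicate inspectors while preserving their original order.
    inspectors = list(dict.fromkeys(params.get("inspector", [])))
    return {"query": query, "inspectors": inspectors}


if __name__ == "__main__":
    print(normalize_search_params("query=&query=&inspector=va&inspector=va"))
```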

divergentdave commented 8 years ago

Back on the topic of keys: when you say "encrypted client side," do you mean one of these two options (http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html), or running it through gpg first with a key provided some other way?

konklone commented 8 years ago

Those look like built-in ways to do it in Amazon's clients and services, and that would work. You can also do the encryption with your own code or tool and manage the key yourself. Either way sounds totally fine to me, though if Amazon has built-in methods, those might be easier and more reliable.

divergentdave commented 8 years ago

For my reference: https://github.com/paul/letsencrypt-route53
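
A Route 53 based flow like the one linked above answers ACME's `dns-01` challenge: the client publishes a TXT record at `_acme-challenge.<domain>` whose value is the unpadded base64url SHA-256 of the key authorization (`token + "." + thumbprint of the account key`). A minimal sketch of just that computation, assuming an RSA account key given as its base64url `e` and `n` members; it is not taken from the linked tool, and publishing the record in Route 53 is not shown:

```python
import base64
import hashlib
import json
from typing import Tuple


def b64url(data: bytes) -> str:
    """Unpadded base64url encoding, as used throughout ACME."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(e_b64: str, n_b64: str) -> str:
    """RFC 7638 SHA-256 thumbprint of an RSA public key (e, n as base64url)."""
    # Required members only, keys sorted, no whitespace.
    canonical = json.dumps(
        {"e": e_b64, "kty": "RSA", "n": n_b64},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def dns01_record(domain: str, token: str, thumbprint: str) -> Tuple[str, str]:
    """Return (record name, TXT value) for a dns-01 challenge."""
    key_authorization = f"{token}.{thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)
```

Because validation happens through DNS rather than HTTP, a new instance can obtain a certificate before traffic is pointed at it, which suits bringing up a staging host ahead of the cutover.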

divergentdave commented 8 years ago

It works! https://staging.oversight.garden/

konklone commented 8 years ago

Oh, hell yes. This looks fantastic. Would you say this is no longer a WIP, and ready for review/merge?

divergentdave commented 8 years ago

I just fixed a few more things; it's ready for review now.

konklone commented 8 years ago

:+1: from me.

divergentdave commented 8 years ago

Thanks, merging! Cutover will consist of just updating the DNS records. For now, all DNS changes have to be done manually, but we could script them through Route 53 in future work.
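
Scripting the cutover through Route 53 would mostly mean building a change batch for the `ChangeResourceRecordSets` API. A minimal sketch, assuming the site is served from an A record pointing at the new instance's address; `upsert_a_record` is hypothetical, `203.0.113.10` is a documentation-range placeholder, and `ZONE_ID` stands in for the real hosted zone ID:

```python
import json
from typing import List


def upsert_a_record(name: str, ips: List[str], ttl: int = 300) -> dict:
    """Build a Route 53 ChangeBatch that points `name` at `ips`.

    UPSERT creates the record if it is missing and replaces it otherwise,
    so re-running the cutover script is harmless.
    """
    return {
        "Comment": f"Cut {name} over to the new web tier",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip} for ip in ips],
                },
            }
        ],
    }


if __name__ == "__main__":
    batch = upsert_a_record("oversight.garden.", ["203.0.113.10"])
    print(json.dumps(batch, indent=2))
    # With boto3 installed and AWS credentials configured, it could be applied via:
    #   boto3.client("route53").change_resource_record_sets(
    #       HostedZoneId="ZONE_ID", ChangeBatch=batch)
```

The same JSON can also be saved to a file and passed to `aws route53 change-resource-record-sets --hosted-zone-id ZONE_ID --change-batch file://batch.json`.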

(Note that https://staging.oversight.garden/dashboard is currently empty, but it will populate once the DNS is cut over. Right now the new scraper is trying to submit dashboard data to the old server, but the shared keys don't match.)
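
The dashboard failure described above is what any pre-shared-key check produces when client and server hold different copies of the key. A minimal sketch, assuming the dashboard endpoint compares a key sent by the scraper against its own configured copy; the function name and this mechanism are assumptions for illustration, not oversight.garden's actual code:

```python
import hmac


def shared_key_matches(submitted: str, expected: str) -> bool:
    """Check the key sent with a dashboard submission against the server's copy.

    hmac.compare_digest takes time independent of where the inputs first
    differ, so a wrong key doesn't leak how many leading characters were right.
    """
    return hmac.compare_digest(submitted.encode("utf-8"), expected.encode("utf-8"))
```

Until the DNS points the scraper at the new server, its key is checked against the old server's copy and every submission is rejected.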