konklone / oversight.garden

Bringing together the oversight community's work.
https://oversight.garden
Creative Commons Zero v1.0 Universal

EC2 scraper instance #137

Closed divergentdave closed 8 years ago

divergentdave commented 8 years ago

This is part of #66. I haven't tested it yet, but it should hit all the high points!

konklone commented 8 years ago

I'm cool with this, and so glad you're making progress on it. My one wish (and it's not a blocker) is to avoid relying on system Python, Ruby, and Node, and to use user-space language managers that stay current with their communities instead.

Thank you for tackling this, and let me know how I can support the work.

divergentdave commented 8 years ago

Thanks for the guide, I'll take a look once I get this working.

divergentdave commented 8 years ago

FWIW, I ran into aws/aws-sdk-ruby#859, so I'm going to add a hardcoded delay to smooth over the race condition.

divergentdave commented 8 years ago

I got most of the configuration working. Next up tomorrow is figuring out authorization for the AWS managed Elasticsearch.

    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
    Er what!!
    Error: Authorization Exception
    {"Message":"User: anonymous is not authorized to perform: es:ESHttpGet on resource: oversight"}

divergentdave commented 8 years ago

This is coming along pretty well so far, and most things on the scraper instance are in working order. My next plan is to make a private S3 bucket holding a script that adds secrets (Slack, dashboard, etc.) to the configuration file, and then grab and invoke that script during setup.
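A rough sketch of what that fetch-and-invoke step could look like, assuming the AWS CLI is installed on the instance and its IAM role can read the bucket. The function name, the bucket URL in the usage comment, and the idea of passing the config path as an argument are all hypothetical, not what the branch actually does.

```shell
# Hypothetical helper: download a secrets script from a private S3 bucket,
# run it against a config file, and clean up the temporary copy afterwards.
fetch_and_run_secrets() {
  secrets_url="$1"     # e.g. s3://example-private-bucket/add-secrets.sh (made-up name)
  config_file="$2"     # path the downloaded script should add secrets to
  secrets_tmp="$(mktemp)" || return 1

  # `aws s3 cp` uses the instance's credentials, so no keys live in this script.
  if aws s3 cp "$secrets_url" "$secrets_tmp" --only-show-errors; then
    sh "$secrets_tmp" "$config_file"
    secrets_status=$?
  else
    secrets_status=1
  fi

  # Remove the downloaded script whether or not it succeeded.
  rm -f "$secrets_tmp"
  return "$secrets_status"
}

# Usage during setup (hypothetical paths):
#   fetch_and_run_secrets s3://example-private-bucket/add-secrets.sh /path/to/config
```

Keeping the script in the bucket rather than in the repository means the user-data script itself stays free of secrets.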

divergentdave commented 8 years ago

Also, I'm going to do further research on AWS ES authentication libraries. I'm not happy with the one I have in there now, as it requires extra callbacks everywhere.
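For context, the "User: anonymous" error earlier in the thread is what an unsigned request looks like to an IAM-protected domain, so whatever library is used has to sign each request with AWS Signature Version 4. As a way to check a domain from the command line, here is a minimal sketch using curl's `--aws-sigv4` option (curl 7.75.0 or newer, so not something that was available when this thread was written). The function name, the default region, and the example endpoint are assumptions.

```shell
# Hypothetical helper: GET a path on an AWS-managed Elasticsearch domain
# using a SigV4-signed request. Credentials are read from the environment.
es_get() {
  es_endpoint="$1"                       # e.g. https://search-example.us-east-1.es.amazonaws.com (made up)
  es_req_path="$2"                       # e.g. /_cluster/health
  es_region="${AWS_REGION:-us-east-1}"   # assumed default region

  # Build the curl argument list; "es" is the service name used for signing.
  set -- --silent --show-error --fail \
    --aws-sigv4 "aws:amz:${es_region}:es" \
    --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}"

  # Temporary credentials (such as those from an instance role) also need the session token.
  if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
    set -- "$@" --header "x-amz-security-token: ${AWS_SESSION_TOKEN}"
  fi

  curl "$@" "${es_endpoint}${es_req_path}"
}

# Usage (hypothetical endpoint):
#   es_get https://search-example.us-east-1.es.amazonaws.com /_cluster/health
```

This only helps with manual checks. The application still needs a signing layer of its own.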

divergentdave commented 8 years ago

I've been taking a look at rbenv/pyenv/nvm. All the instructions are for adding these tools to .bashrc, but that doesn't apply to non-interactive scripts such as the user-data script or the crontab. We already use the snippet below for rbenv. @konklone, do you have corresponding snippets for nvm and pyenv? Any other advice for handling this?

    export PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH
    eval "$(rbenv init -)"

divergentdave commented 8 years ago

I think I have rbenv/pyenv/nvm working now, waiting for the cron job to run to confirm.
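For anyone following along, here is a minimal sketch of pyenv and nvm counterparts to the rbenv snippet, meant to be placed at the top of a cron job or user-data script. It assumes the default install locations (`~/.pyenv` and `~/.nvm`) and is not necessarily what ended up in the branch.

```shell
# pyenv: put its shims and bin directories on PATH, then initialize it if installed.
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH"
if command -v pyenv >/dev/null 2>&1; then
  eval "$(pyenv init -)"
fi

# nvm: it is a shell function rather than a binary, so its script has to be sourced.
export NVM_DIR="$HOME/.nvm"
if [ -s "$NVM_DIR/nvm.sh" ]; then
  . "$NVM_DIR/nvm.sh"
fi
```

The guards keep the script from failing on a machine where one of the managers has not been installed yet.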

divergentdave commented 8 years ago

Alright, this works now! Ready for review, and I'll follow up with the web server elsewhere.

konklone commented 8 years ago

I'll work on reviewing this this weekend. Thank you for doing all of this!

konklone commented 8 years ago

Ugh, I'm clearly not prioritizing review here, and don't see a path to reviewing the actual deployment over the next two weeks. I've 👀'd the code, and it looks fine, and I think I just need to :+1: you rolling this out to see how it works in practice. Are you up for that?

Thank you so much for doing all this great (and very necessary) configuration management work.

divergentdave commented 8 years ago

Sure, sounds good. I left the scraper instance running, and it appears to be chugging along just fine.