for #40: move lists_served to s3+dir

groovecoder commented 7 years ago

Updated Proposed System Diagram

[Image: shavar system diagram]

@ckolos Here's a first pass at how this could work with an s3+dir ...

The lists_served item in shavar.ini (in this case, shavar.testing.ini) changes from a newline-separated list of config sections into an s3+dir URL pointing to a bucket that contains any number of .ini files/keys.
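For illustration, here is a minimal sketch of reading the new-style value. It assumes the setting lives in a `[shavar]` section and that an s3+dir URL takes the form `s3+dir://<bucket>/<prefix>`. The bucket name, prefix and the `parse_lists_served` helper are hypothetical and not taken from shavar itself.

```python
import configparser
from urllib.parse import urlparse

# Hypothetical shavar.ini fragment: the [shavar] section name, the bucket
# and the prefix are illustrative assumptions, not the real config.
EXAMPLE_INI = """
[shavar]
lists_served = s3+dir://example-shavar-lists/stage/
"""


def parse_lists_served(ini_text):
    """Split an s3+dir lists_served value into (bucket, key_prefix)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    url = urlparse(parser.get("shavar", "lists_served"))
    if url.scheme != "s3+dir":
        raise ValueError("expected an s3+dir URL, got scheme %r" % url.scheme)
    # netloc is the bucket; the path minus its leading slash is the key prefix.
    return url.netloc, url.path.lstrip("/")


print(parse_lists_served(EXAMPLE_INI))  # ('example-shavar-lists', 'stage/')
```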

The includeme config bootstrap hook in the shavar/lists.py module now connects to that S3 bucket, reads all of its keys, creates a SafeBrowsingList for each one (using the same settings that used to come from the single shavar.ini), and adds each list to the registry of lists that shavar serves.
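The loop the hook performs could be sketched roughly as follows. The S3 reads are abstracted as a plain dict of key to file text so the sketch runs without AWS. `SafeBrowsingListStub`, `build_registry`, the `refresh_seconds` default and the "one section named after the list" layout are all assumptions for illustration, not shavar's actual classes or config keys.

```python
import configparser
import posixpath
from dataclasses import dataclass, field


@dataclass
class SafeBrowsingListStub:
    """Hypothetical stand-in for shavar's SafeBrowsingList class."""
    name: str
    settings: dict = field(default_factory=dict)


def build_registry(objects, defaults):
    """Create one list object per .ini key, keyed by list name.

    `objects` maps bucket keys to file text (the real hook would read these
    from S3); `defaults` stands in for settings from the single shavar.ini.
    """
    registry = {}
    for key, text in sorted(objects.items()):
        if not key.endswith(".ini"):
            continue  # ignore anything in the bucket that isn't a list config
        # List name taken from the file name, e.g. "stage/foo.ini" -> "foo".
        name = posixpath.splitext(posixpath.basename(key))[0]
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        settings = dict(defaults)
        # Assumption: each file has one section named after the list.
        settings.update(parser[name])
        registry[name] = SafeBrowsingListStub(name, settings)
    return registry


objects = {
    "stage/example-track-digest256.ini": "[example-track-digest256]\ntype = digest256\n",
    "stage/README.md": "not a list config",
}
registry = build_registry(objects, {"refresh_seconds": "60"})
print(sorted(registry))  # ['example-track-digest256']
```

Per-list values override the shared defaults here because they are applied second. That is one reasonable precedence, not necessarily the one shavar uses.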

I expect this will fail tests, and I want to validate and/or clean up the various .ini files that seem to be scattered around the repo. (As part of that, I may let lists_served in shavar.ini point to a relative or absolute directory path, to make local dev and testing easier.)
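A directory-valued lists_served could be handled along these lines. The sketch assumes relative paths are resolved against the directory containing the .ini file. It also assumes the directory loader should return the same name-to-text mapping the S3 path produces, so both feed the same registry code. Both helpers are hypothetical.

```python
from pathlib import Path


def resolve_lists_dir(lists_served, ini_path):
    """Turn a directory-valued lists_served into an absolute path."""
    path = Path(lists_served)
    if not path.is_absolute():
        # Assumption: relative paths are relative to the .ini file's directory.
        path = Path(ini_path).parent / path
    return path.resolve()


def read_list_configs(directory):
    """Map each *.ini file name in `directory` to its text, mirroring S3 keys."""
    return {p.name: p.read_text() for p in sorted(Path(directory).glob("*.ini"))}
```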

But this is the direction I'm heading. Looking good so far?

groovecoder commented 7 years ago

Alright, it took almost the full week I estimated, but I've pushed the latest changes, which should fix the tests.

groovecoder commented 7 years ago

Note: the latest commit may remove too many of the comments about the .ini file values ...

groovecoder commented 7 years ago

@ckolos any chance you could check this out? (Maybe @rtilder too?) I'd like some extra eyes on it to make sure I'm not screwing everything up.

To push this through stage and prod, we will also need a lot of coordination to move the lists' config values out of the existing single .ini file and into per-list .ini files: either in directories or in S3 bucket(s), depending on the environment.
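The mechanical part of that migration could be scripted. Here is a sketch that assumes the old file keeps lists_served in a `[shavar]` section as a newline-separated list of section names, with one section per list. The `split_lists` helper and the `<name>.ini` output naming are hypothetical choices.

```python
import configparser
import io


def split_lists(ini_text, main_section="shavar"):
    """Split one old-style shavar.ini into {"<list>.ini": ini_text}."""
    # interpolation=None so values containing '%' (e.g. URLs) pass through as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # Old style: lists_served is a newline-separated list of section names.
    names = parser.get(main_section, "lists_served").split()
    out = {}
    for name in names:
        single = configparser.ConfigParser(interpolation=None)
        single[name] = dict(parser[name])  # copy just this list's section
        buf = io.StringIO()
        single.write(buf)
        out[name + ".ini"] = buf.getvalue()
    return out
```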

ckolos commented 7 years ago

Looks good. My only suggestion is to not make the list name depend on the name of the .ini file, but that feels like a nit. I think the list configs should be stored in a GitHub repo. Once that's done, we can use Jenkins to populate the 'list config bucket' as needed, based on monitoring upstream.
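One way to decouple the list name from the file name is to read it from inside the file. Here is a sketch that assumes each config has a single section whose name is the list name, falling back to the file name otherwise. The `list_name_for` helper and that convention are hypothetical.

```python
import configparser
import posixpath


def list_name_for(key, ini_text):
    """Prefer a name declared inside the file; fall back to the file name."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    sections = parser.sections()
    if len(sections) == 1:
        # Assumption: a single section whose name is the list name.
        return sections[0]
    # Fallback: derive the name from the key, e.g. "stage/foo.ini" -> "foo".
    return posixpath.splitext(posixpath.basename(key))[0]
```

With this, renaming a file in the repo no longer renames the list it serves, as long as the section name stays the same.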

groovecoder commented 7 years ago

So list configs in a GitHub repo that then make their way into an S3 bucket? Or should I change the code to grab the .ini content straight from GitHub?

ckolos commented 7 years ago

I think the first case. Let's keep the app as simple as this feature allows.

groovecoder commented 7 years ago

Finally back to this ... but here's a rough plan/idea ...

1. Create a new shavar-list-config (private?) repo (like the shavar-list-creation-config repo) with dev/, stage/, and prod/ directories.
2. Fill the repo directories with the appropriate .ini files, based on what puppet-config/shavar currently generates in the .ini files for each environment.
3. Add a step to the existing shavar-list-creation Jenkins job that copies the list config files from the GitHub repo to S3.
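The copy step needs to know which repo files go to which environment's bucket. Here is a sketch of that mapping, assuming only `.ini` files directly under `dev/`, `stage/` or `prod/` are uploaded, keyed by file name under an optional prefix. `plan_uploads` and the prefix parameter are hypothetical.

```python
from pathlib import PurePosixPath

ENVIRONMENTS = ("dev", "stage", "prod")


def plan_uploads(repo_files, environment, prefix=""):
    """Return {repo_path: s3_key} for one environment's .ini files."""
    if environment not in ENVIRONMENTS:
        raise ValueError("unknown environment: %s" % environment)
    plan = {}
    for path in sorted(repo_files):
        p = PurePosixPath(path)
        # Only .ini files directly under e.g. stage/ belong to that environment.
        if p.parent == PurePosixPath(environment) and p.suffix == ".ini":
            plan[path] = prefix + p.name
    return plan
```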

Does that seem complete?

fmarier commented 7 years ago

> Create a new shavar-list-config (private?) repo (like the shavar-list-creation-config repo) with dev/, stage/, and prod/ directories

Unless there's a good reason to keep it private, we should leave it public to increase the transparency of the list generation process.

ckolos commented 7 years ago

(blanket IMHO disclaimer)

I don't think Puppet should be involved in generating the list configs. Bringing Puppet into the mix would just perpetuate the need to redeploy the stack to generate new lists, which isn't what we want.
List configs should either be added to the bucket manually (insane, but workable) or pushed there by a GitHub trigger on Jenkins, i.e. when a new config is committed to "the" repo, all configs are synced to the S3 bucket and the app takes care of the rest. This matches how the lists themselves are generated in the first place.
A nit, but the new repo should be named something like shavar-server-list-config; the functionality discussed here isn't about the list-creation config but about the server config.
I don't think there's anything to be gained in terms of transparency by exposing the configuration of an application that serves files generated from publicly available sources. If we want to test new list functionality in prod, it makes more sense to me to keep the specific list configuration out of public view. For that reason, I would prefer to keep the configs private until this process is proven bulletproof and has been reviewed by cloud secops.
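The "commit triggers a full sync" flow could decide its work by comparing repo contents with what's in the bucket. Here is a sketch that assumes the bucket listing is available as key to MD5 hex digest (for simple, non-multipart uploads, S3's ETag is typically the object's MD5). It also assumes configs removed from the repo should be deleted from the bucket so the app stops serving them. `sync_plan` and `md5_hex` are hypothetical helpers.

```python
import hashlib


def md5_hex(text):
    """MD5 hex digest of a config file's UTF-8 text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sync_plan(repo_files, bucket_etags):
    """Decide what a repo -> bucket sync must do.

    repo_files maps file names to text; bucket_etags maps bucket keys to the
    MD5 hex digest of each object. Returns (upload, delete) as sorted lists.
    """
    upload = sorted(
        name for name, text in repo_files.items()
        if bucket_etags.get(name) != md5_hex(text)  # new or changed
    )
    # Mirror the repo: anything no longer committed is removed from the bucket.
    delete = sorted(name for name in bucket_etags if name not in repo_files)
    return upload, delete
```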

groovecoder commented 7 years ago

I'll start with a new private mozilla-services/shavar-server-list-config repo; once it's working, I'll file a review bug with cloud secops to make it public.

Re Puppet: I think I meant that I need to look at what's currently in the puppet-config shavar.ini.erb to populate the new mozilla-services/shavar-server-list-config repo. Once that's done, we can remove the list config parts from that Puppet file to cut Puppet out of the process. Right?

Then I'll look at adding the GitHub-to-S3 copy step to the Jenkins ShavarListCreationStage and ShavarListCreationProd jobs, as soon as I regain access to Jenkins.

groovecoder commented 7 years ago

Created https://github.com/mozilla-services/shavar-server-list-config with stage/*.ini files.

@ckolos is going to start the ShavarServerConfigSyncStage job on Jenkins to copy the files from GitHub to S3.

When that's done, we should be able to deploy (this branch?) to stage to test it all.

groovecoder commented 7 years ago

Update: @ckolos created https://deploy.mozaws.net/view/Shavar%20Stage/job/ShavarServerConfigSync/

mozilla-services / shavar

for #40: move lists_served to s3+dir #85
