Closed by ozten 11 years ago
An earlier suggestion was to remove Persona's proxy_idps
config. This won't work, as we have a "recently seen primary" timeout:
We should ship an rpm of bigtent whose server/views/well_known_browserid.ejs renders
{
  "disabled": true
}
which will cause BrowserID to fall back to secondary sign-in for users who had used the Yahoo proxy.
In order for this to work, the BigTent setting pub_key_ttl should be set to 0 at launch. Once we gain confidence, we can bump this TTL up.
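As a sketch, the TTL change might look like this in BigTent's config (only the pub_key_ttl key comes from this discussion; a JSON config file and its layout are assumptions):

```json
{
  "pub_key_ttl": 0
}
```

With a TTL of 0, BrowserID would re-fetch the support document on every check, so the disabled document takes effect immediately rather than after a cache expiry.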
Note: We don't touch Persona's config. It will still have proxy_idps with yahoo.com enabled.
Turning yahoo.com back on is easy! We just push an rpm which has a proper server/views/well_known_browserid.ejs. BrowserID will see support has returned and do the secondary -> primary transition.
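For contrast with the disabled document above, a "proper" support document advertises primary support again. Schematically (the three top-level keys are the BrowserID support-document format; the paths and key material here are placeholders, not Yahoo's real values):

```json
{
  "public-key": { "algorithm": "RS", "n": "…", "e": "…" },
  "authentication": "/browserid/sign_in",
  "provisioning": "/browserid/provision"
}
```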
We can do this a couple ways...
1) Provide a script which is run on each webhead to re-write server/views/well_known_browserid.ejs.
2) Provide a different branch, which is the same as the release train but which has a disabled server/views/well_known_browserid.ejs.
3) ???
4) Add a new configuration element which disables the current bigtent service. This would cause the { "disabled": true } document to be returned.
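Option 4 could be as small as a branch in the well-known handler. A hypothetical sketch (the flag name `disabled` and the function names are assumptions, not bigtent's actual API):

```javascript
// Hypothetical sketch of Option 4: choose which /.well-known/browserid
// document to serve based on a new config flag.
function wellKnownDocument(config, realDocument) {
  if (config.disabled) {
    // Service is administratively disabled: advertise that to BrowserID.
    return { disabled: true };
  }
  // Normal operation: serve the real support document.
  return realDocument;
}
```

A handler would then render the result of `wellKnownDocument(config, supportDoc)`, and BrowserID would see `{ "disabled": true }` until the flag is flipped back and the instances are bounced.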
@lloyd What do you think of this proposal and specifically Option 4?
I like option 4 -- push a config change, bounce the bigtent instances, done.
@gene1wood and @jrgm - How does this proposal sound?
To be clear, the proposal is the steps in the original description, plus the new configuration element mentioned in Option 4.
I also prefer option 4.
This may be out of scope for this ticket, but instead of restarting node services (or doing an orchestrated drain so we can restart them), it would be much nicer to have the node app periodically re-read its on-disk config, or alternatively to have the administrator send the running app a signal to reload its config without stopping the service.
> restart node services causing potential bad user experience during the restart; or do an orchestrated draining of traffic node by node, in order to restart services while no traffic is on the node
Draining the traffic makes sense. Depending on the severity of the reason we're disabling bigtent, it may not be a big deal to just restart the services to disable it.
> node app, every 30 seconds (or some number), reads the ondisk config and if there have been any changes, …
This sounds complicated. We already have something like this in the /.well-known/browserid TTL setting (mentioned in comment 1). Ideally this 30-second timer would be coordinated across Yahoo bigtent instances, which is even more complicated and error-prone.
> the administrator sends some signal to the running node app to reload config without stopping the service
bigtent startup is very fast. Although existing connections would be broken, you effectively get this with daemontools by killing the process, no? It should be simple enough to catch SIGUSR1 or something like that to trigger a re-read of a single config value...
> Although existing connections would be broken, you effectively get this with daemontools by killing the process, no?
I hadn't thought so, but I don't know. I'm unsure what happens to users in the middle of an exchange, or to users who connect in the window between when the service stops and when daemontools starts it again. I would imagine the user would get a 502 Bad Gateway from nginx during that period.
> It should be simple enough to catch SIGUSR1 or something like that to trigger a re-read of a single config value...
Ya, that's the kind of thing I was thinking. Essentially I'm thinking of the "reload" capability that exists in most daemons (named, httpd, squid, etc)
My concerns with "dynamic config" are:
1) It is easy to shoot yourself in the foot.
2) We're optimizing for a feature that will only be used once or twice.
If this is really important, I'll add it, but I'd rather use our existing code deployment technique for changing config values, where we've already solved these problems (such as draining traffic, etc).
Oh ya I meant this as something general to persona and bigtent, not specific to this scenario of rolling back a yahoo change.
For config changes it would be sweet if persona or bigtent had a config reload function like many apps have.
So, yeah option 4 with a simple config change, drain, restart.
(Off topic, but a re-read-your-configs signal could someday-but-not-now be a useful thing, as it's much easier than the orchestrated drain. But yeah, dynamic config can be a foot gun, although there are use cases for it in some environments). /cc @jbonacci
I like option 4.
Off topic: configuration reload is interesting; we could build a .reload() function into convict /cc @zaach
But like everyone else, I'm unsure we should actually have dynamic config. The argument for is quicker configuration change, so that we can, e.g., disable bigtent more quickly and reliably. The argument against is that writing code that expects config to change is hard, and testing it is harder. It would be a shame to put our software in a subtly broken state and affect users because a cached version of a config param doesn't match the dynamic version...
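The subtly-broken-state concern can be made concrete with a purely illustrative sketch: a module that copies a config value at startup never sees a later live reload.

```javascript
// Purely illustrative: why dynamic config is a foot-gun.
let liveConfig = { disabled: false };

// Some module captures the value once when it loads...
const disabledAtStartup = liveConfig.disabled;

// ...then a dynamic reload flips the live value:
liveConfig.disabled = true;

// The two views of "the config" now disagree.
const stale = (disabledAtStartup !== liveConfig.disabled);
```

Every module would need to re-read the live value (or subscribe to reload events) for dynamic config to be safe, which is exactly the hard-to-write, harder-to-test code described above.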
So how about the other way. What operational changes could we make to have config change / restart be better?
Patch is ready for deployment
@lloyd: ya, I'm not for dynamic config, just a reload option like apache, squid, bind, etc. have that allows me to force the app to reload its config without having to stop accepting connections or restart the app.
See https://bugzilla.mozilla.org/show_bug.cgi?id=850473#c3
Gonna wait on getting https://github.com/mozilla/browserid-bigtent/issues/142 all settled.
During the initial launch, or later during normal operations, it should be possible to disable a BigTent proxied service.
We'll use this bug to document process and create any missing artifacts.
This should work for