Open andrewjbtw opened 4 years ago
I think the second step can be accomplished by removing this clause: https://github.com/sul-dlss/was_robot_suite/blob/master/robots/was_dissemination/start_special_dissemination.rb#L18
And the one step in that workflow: https://github.com/sul-dlss/was_robot_suite/blob/master/robots/was_seed_dissemination/update_thumbnail_generator.rb
the plan is for the first responder to shut down the server next week to see what breaks. we'd prefer to just make it invisible to other services at first, without deleting the VM, in case we need to turn it back on. but the plan is to make sure accessioning continues to work after the was-thumbnail VM is made unreachable for other services.
@andrewjbtw do you have a sense of what testing needs to take place for this? I wonder if we should do this during the Thursday 9am-12pm window when @tallenaz will be helping us test hydrus/lyberservices entanglement.
For the workflows, I think we just need to register and/or update a web archive seed object. It might be a good idea to accession a crawl object too since we're looking for hidden connections, but as far as I can tell the crawl objects aren't directly implicated in the thumbnails, just the seeds.
Other than that, I think we just need to check if there are any unexpected alerts or errors from SWAP. The thumbnail server checks SWAP to generate the thumbnails but I don't know if SWAP ever checks for the thumbnail server.
@andrewjbtw OK, how do you feel about doing this during the Thursday morning window? If you're open to it, I will reach out to Ops. (I ordinarily wouldn't suggest doing two of these tests at the same time, but I think Hydrus and web-archiving are pretty independent of one another...)
that works for me
Cc: @tallenaz, if you're open to it, here's another decommission test we can run on Thursday morning.
OK. To clarify, what's the second test? Is it "disassociate was-thumbnail from IP" in the checklist above? If so, does "shut down was-thumbnail" count as an interpretation? Looks like there's an explicit "disassociate IP" function in Azure but no straightforward analogue in VMware.
@tallenaz I think it'd be fine to shut down the VM, yes, but let me tag prior folks to see if they have any objections: @jcoyne @jmartin-sul @andrewjbtw
@tallenaz I think shutting it down should be fine.
@jcoyne @jmartin-sul
I worked with @tallenaz and @andrewjbtw to decommission was-thumbnail-stage and -prod, and we can confirm that all the tests identified above worked just fine. No weird new behavior, and no unexpected exceptions in Honeybadger. We will keep the thumbnail service boxes powered off but not delete them for three weeks, and then reassess. That way if they are needed in a pinch, we can turn them back on.
@andrewjbtw will proceed with cleaning up stacks soon.
@tallenaz will check back on this in October and if none of us have lingering concerns, he will decommission the VMs for good. At that point, this issue can be closed.
(This is no longer being actively tracked by the @sul-dlss/infrastructure-team FR.)
AFAICT, the remaining was-thumbnail-service tendrils are in documentation and config:
- https://github.com/sul-dlss/was-thumbnail-service (deprecate/archive github repo?)
yeah, i'd be in favor of deprecating and archiving once we're sure the VM can be retired
note that in the unlikely event we have to resurrect this VM, we should revert these sdr-deploy and access-update-scripts changes:
- https://github.com/sul-dlss/sdr-deploy/pull/36
- https://github.com/sul-dlss/access-update-scripts/pull/122
Decommissioning was-thumbnail was started in a proxy ticket in Argo: https://github.com/sul-dlss/argo/issues/2091
Some of this work has been done already. Using this ticket to make a checklist of what's needed to finish the decommissioning process: