pburkholder / pburkholder.github.io


Draft talking points for ATARC #2

Open pburkholder opened 6 years ago

pburkholder commented 6 years ago

In 1998 I was living and working in Johannesburg, South Africa, where I was managing field operations for a joint seismology experiment between MIT and the Carnegie Institution in the U.S., and the Universities of Zimbabwe, Botswana, Cape Town, and the Witwatersrand. When I wasn't driving our pickup truck to service seismic stations across northern South Africa and into Zimbabwe, or processing, archiving, and shipping those data, I was working more and more with the undergraduate and graduate students in the Geophysics program at the University of the Witwatersrand, where I was based.

This was four years into the world of the new South African republic, and among our students were Black South Africans from disadvantaged backgrounds who were struggling to do class and research work, many having had no exposure to computers before, and the computers they did have were underpowered Windows 3.1 systems. The Sun systems we did have were the three provided by the U.S., two of which were dedicated to our seismology experiment.

So a faculty member and I contacted the various diamond and gold mining interests for their under-utilized systems, and came away with a dozen Sun and SGI IRIX systems in various states of repair. From them I was able to build eight working systems with one router connected to the University's network, and we had South Africa's first working Internet-connected geophysics data lab.

But the systems meant nothing until folks could use them, so I held a series of "Geophysics research with Unix and free tools" workshops introducing students to Unix and freely available geophysical processing and mapping tools. They were delighted. And it was transformative: until then, these students had not been able to make real progress in their research and toward their degrees.

And I had my first experience not just of delighting my users, but also of making a real contribution to democracy. For in South Africa, democracy hinged on the ability to sustain economic transformation, and giving Black students the tools to move forward in their new democracy was a substantive contribution.

That experience of delighting users by making work easier, or even possible, transformed my career goals, from one focused on research to one focused on making technology work for my colleagues and my community. And I think many of us here have

pburkholder commented 6 years ago

had similar experiences that have led us to master technology for the greater common good.

For me, that has meant working within the DevOps movement for much of the last decade to make the process of building, releasing, and running software more delightful and less dreary.

Core to my notion of DevOps is placing a premium on flow: continually delivering units of value on a schedule dictated only by considerations of business value or mission realization. This flow minimizes the costs imposed by delay; frequent small changes to a system provide greater long-term stability than infrequent large ones; it minimizes mean time to recovery; and it enables responsiveness to business and mission changes.

I still call it DevOps because originally, in small, flat organizations, the value flow was from Dev -> Ops. But whether you call it DevOps, or DevSecOps, or ProductDevSecOGCOps, the point is that there's a value chain applied to the flow of work from conception to realization. Flow is only realized when each link in the chain actually adds value (otherwise there's needless delay), and when there's capacity to minimize delay at each link.

pburkholder commented 6 years ago

Regardless, authority and responsibility for changes devolve onto the team that is answerable to its customers and stakeholders. That's the only way to align incentives for the success of your product.

If security is incentivized to avoid risk, if ops is incentivized to avoid downtime, and if OGC is incentivized to fear the IG, then the entities in the "value chain" are not aligned to deliver value, and they will inevitably add friction and delay to protect their own incentives.

The result of misaligned incentives is the form of DevOps with none of the value: you will have cloud, IaaS, infrastructure as code, microservices, and continuous integration, but few of the benefits.

DevOps arose from practitioners who had the authority and responsibility to use those tools to release changes as business need demanded, and who bore the impact of any downsides: getting paged, conducting post-mortems, and solving production issues.

When you keep those tools but hold them at a remove from the product teams, when those teams are not enabled to use them, you have broken DevOps, because you have broken flow.

A few examples:

pburkholder commented 6 years ago

Anti-pattern 1: The DevOps team gatekeeper. This usually starts when an org adopts Chef or Ansible so the operations team can better serve rapid delivery of code, but then the ops team keeps its role as the gatekeeper/bottleneck, either by withholding actual access to the code, or by refusing to train their fellows and blaming the devs for not being savvy enough to adopt their IaC system. I saw this happen at a game developer where eventually the devs circumvented the ops release process, and much chaos ensued (including for the customers of the gaming platform).

Anti-pattern 2: The two-pass (or three-pass) infra-as-code system. This is a manifestation of Conway's Law: a system design will mimic the designing organization's communication structure. This usually starts when an org adopts Chef or Ansible so developers can consistently configure their application infrastructure across their various environments. But then there's insufficient trust in their work to allow them full configuration access, so the tools are hobbled to run as an unprivileged user, or are tied into a multi-pass system, instead of the teams actually communicating and collaborating. I've seen this at a financial institution that prided itself on harboring internally competitive teams, so they reaped what they sowed.

There's a huge irony here: you trust the developers to write the code that handles your data and your users, but not the code that runs the infrastructure, even though both sets of code should go through peer review, audit, and static analysis.

Anti-pattern 3: The build-your-own-PaaS. I've seen this over and over again, in insurance companies, banks, and the federal government. Either they don't understand the capabilities of mature PaaS systems, or they think a PaaS is too much overhead to run because they have simpler requirements, or they think they have unique requirements. Usually it's some mishmash of infrastructure-as-code tools like Terraform and Chef, plus Git repositories, plus CI/CD systems that detect changes and rebuild, usually with ticketing systems in the mix to provide the level of human review that is believed to be needed.

It ends up as a Platform-as-a-Concierge service because the automation doesn't suffice.
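A typical build-your-own-PaaS deploy job might look something like the following sketch. This is a hypothetical pipeline fragment, not from any particular deployment: the ticket ID and the `ticket_approved` helper are illustrative, while the Terraform and Chef invocations are the standard commands for those tools.

```shell
#!/bin/sh
# Hypothetical home-grown "PaaS" deploy job: CI detects a commit,
# a human approves a change ticket, then the IaC tools run in sequence.
set -eu

# Gate on the ticketing system -- the human-review step that turns
# the automation into a concierge service. (ticket_approved is a
# stand-in for whatever API the ticketing system exposes.)
until ticket_approved "CHG-1234"; do
  sleep 300
done

# Pass 1: provision infrastructure with Terraform.
terraform init
terraform plan -out=tf.plan
terraform apply tf.plan

# Pass 2: converge node configuration with Chef.
knife ssh 'role:app' 'sudo chef-client'
```

Each step here is individually automatable, but chaining them behind manual ticket approval reintroduces exactly the delay a mature PaaS removes.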

pburkholder commented 6 years ago

NOT Delightful

You only have DevOps if you've enabled your product to release as often as the mission demands. The product team must have the resources available for their release cadence.

Corollary: The team must have sufficient tool choice to support that mission need. We believe there are big wins in being prescriptive/opinionated in terms of system architecture, technologies, deployment strategy, etc. But one size does not fit all, so there should be a clear benefit to using the preferred solution, and known costs to using others, so that a clear cost/benefit analysis can be made.

pburkholder commented 6 years ago

Which brings me to my recommendations for practicing DevOps within the federal government, based on my experiences, and those of my colleagues.

  1. An empowered advocate. You need your CIO on board, and if not also your CISO, then a CIO who's willing to evaluate risk independent of their CISO. You will likely need to hack your own bureaucracy in terms of change requests, process, and oversight, and you will need advocacy, not just tacit acceptance.

  2. Know your mission. Choose your outcomes and metrics to support that mission. Research DevOps metrics (availability should be one of many, and not the most important). Mission matters, since tools won't fix your broken culture. Tools can drive behavior that supports a new culture, but only by aligning around a mission, a purpose, and a set of principles can you motivate actions aligned with that mission.

pburkholder commented 6 years ago
  3. Work in the open, within your agency or group if not exactly OSS. Not just sprint reviews but postmortems as well. It's important for the development of a safety culture. You can't model learning if you don't model ignorance. You can't model experimentation if you don't model failure.

  4. Minimize your ATO surface. The less you do, the faster and more securely you can do it. This can be a hard cognitive change; it took me forever to think in services instead of servers, compute instead of computers. Then build your compliance into the work you do on top of your P-ATO'd platform.

  5. Start small. Empowered teams are critical. You can't empower everyone all at once; you can't jump across the chasm with everyone on your back. Start small with people who believe in change, and are empowered to effect change.