This repo demonstrates how to capture tech debt items and evolve them over time using the Tech Debt Triage framework, created by Oliver Tomlinson. The framework is documented entirely in the README, and examples can be found in the Issues section of the repo.
[EXAMPLE TECH DEBT ITEM] flapping alerts are not escalated after 2 hours #3
Tech Debt acceptance criteria
[x] Products & Platforms affected have been identified - labels applied (chat, platform, etc.)
[ ] Impacted characteristics have been identified - labels applied (security, availability, CI/CD, etc.)
[ ] Description is completed
[ ] Recommended actions have been defined
[ ] 'Payback' trigger points have been identified
Description
We have a flapping alert that quickly resolves itself. However, if the alert continues to flap for more than 2 hours, it should be investigated by an on-call team, as it might be symptomatic of a different issue that we are not aware of (and therefore are not responding to).
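The escalation rule described above could be sketched as follows. This is only an illustration: the function name `should_escalate`, the 15-minute `MAX_QUIET_GAP`, and the idea of representing the alert as a list of firing timestamps are assumptions, not part of any existing alerting setup.

```python
from datetime import datetime, timedelta

ESCALATION_WINDOW = timedelta(hours=2)  # threshold taken from the issue title
MAX_QUIET_GAP = timedelta(minutes=15)   # assumption: a longer gap ends a flapping episode


def should_escalate(firing_times, now):
    """Return True if the alert has been flapping for longer than ESCALATION_WINDOW.

    firing_times: datetimes at which the alert fired, in ascending order.
    """
    if not firing_times:
        return False
    episode_start = firing_times[-1]
    if now - episode_start > MAX_QUIET_GAP:
        return False  # the alert has been quiet, so the episode is over
    # Walk backwards until a gap larger than MAX_QUIET_GAP marks the episode start.
    for earlier in reversed(firing_times[:-1]):
        if episode_start - earlier > MAX_QUIET_GAP:
            break
        episode_start = earlier
    return now - episode_start > ESCALATION_WINDOW
```

A check like this would run on a schedule and page the on-call team only when it returns `True`, so short self-healing flaps stay quiet.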
What is it?
Make sure to explain the significance of the issue, e.g. "it's going to damage our brand reputation", "it's going to cost a lot of money"
Probability of being impacted
e.g. "it's causing problems already", "it might cause problems in the future"
Negatively impacted characteristics
e.g. if there is a cost impact, explain what the cost is and why it counts as negative
What specific domains/resources, if any, are affected
e.g. "EdgeKit Package", "Azure Functions", "Chat Cosmos"
Expected vs actual behavior
Only relevant if this is a bug
Repro steps
Only relevant if this is a bug
Framework / Runtime versions which may be relevant
e.g. "Azure Functions 3", "Vue.js 1.1"
Recommended Action
What is the remediation
e.g. "upgrade the NuGet package", "rearchitect this component"
Urgency of remediation
e.g. "the longer we leave it, the worse it gets", "the urgency is static"
Effort estimate of remediation
e.g. "additional Tax", "Story", "Sprint", "Project", "a programme of work"
External dependencies requiring co-ordination
e.g. "a person outside of the team", "an upstream piece of work"
Payback trigger points
e.g. "SLA drops below a threshold", "Customer complains", "Cost has risen above the allocated budget"
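Trigger points like these are easier to act on when they are checked automatically. The sketch below shows one possible shape for such a check. The `Metrics` fields, the function name `fired_triggers`, and the default thresholds are all hypothetical and would need to be replaced with values agreed for the specific tech debt item.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sla_percent: float        # measured availability over the reporting period
    customer_complaints: int  # complaints linked to this tech debt item
    monthly_cost: float       # spend attributed to the affected resources


def fired_triggers(m, sla_floor=99.9, budget=1000.0):
    """Return the names of the payback triggers that have fired."""
    fired = []
    if m.sla_percent < sla_floor:
        fired.append("SLA below threshold")
    if m.customer_complaints > 0:
        fired.append("customer complaint")
    if m.monthly_cost > budget:
        fired.append("cost above budget")
    return fired
```

Any non-empty result would be the signal to revisit the item and schedule the recommended action.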