samm82 / TestGen-Thesis

My MASc thesis for generating test cases in Drasil

Formalize "recovery testing" definition(s) #40

Open samm82 opened 4 months ago

samm82 commented 4 months ago

From #39, recovery testing really only makes sense as a semi-subcategory of performance testing, and the distinctions between its different definitions aren't really meaningful. This should be made explicit, probably as part of the "refinement" of my glossary.

samm82 commented 3 months ago

Following up on this was one of my final to-do list items from going through IEEE 2017 (it listed "recovery" and "recoverability" separately, which got me thinking), and this is where I've landed:

In #39, we decided that recovery only makes sense in the context of performance testing, since there is not an unlimited amount of time to recover, but I now disagree with that for a few reasons:

  1. Wouldn't that make every type of testing a subtype of performance testing? For example, a function doesn't have unlimited time to execute, so would unit testing be a subtype of performance testing?
  2. This more fleshed-out idea of "recoverability testing" seems to focus on the scope of the recovery: how much was able to be recovered? Which required functions are restored? I think this makes sense, and it also justifies the two separate types, since there is a tradeoff between the extent of a recovery and the performance of a recovery, and these should be tested differently.
  3. In the summary of this research I've added (Recovery Notes), the vast majority of information is independent of the idea of "performance", which makes me think that explicitly including it is meaningful.

Perhaps this focus on performance should be made more explicit, e.g., by calling it "recovery performance testing" or "recovery efficiency testing" (that would, of course, be proposed in my "improved" glossary).


My thoughts now culminate in these two definitions:

  1. Recoverability Testing: "How well a system or software can recover data during an interruption or failure" (Washizaki, 2024, p. 7-10; similar in IEEE, 2017, p. 369; OG ISO/IEC, 2011) and "re-establish the desired state of the system" (IEEE, 2017, p. 369; OG ISO/IEC, 2011), one "in which it can perform required functions" (IEEE, 2017, p. 370), and
  2. Recovery Performance Testing: "Testing that measures the degree to which system state can be restored from backup within specified parameters of time, cost, completeness, and accuracy in the event of failure" (IEEE, 2013, p. 2),

although I'll be tracking the overlapping definitions above in this current iteration of the glossary.
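
To make the contrast concrete, here's a minimal pytest-style sketch (purely illustrative: the `system` fixture, its methods, the required-function set, and the 30-second budget are all hypothetical, not from the thesis, the cited sources, or Drasil). The first test checks the scope of the recovery; the second checks that recovery completes within a specified time parameter.

```python
import time

# Hypothetical set of functions that must be restored after a failure.
REQUIRED_FUNCTIONS = {"login", "read_data", "write_data"}


def test_recoverability(system, records_before):
    """Recoverability: the scope of the recovery, independent of how long it takes."""
    system.crash()
    system.recover()
    # Which required functions are restored? (restored_functions() assumed to return a set)
    assert REQUIRED_FUNCTIONS <= system.restored_functions()
    # How much data was able to be recovered?
    assert system.recovered_records() == records_before


def test_recovery_performance(system):
    """Recovery performance: restoration within specified parameters (here, time)."""
    system.crash()
    start = time.monotonic()
    system.recover()
    elapsed = time.monotonic() - start
    assert elapsed <= 30.0  # hypothetical time budget, in seconds
```

The tradeoff from point 2 above shows up directly here: tightening the time budget in the second test may force a smaller recovery scope in the first, which is why the two should be tested separately.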

samm82 commented 3 months ago

From #47: it's important to note that there is an aspect of recovery that is out of scope: if a maintenance worker needs to intervene to restore the system, that recovery falls outside what we test. (There's almost the suggestion of a "software-intensive system testing" that would consider this type of testing as well, which may be of interest to others, such as Amazon if their Web Services go down.)

samm82 commented 1 week ago

Inspired by #66 (and my original plans to do so anyway 😅), I took a swing at refining this group of definitions. I'm not sure what the best way to ask for review/feedback would be. They can be found on pp. 28-30 of my notes document; let me know if there's a better format for these or if there are any questions/concerns! @smiths @JacquesCarette

If we decide these changes are good, I'm assuming they should go in a separate, "analyzed" spreadsheet? I haven't started that yet (to reduce the traceability nightmare), but at least having the information captured in this document will be helpful when I create it! 😁

JacquesCarette commented 1 week ago

The best way to ask for a review is to create an issue here with the details included directly; clicking through too many things makes reviewing harder.