Manual VS Automated Testing

AlexDawsonUK commented 1 year ago

Linked to https://github.com/w3c/sustyweb/issues/11, Provide text indicators (within the WSG's) for which guidelines will be able to be tested automatically using a testing tool, and also which guidelines will require manual testing and are outside of the scope or capability of automation.

Credit: @thibaudcolas

marvil07-adapt commented 1 year ago

First, thanks for the WSG draft, it is shaping as a great overall guide for sustainability on the web!

It would be great to have tooling around some of the guidelines, and it seems like the first step may be to identify what is possible to be automated.

Following, a first pass trying to identify if automated testing can be used to indicate compliance for each of the WSGs. Naturally, once there are metrics associated, as expected over #11, there may be changes to make.

A yes is indicated when it is clear that some metrics may be available to automate the evaluation of the success criteria. A no is indicated when no automated testing may be applicable. A partial is indicated when only a subset of the success criteria could be evaluated in an automated way.

The hints column tries to use some tags that can help provide extra context around the automated testing value used.

automated testing	WSG	hints
no	2.1 Undertake Systemic Impacts Mapping	internal
no	2.2 Assess And Research Visitor Needs	internal
no	2.3 Research Non-visitor's Needs	internal
no	2.4 Consider Sustainability In Early Ideation	internal
no	2.5 Account For Stakeholder Issues	internal
partial	2.6 Create a Frictionless Lightweight Experience By Default
partial	2.7 Avoid Unnecessary Or An Overabundance Of Assets	editorial
partial	2.8 Ensure Navigation And Way-finding Is Well-structured
partial	2.9 Respect The Visitor's Attention
no	2.10 Use Recognized Design Patterns
partial	2.11 Avoid Manipulative Patterns
no	2.12 Document And Share Project Outputs	internal
no	2.13 Use A Design System To Prioritize Interface Consistency	internal
partial	2.14 Write With Purpose, In An Accessible, Easy To Understand Format
partial	2.15 Take a More Sustainable Approach To Image Assets	editorial
partial	2.16 Take a More Sustainable Approach To Media Assets	editorial
no	2.17 Take a More Sustainable Approach To Animation	editorial
yes	2.18 Take a More Sustainable Approach To Typefaces
partial	2.19 Provide Suitable Alternatives To Web Assets
no	2.20 Provide Accessible, Usable, Minimal Web Forms
partial	2.21 Support Non-Graphic Ways To Interact With Content
no	2.22 Give Useful Notifications To Improve The Visitor's Journey
partial	2.23 Reduce The Impact Of Downloadable Or Physical Documents
no	2.24 Create A Stakeholder-focused Testing & Prototyping Policy	internal
no	2.25 Conduct Regular Audits, Regression, And Non-regression Tests	internal
no	2.26 Analyze The Performance Of The Visitor Journey	internal
no	2.27 Incorporate Value Testing Into Each Major Release-cycle	internal
no	2.28 Incorporate Usability Testing Into Each Minor Release-cycle	internal
partial	2.29 Incorporate Compatibility Testing Into Each Release-cycle
no	3.1 Identify Relevant Technical Indicators	editorial
yes	3.2 Minify Your HTML, CSS, And JavaScript
partial	3.3 Use Code-splitting Within Projects	internal
no	3.4 Apply Tree Shaking To Code	internal
partial	3.5 Ensure Your Solutions Are Accessible
no	3.6 Avoid Code Duplication	internal
no	3.7 Rigorously Assess Third-party Services
partial	3.8 Use HTML Elements Correctly
yes	3.9 Resolve Render Blocking Content	editorial
partial	3.10 Provide Code-based Way-finding Mechanisms	semantics
partial	3.11 Validate Form Errors And External Input
partial	3.12 Use Metadata Correctly	editorial semantics
partial	3.13 Adapt to User Preferences	editorial
partial	3.14 Develop A Mobile-first Layout
no	3.15 Use Beneficial JavaScript And Its API's
partial	3.16 Ensure Your Scripts Are Secure	internal
no	3.17 Manage Dependencies Appropriately	internal
yes	3.18 Include Files That Are Automatically Expected
yes	3.19 Use Plaintext Formats When Appropriate
no	3.20 Avoid Using Deprecated Or Proprietary Code	internal
no	3.21 Align Technical Requirements With Sustainability Goals
yes	3.22 Use The Latest Stable Language Version	internal
no	3.23 Take Advantage Of Native Features	internal
no	3.24 Run Fewer, Simpler Queries As Possible	internal
partial	4.1 Choose A Sustainable Hosting Provider
partial	4.2 Optimize Browser Caching	editorial
yes	4.3 Compress Your Files
no	4.4 Use Error Pages And Redirects Carefully	editorial
no	4.5 Limit Usage Of Additional Environments	internal
no	4.6 Automate To Fit The Needs	internal
no	4.7 Maintain a Relevant Refresh Frequency	editorial
no	4.8 Be Mindful Of Duplicate Data.	internal editorial
no	4.9 Enable Asynchronous Processing And Communication	editorial
yes	4.10 Use Edge Computing
no	4.11 Use The Lowest Infrastructure Tier Meeting Business Requirements	internal
no	4.12 Store Data According To Visitor Needs	internal editorial
no	5.1 Have An Ethical And Sustainability Product Strategy	internal
no	5.2 Assign A Sustainability Representative	internal
no	5.3 Raise Awareness And Inform	internal internal
no	5.4 Communicate The Ecological Impact Of User Choices	internal
partial	5.5 Estimate A Product Or Service's Environmental Impact	internal
no	5.6 Define Clear Organizational Sustainability Goals And Metrics	internal
no	5.7 Verify Your Efforts Using Established Third-party Business Certifications	internal
no	5.8 Implement Sustainability Onboarding Guidelines	internal
no	5.9 Support Mandatory Disclosures And Reporting	internal
no	5.10 Create One Or More Impact Business Models	internal
no	5.11 Follow A Product Management And Maintenance Strategy	internal
no	5.12 Implement Continuous Improvement Procedures	internal
no	5.13 Document Future Updates And Evolutions	internal
no	5.14 Establish If A Digital Product Or Service Is Necessary	internal
no	5.15 Determine The Functional Unit	internal
no	5.16 Create A Supplier Standards Of Practice	internal
no	5.17 Share Economic Benefits	internal
no	5.18 Share Decision-making Power With Appropriate Stakeholders	internal
no	5.19 Use Justice, Equity, Diversity, Inclusion (JEDI) Practices	internal
no	5.20 Promote Responsible Data Practices	internal
no	5.21 Implement Appropriate Data Management Procedures	internal
no	5.22 Promote Responsible Emerging Technology Practices	internal
no	5.23 Include Responsible Financial Policies	internal
no	5.24 Include Organizational Philanthropy Policies	internal
no	5.25 Plan For A Digital Product Or Service's Care And End-Of-Life	internal
no	5.26 Include E-waste, Right-to-repair, And Recycling Policies	internal
partial	5.27 Define Performance And Environmental Budgets	internal
no	5.28 Use Open Source Tools	internal

Hints meaning on the table follows.

editorial: Editorial decisions may be making trade-offs for the website in specific.
internal: It is based on internal information that is not part of the actual website.
semantics: semantic meaning specific to the content is needed to fully check the item.

AlexDawsonUK commented 1 year ago

Thanks @marvil07-adapt for making a first attempt to identify which guidelines will be able to be tested against. We will definitely, be adapting this into the version which ends up in the specification.

As you mentioned, tooling support would be incredibly useful and it's on our roadmap for inclusion. Aside from this issue (labelling), and #11 (which you mentioned, covering a test suite for implementations & techniques), we will also be producing auditing guidance (#28) and implementation guidance for tooling & user-agents (#22). Hopefully this will arrive as soon as we can but it may take a draft or two as we're attempting to connect it all together (for cohesiveness).

thibaudcolas commented 1 year ago

I’ve done a very similar assessment :) I get to go second so I thought I’d share my assessment, and also a comparison table. I chose to do this at the level of Success Criteria, which makes for a very long table, so I put the tables in a gist:

My classification

Here is how I personally rated the SCs (and how many SCs I found for each rating):

Static analysis (6): Potential to write automated code checks that would run in CI / developer IDEs. Example: jsx-a11y/alt-text
Automated (35): "Runtime" analysis – potential to inspect the product with automated browsing or equivalent and detect issues. Example: Axe
Manual, quantitative (16): Likely manual auditing but with potential to follow a set scoring algorithm. Possible to create semi-automated tools to help with auditing. Example: Tab stops testing
Manual, qualitative (50): Manual auditing with an element of interpretation. Can be done with publicly available information, reproducibility of findings a possible concern. Example: WCAG SC 3.2.4 Consistent Identification
Consulting (125): Manual auditing requiring interpretation and internal knowledge of the project / organisation. Cannot be audited without behind-the-scenes access.

Comparison with @marvil07-adapt

After converting this classification to the one by @marvil07-adapt, here is how we differ in the comparison table:

Same assessment for 69 guidelines
Almost the same for 21 ("partial" vs. yes or no)
Different for 3 ("yes" vs "no")

Mapping from my classification to yes/no/partial (internal). I also used "partial" if a guideline was a mixture of "yes", "no", or "partial".

Static analysis → yes
Automated → yes
Manual, quantitative → partial
Manual, qualitative → no
Consulting → no – internal

Differences

@marvil07-adapt	@thibaudcolas	Guideline
no – internal	yes	3.4 Apply Tree Shaking To Code
no – internal	yes	3.17 Manage Dependencies Appropriately
no – internal	yes	3.20 Avoid Using Deprecated Or Proprietary Code

Here are my thoughts on the three guidelines where we differ:

For 3.4 Apply Tree Shaking To Code – I think this is automate-able because though the guideline is called "tree shaking", the SC is only about eliminating dead code in a wider sense. We can’t quantify whether the project makes an effort to identify and eliminate dead code, but we can definitely quantify the amount of dead code, which seems good enough to me to consider this automate-able.
For 3.17 Manage Dependencies Appropriately – this all seems automate-able enough to me. Which dependencies are included can be checked with different methods, and same for how actively they are used.
For 3.20 Avoid Using Deprecated Or Proprietary Code – I considered this automate-able at least for code sent client-side (HTML/CSS/JS/SVG/etc).

Partial matches

Note when I reviewed the potential for automation, my focus was primarily on auditing a website or app with no internal knowledge. So for most SCs that require internal knowledge I rated them as "consulting" / "no - internal", even if there could be automation. There are a few exceptions, such as 3.22 Use The Latest Stable Language Version.

I didn’t assess our differences here in much detail. At a high level:

For 2. User-Experience Design, I seem to think there’s much less room for automation
For other areas, it’s mixture of results with no clear pattern

@marvil07-adapt	@thibaudcolas	Guideline
partial – editorial	no	2.7 Avoid Unnecessary Or An Overabundance Of Assets
partial	no	2.8 Ensure Navigation And Way-finding Is Well-structured
partial	no – internal	2.9 Respect The Visitor's Attention
partial	no	2.11 Avoid Manipulative Patterns
partial	no	2.21 Support Non-Graphic Ways To Interact With Content
partial – internal	yes	3.3 Use Code-splitting Within Projects
partial	yes	3.8 Use HTML Elements Correctly
yes – editorial	partial	3.9 Resolve Render Blocking Content
partial – editorial, semantics	yes	3.12 Use Metadata Correctly
partial – editorial	yes	3.13 Adapt to User Preferences
partial	no	3.14 Develop A Mobile-first Layout
partial – internal	yes	3.16 Ensure Your Scripts Are Secure
partial – editorial	yes	4.2 Optimize Browser Caching
no – editorial	partial	4.4 Use Error Pages And Redirects Carefully
no – editorial	partial	4.7 Maintain a Relevant Refresh Frequency
no – editorial	partial	4.9 Enable Asynchronous Processing And Communication
yes	partial	4.10 Use Edge Computing
partial – internal	no – internal	5.5 Estimate A Product Or Service's Environmental Impact
no – internal	partial	5.19 Use Justice, Equity, Diversity, Inclusion (JEDI) Practices
no – internal	partial	5.22 Promote Responsible Emerging Technology Practices
partial – internal	no – internal	5.27 Define Performance And Environmental Budgets

AlexDawsonUK commented 1 year ago

This is great stuff, thanks for putting in all the hard work! It will certainly help guide us alongside the other testability criteria we are producing to help make the specification more robust.

airbr commented 1 year ago

Just a comment to say thanks for all the good work @thibaudcolas - it immediately brought forward an important question to me generally: "Which of these guidelines need special access to work on" or as you describe it as needing internal knowledge. It is a good contextual question- is this for people with special access or internal knowledge or anyone potentially?

This is an important consideration as the guidelines progress into more use cases and adoption. Thanks for bringing it to the forefront.

AlexDawsonUK commented 10 months ago

Once STAG reaches a settled state (and the testability component is verifiable), this issue will progress with #11.

mgifford commented 9 months ago

Just updating link from STAG to STAR.

AlexDawsonUK commented 8 months ago

I've taken the above information and fed it into a spreadsheet, also utilizing the data from EcoGrader (who are also seeking to machine-test the WSGs into their product suite), and finally used my own interpretation of the spec (which I'll be feeding into STAR Techniques and the upcoming test suite) to come up with some potential ways of testing for compliance. Rationale is provided for why a SC cannot be tested, and if testing can be done, an example is given for criteria purposes.

Note: Because the testability is linked to Techniques, I've made assumptions (for the purposes of simplicity) that internal access will be available (if its possible in certain cases then an assumption can be made that testability can be true in certain cases making it at least a partial pass), but as it's been mentioned this will not always be the case, and where such access is required, it will be noted within the relevant tests and can be marked as such (with a caution note).

Source: Testability. Feedback is useful! As is further conversation on this topic. There will be more than one way of interpreting a Success Criteria as machine testable (as there is with WCAG Techniques), so consider this a starting point.

AlexDawsonUK commented 7 months ago

Its taken a LOT of work, but we now have Manual VS Automated testing available within the specification. It currently exists within the living draft but will be published within this months scheduled release.

The content for testability as noted above was derived from multiple sources including contributors here and my spreadsheet, and its all been compiled into a set of machine testable techniques (now all available in STAR).

With this in mind, the techniques that could be built, as in testable, have been cross-linked into the main specification and are used as evidence of a possible way it could be approached by toolmakers, etc. As this task is now considered complete (though more cases can be added to enhance STAR in the future) I'll close the issue as a completed feature.

Note: In this example, you can see a mixture of testable and non-testable criteria (Success Criteria indicate such a state). Where criteria can be tested, links to STAR techniques are provided as citations.

w3c / sustyweb