w3c / aria-at-app

Test runner and results reporter for ARIA-AT
http://aria-at.w3.org/

Questions on proposed database changes to better support #518 #632

Open howard-e opened 1 year ago

howard-e commented 1 year ago

There appears to be a need to augment the database's structure slightly to better support what's being discussed in #518, particularly around the proposed Test (now Data) Management page and its related features, once approved.

Two pages have been drafted to describe the current state of the database and the proposed changes:

I'd particularly like to draw attention to some questions posed under the first point of the Notes and Questions section of the proposal doc, as they could assist with the final design.


Edit on Jul 11, 2023: Added updated proposal doc.

howard-e commented 1 year ago

cc @mcking65

howard-e commented 1 year ago

What happens when a new Test Plan Version is added and should be considered in place of an existing Test Plan Report attached to an older Test Plan Version, that's already in the CANDIDATE phase?

This question from the notes and questions section is also being discussed in #634 and https://github.com/w3c/aria-at/issues/950.

mcking65 commented 1 year ago

@howard-e

Thank you again for the excellent data model documentation. The formatting you did with headings, tables, and lists makes it easy to read with a screen reader.

I've now spent enough time with it to make some recommendations for better aligning the data model with the working mode, but first I will try to address the questions.

  1. Note that the most practical place for the phase column to be would be on the TestPlanGroup table, but it is located on TestPlanReport. 

Note that a specific version of a test plan is in a phase of the working mode, not the test plan itself. Phase should be on testPlanVersion. That would reflect the working mode process. A test plan can have a version in each phase of the working mode. V1 could be recommended while V2 is in candidate review, V3 is in draft review, and V4 is in R&D Complete.
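The relationship described above can be sketched as follows. This is an illustrative model only (names like `TestPlanVersion` mirror the proposal; the `Phase` enum values come from the working mode phases listed later in this thread), not the app's actual schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class Phase(Enum):
    RD_COMPLETE = "R&D Complete"
    DRAFT_REVIEW = "Draft Review"
    CANDIDATE_REVIEW = "Candidate Review"
    RECOMMENDED = "Recommended"

@dataclass
class TestPlanVersion:
    number: int
    phase: Phase  # phase lives on the version, not on the plan

@dataclass
class TestPlan:
    name: str
    versions: list = field(default_factory=list)

# A single test plan can have one version in each phase simultaneously,
# exactly as in the V1-V4 example above.
radio = TestPlan("radio", versions=[
    TestPlanVersion(1, Phase.RECOMMENDED),
    TestPlanVersion(2, Phase.CANDIDATE_REVIEW),
    TestPlanVersion(3, Phase.DRAFT_REVIEW),
    TestPlanVersion(4, Phase.RD_COMPLETE),
])
```

If phase lived on the plan instead, this four-versions-in-four-phases state would be unrepresentable.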

The following question being discussed by the CG raises concerns about where it should be placed: What happens when a new Test Plan Report is introduced to the Test Queue while a Test Plan is already under review in the CANDIDATE phase, but there's a need for it to be included in that Test Plan?

Note that a report is not "introduced to the test queue." A test plan run is added to the test queue. A test plan run is for a specific version of a test plan, an AT, and a browser. The test plan run generates a report. That report is associated with the test plan version, not the test plan.

It is possible for the test queue to have runs for two different versions of the same test plan in it. Such a scenario could be:

What happens when a new Test Plan Version is added and should be considered in place of an existing Test Plan Report attached to an older Test Plan Version, that's already in the CANDIDATE phase?

When a new test plan version is merged to master in aria-at, its working mode phase is R&D Complete. It does not have any reports.

Let's consider a specific scenario in detail.

The bottom line is that reports are tied to a specific version of a test plan.

What happens when a new AT version is added and should be considered for the results tied to a Test Plan that's already under review in the CANDIDATE phase?

Let's say V2 of the radio test plan is in candidate review, and it has a report for JAWS/Chrome generated using JAWS V2023.4. Now, let's say V2023.7 of JAWS is released and Vispero wants to see the report updated.

  1. Admin will add JAWS V2023.7 to the system.
  2. Admin will add a test plan run to the queue for radio test plan V2 with JAWS 2023.7 and Chrome.
  3. Two testers will complete the run, admin will resolve conflicts, and mark the report complete.
  4. Since radio test plan V2 is already in the candidate phase with published reports, the system will publish the JAWS V2023.7/Chrome report. In the current UI, the report for JAWS V2023.4 would no longer be available from the reports page. (Note: it could still be accessible from both the reports dialog and the page, still to be designed, that shows all data for a test plan.)
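The last step of this scenario, where the newest complete report per AT/browser pairing supersedes the older one on the reports page, could be sketched like this. All names and the report shape are hypothetical, and the version comparison is simplified to string order for illustration:

```python
def latest_complete_reports(reports):
    """Pick the newest complete report per (AT, browser) pairing.
    Older reports remain in the system but drop off the reports page."""
    latest = {}
    # Simplified: sort by AT version string; a real implementation would
    # need proper version comparison.
    for r in sorted(reports, key=lambda r: r["atVersion"]):
        if r["complete"]:
            latest[(r["at"], r["browser"])] = r
    return list(latest.values())

reports = [
    {"at": "JAWS", "atVersion": "2023.4", "browser": "Chrome", "complete": True},
    {"at": "JAWS", "atVersion": "2023.7", "browser": "Chrome", "complete": True},
]
shown = latest_complete_reports(reports)
```

Here only the JAWS V2023.7/Chrome report would be shown, matching the scenario above.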

What happens when a Test Plan should be drafted, after being in a RECOMMENDED phase? Is this "refreshed" or is this a new instance, and so, the older one can be "archived" ?

If V1 of a plan is recommended, when V2 is advanced from candidate to recommended, V1 is sunset. It will still be in the system and accessible from the page that shows all data for a test plan.

What should happen when new versions are expected to be included in a Test Plan that's already under review past the DRAFT phase?

What should happen when a new Test Plan, with an existing and RECOMMENDED version of that Test Plan, is newly drafted?

I think this is answered above.

If the phase should be on TestPlanGroup, this means that if a new TestPlanReport is added for a Test Plan currently in CANDIDATE, this new Test Plan Report would immediately be considered CANDIDATE as well. This would undercut the review process and potentially harm the credibility of the results being displayed.

Correct, there is potential for that type of problem if phase is a property of the test plan, but it is not. It needs to be a property of a testPlanVersion.

Note, however, that reports do not have a phase. The stakeholders look at report data when reviewing a plan, but they are not providing approval of the data; they provide approval of the plan. If a plan is approved, that means it can be used with any version of any browser and any AT to generate accurate data. The data generated by tomorrow's version of an AT may be very different from today's. If the plan is good, the data is accurate. If the data from tomorrow's version of an AT contains more failures, then by providing approval of the plan, an AT vendor has essentially stated that the new failures are AT bugs.

If the phase were moved to TestPlanGroup, the user would be forced to move the TestPlanGroup back to a DRAFT state when adding a newer Test Plan Version, AT version, or similar. Would this violate the WM?

Yes that would be a problem. A test plan version cannot move backward through the process. The test plan itself does not move through the process, a specific version of the test plan moves through the process.

Using an arbitrary reports array to keep track of the TestPlanReports would also allow admins to slot in newer versions being tested, which replace the older ones (or upgrade the older one where applicable).

The array should be associated with a version of the test plan, not the test plan.

In regards to the TestPlanGroup table, can we think of it as "complete" when the recommendedStatusReachedAt timestamp is no longer null?

No.

Based on previous discussions (though not on the critical path), there could be some benefit in tracking the actions happening throughout the system; this could help for reverting changes, accountability checks or just some general reporting use case. The granularity of what events could or should be tracked is another discussion.

Definitely agree that an events table could be very useful. We would probably need a separate table defining the types of events, e.g., eventTypes.
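One possible shape for such an events table, with a separate event-type registry as suggested, is sketched below. All names and the set of tracked event types are hypothetical; the granularity question is deliberately left open:

```python
from datetime import datetime, timezone

# Hypothetical eventTypes registry; in the database this would be its
# own table referenced by foreign key.
EVENT_TYPES = {1: "PHASE_CHANGED", 2: "REPORT_MARKED_COMPLETE"}

events = []  # stand-in for an events table

def record_event(event_type_id, entity, entity_id, actor):
    """Append an audit-trail row, useful for reverting changes,
    accountability checks, or general reporting."""
    assert event_type_id in EVENT_TYPES
    events.append({
        "eventTypeId": event_type_id,
        "entity": entity,
        "entityId": entity_id,
        "actor": actor,
        "createdAt": datetime.now(timezone.utc),
    })

record_event(1, "TestPlanVersion", 42, "admin")
```

Keeping the type registry separate means new event kinds can be added without schema changes to the events table itself.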

Renaming some of these entities to align more closely with the language currently used in the WM could be quite beneficial when interacting with readers of the WM and other stakeholders, but it's hard to judge the effort of that refactor. Especially since some items of the WM are still being discussed and defined, perhaps this isn't worth the effort?

Possibly. However, it seems like we have all needed entities. The primary issue is that some of the relationships are not aligned with the working mode. We probably need to correct those issues. Here are some thoughts.

Test Plan Group

It appears to me that the data in this table belong in the test plan version table. Moving the following data to the TestPlanVersion entity would eliminate the need for the TestPlanGroup entity.

Column Comment
candidateStatusReachedAt (timestamp) Only a specific version of a test plan can advance to candidate review; this timestamp should not be associated with the test plan entity but rather a testPlanVersion entity. I suggest the name should use the word "phase" instead of "status", i.e., CandidatePhaseStartedAt.
recommendedStatusTargetDate (timestamp) Only a specific version of a test plan can advance to recommended phase; this timestamp should not be associated with the test plan entity but rather a testPlanVersion entity. I suggest the name should use the word "phase" instead of "status", i.e., recommendedPhaseTargetDate.
recommendedStatusReachedAt (timestamp) Only a specific version of a test plan can advance to recommended phase; this timestamp should not be associated with the test plan entity but rather a testPlanVersion entity. I suggest the name should use the word "phase" instead of "status", i.e., RecommendedPhaseStartedAt.
reports (int array) Reports are generated using a specific version of a test plan and should be tied to that version.

TestPlanVersion

In addition to the data that is in TestPlanGroup, there are some other essential data elements missing:

  1. covered AT: Each version of a test plan covers a specific list of ATs. Perhaps that is in the metadata json? If so, that is probably sufficient for now. In the future, we may have queries with AT conditions in the where clause, so that data might need to be elevated to a column.
  2. DraftReviewStartDate
  3. SunsetDate
  4. CurrentPhase
  5. Vendor review status (vendors review specific versions of a test plan, not a report. The vendors have report data because it is an aid to reviewing the test plan.)

I believe this table should be updated to have the following columns.

id (int) - PK Auto incrementing identifier
testPlanId (int) - FK Foreign key identifier for the test plan (TestPlan)
gitSha (text) The git sha of the default branch when the information of the test(s) was pulled from w3c/aria-at
gitMessage (text) The git message of the default branch when the information of the test(s) was pulled from w3c/aria-at
testPageUrl (text) Generated url for the tests found in the w3c/aria-at/tests/*pattern*/reference folder. The application opens this link through a proxy when needing to view a test from the application (such as with the Open Test button found on the Test Run page)
hashedTests (text) Used to uniquely identify tests for a Test Plan Version. A huge benefit of this is not having to capture every test plan unnecessarily with no change, when a new "version" of the tests are imported. This would address #579 and is currently under review at #621
tests (json) An object containing information on all tests related to the test plan version for the referenced pattern
metadata (json) An object containing information with other miscellaneous data found in the references.csv
updatedAt (timestamp) **Consider ImportedAt** Time when the test plan version was imported
draftReviewStartedAt (timestamp) The time the test plan version was advanced to the draft review phase
candidateReviewStartedAt (timestamp) The time the test plan version was advanced to the candidate review phase
vendorReviewStatus (text) An enumerated field with values of Ready, In Progress, and Approved
recommendedPhaseTargetDate (timestamp) The target time for advancing the test plan version to the RECOMMENDED phase. This is a calculated time based on the candidateReviewStartDate and the 180 days described in the WM for the recommended time the Test Plan should remain in the CANDIDATE phase. This value can be manually updated
recommendedPhaseStartedAt (timestamp) The time the test plan version was advanced to the RECOMMENDED phase
sunsetAt (timestamp) The time the test plan version was sunset.
CurrentPhase (text) An enumerated field which can have values of R&D Complete, Draft Review, Candidate Review, Recommended, and Sunset.
reports (int array) This isn’t a finalized column and the implementation details may change slightly but what this represents is a need for the table to explicitly define which Test Plan Report(s) are attached to a TestPlanVersion. This would represent a one-to-many relationship with the TestPlanReport table
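The recommendedPhaseTargetDate calculation described in the column list above can be sketched as follows. The 180-day window comes from the WM as stated; the function name is illustrative, and per the proposal the stored value can still be manually overridden by an admin:

```python
from datetime import datetime, timedelta

# Per the WM, the recommended duration for a test plan version to
# remain in the CANDIDATE phase.
RECOMMENDED_TARGET_DAYS = 180

def recommended_phase_target_date(candidate_review_started_at):
    """Derive the default target date for advancing to RECOMMENDED
    from the candidate review start date."""
    return candidate_review_started_at + timedelta(days=RECOMMENDED_TARGET_DAYS)

# Example: candidate review starting July 1, 2023 targets Dec 28, 2023.
target = recommended_phase_target_date(datetime(2023, 7, 1))
```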

TestPlanReport

As mentioned above, some of the columns in this table are not properties of a test plan report.

One property that appears to be missing is the status of the report. It is either in-progress or complete. A report can't be visible on the reports page or candidate review page unless the admin has marked it complete. Rather than a status column, we might want a completeAt column. If the timestamp is null, the report is in progress.

Column Comment
id (int) - PK
testPlanVersionId (int) - FK
atId (int) - FK How do we track AT version? Is it in testPlanRun.testResults?
browserId (int) - FK How do we track browser version? Is it in testPlanRun.testResults?
testPlanId (int) - FK Seems unnecessary because we have testPlanVersionID. Is this here to simplify queries?
phase (text) This is not a property of a report; it is a property of the testPlanVersion from which the report is generated.
vendorReviewStatus (text) This is not a property of a report; it is a property of the testPlanVersion from which the report is generated.
metrics (json)
createdAt (timestamp)
completeAt (timestamp) Recommend adding this to track the completion status of the report
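The completeAt convention proposed above, where a null timestamp means the report is still in progress, could look like this. The report shape and helper name are assumptions for illustration:

```python
from datetime import datetime, timezone

def is_visible(report):
    """A report only appears on the reports or candidate review pages
    once an admin has marked it complete (completeAt is non-null)."""
    return report.get("completeAt") is not None

in_progress = {"id": 1, "completeAt": None}
complete = {"id": 2, "completeAt": datetime.now(timezone.utc)}
```

This avoids a redundant status column: the timestamp both encodes the status and records when the transition happened.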
howard-e commented 1 year ago

@mcking65 Thanks so much for the details and clarifications. And the scenarios were quite appreciated!

Given that https://github.com/w3c/aria-at/issues/950 (and also #648) was implemented after this issue, a lot of the recommended changes and concerns based on assumptions have been addressed in a revised database proposal, which I have now described here. It's based on internal discussions as well as follow-up discussions we've had around #648, so it is already being used with in-progress work to implement #648. And I can say it does seem to map very well with your recommendations and the scenarios described in this issue, which is great!

Definitely agree that an events table could be very useful. We would probably need a separate table defining the types of events, e.g., eventTypes.

Noted. Will create a follow up issue to track the implementation of this feature.

Test Plan Group It appears to me that the data in this table belong in the test plan version table. Moving the following data to the TestPlanVersion entity would eliminate the need for the TestPlanGroup entity.

Agreed on this thought. We also arrived at the same conclusion that TestPlanGroup as a defined entity isn't required and can have its columns be moved into TestPlanVersion. So the revised TestPlanVersion table now describes the following columns as being added:

Additionally, we also added:

Also, with https://github.com/w3c/aria-at/issues/950 describing a clear separation of test plan versions, we concluded that the reports column was no longer needed to explicitly track reports, as it was previously based on the assumption that multiple gitsha-separated test plan versions could be referenced as a single "group".

TestPlanVersion In addition to the data that is in TestPlanGroup, there are some other essential data elements missing:

  1. covered AT: Each version of a test plan covers a specific list of ATs. Perhaps that is in the metadata json? If so, that is probably sufficient for now. In the future, we may have queries with AT conditions in the where clause, so that data might need to be elevated to a column.

The proposed implementation for this hasn't been finalized, but I believe having it as part of the metadata has been considered. More details should follow in the coming days, with your thought about future functionality also taken into account.

  1. DraftReviewStartDate

This has been added as draftPhaseReachedAt.

  1. SunsetDate

This has been added as archivedAt.

  1. CurrentPhase

This has been added as phase.

  1. Vendor review status (vendors review specific versions of a test plan, not a report. The vendors have report data because it is an aid to reviewing the test plan.)

I have a follow up here for my own clarification, but it may also be due to a consequence of terms defined in the application being different when compared with the WM.

Isn't the vendor review done against the individual pairing of an AT with a collection of test plan run results (which the app's TestPlanReport entity represents)? This means JAWS/Alert V2 and NVDA/Alert V2 could be reviewed individually by their respective vendor reps, which is why I think this was originally included on TestPlanReport, since it holds a reference to an AT through atId.

But this recommendation seems to imply that the entire Alert V2 (covering all ATs) could be reviewed at once. Is that intentional? Is there a change to the vendor review process or have I misunderstood this recommendation?

I believe this table should be updated to have the following columns.

updatedAt (timestamp) Consider ImportedAt - Time when the test plan version was imported

I agree with this name change being more appropriate.

draftReviewStartedAt (timestamp) - The time the test plan version was advanced to the draft review phase

This was added as draftPhaseReachedAt for consistency with the other phase related date columns on this entity.

sunsetAt (timestamp) - The time the test plan version was sunset.

This was added as archivedAt. Would you prefer sunset to be the consistent term we use to refer to this state?

CurrentPhase (text) - An enumerated field which can have values of R&D Complete, Draft Review, Candidate Review, Recommended, and Sunset.

This was added as phase. Additionally, I'm wondering if having a defined SUNSET phase is required. Do you think there's any chance in the future that knowing the phase in which a TestPlanVersion was "sunsetted" may become relevant? Having the sunsetAt column in a non-null state would imply that sunsetting has happened, to help support such a requirement.

TestPlanReport As mentioned above, some of the columns in this table are not properties of a test plan report.

One property that appears to be missing is the status of the report. It is either in-progress or complete. A report can't be visible on the reports page or candidate review page unless the admin has marked it complete. Rather than a status column, we might want a completeAt column. If the timestamp is null, the report is in progress.

atId (int) - FK How do we track AT version? Is it in testPlanRun.testResults?

Yes, that's correct.

browserId (int) - FK How do we track browser version? Is it in testPlanRun.testResults?

Yes, that's correct.

testPlanId (int) - FK - Seems unnecessary because we have testPlanVersionID. Is this here to simplify queries?

Indeed! We arrived at the same conclusion, so this column has been removed (as well as the resulting table relationship).

completeAt (timestamp) - Recommend adding this to track the completion status of the report

We added a column called markedReadyDate for this reason, but this naming is preferred.

Thanks again for this detailed review and comments. I also don't think the answers to any of my questions should cause any major blockers for us to continue current implementation efforts of #648, as we already seem closely aligned!

mcking65 commented 1 year ago

@mcking65 wrote:

  1. Vendor review status (vendors review specific versions of a test plan, not a report. The vendors have report data because it is an aid to reviewing the test plan.)

@howard-e wrote:

I have a follow up here for my own clarification, but it may also be due to a consequence of terms defined in the application being different when compared with the WM.

Isn't the vendor review done against the individual pairing of an AT with a collection of test plan run results (which the app's TestPlanReport entity represents)?

The vendor is reviewing and approving the plan, not the results. We provide the results to the vendor as part of the plan review because it is much easier for the vendor to review a plan if they know what results the plan produces when run with their product. It helps make sure they understand the meaning of the words in the plan.

So this means JAWS/Alert V2 and NVDA/Alert V2 could be reviewed individually by their respective vendor reps,

Yes, but more specifically, when the JAWS rep reviews Alert V2, they are saying they agree with the Alert V2 plan. That means they agree with:

  1. The tests that are in the Alert V2 plan.
  2. The "JAWS implementation of the tests", which is the mapping of JAWS commands to the assertions in the Alert V2 plan.

The tests are the same for all vendors. We are seeking consensus/approval on the tests as written in the Alert V2 test plan.

Similarly, we want the NVDA rep to approve the tests and the NVDA implementation of those tests. If we made a change to the NVDA implementation that did not impact the test itself, e.g., we added or removed a command, we would not need to seek approval from other vendors for that change.

When we have approval from a sufficient number of AT vendors and no objections, we advance the test plan to recommended, which means it represents industry expectations for screen reader behaviors for that pattern.

This means that we can use Alert V2 plan to generate reports for future versions of JAWS, and Vispero trusts that the resulting reports will accurately reflect the support level provided by those versions. So, if a future report shows more failures, those failures would represent JAWS bugs. Vispero will not "approve" any of the reports -- they approved the plan that generates them.

Essentially, the plan is like a spec, and the vendors approve a version of the spec and the way we mapped the spec requirements to the functions of their products. The entire spec moves through the process. We don't separately move vendor-specific pieces of the spec through the process. This is because the goal is to build a single spec for each pattern that represents consensus across the industry.

which is why I think this was originally included on TestPlanReport since it holds a reference to an AT through atId.

Understandable, but the relationship we want is between AT vendor review events and the test plan version.

Note that:

  1. a vendor may review the test plan version when it has reports from only one browser. Later, when that test plan version is recommended, there could be reports generated for three or four browsers using that same screen reader and test plan. Because the test plan version is recommended, all the reports from any browser will be labeled "Approved".
  2. The test admins may decide to advance a test plan version to recommended even if it does not have approval from all covered AT. For example, it may have approval from Vispero and Apple but not NVAccess. This is not ideal, but it is likely, and the working mode permits it. Because that test plan version is recommended, all reports, including reports on NVDA, will be labeled "Approved".

Net: phases, reviews, and approvals apply to test plan versions. A test plan version should be associated with an array of "AT vendor review events" that represents all the review activity for that version of the test plan.

Whether a report is rendered with the warning "Unapproved" or the notice "Approved" depends on the current phase of the test plan version from which it was generated. When a test plan version moves from candidate to recommended, all reports generated from that version of the plan that were previously rendered with the "Unapproved" warning will need to be rendered with the "Approved" notice text. In addition, all new reports generated from that version of the test plan will be rendered with the "Approved" notice text.
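The rendering rule just described can be sketched as a pure function of the test plan version's phase. The phase strings follow the proposal earlier in this thread, and the labels paraphrase the UI's warning and notice text:

```python
def report_label(test_plan_version_phase):
    """Reports carry no phase of their own; the notice shown with a
    report depends entirely on the phase of the test plan version
    that generated it."""
    if test_plan_version_phase == "Recommended":
        return "Approved"
    return "Unapproved"

# When a version moves from candidate to recommended, every report it
# generated -- past and future -- flips from the warning to the notice,
# with no per-report state change needed.
```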

For example, the button support table in APG is currently rendered with the "Unapproved report" warning. It has data for Chrome, Firefox, and Safari. Soon it will advance to recommended and the warning will be replaced with the notice that explains the report is "Approved", i.e., recommended. When we later add reports for Microsoft Edge, those will also have the "Approved" notice text because that version of the test plan is recommended.

@howard-e asked:

But this recommendation seems to imply that the entire Alert V2 (covering all ATs) could be reviewed at once. Is that intentional? Is there a change to the vendor review process or have I misunderstood this recommendation?

It has always been the case that when the Alert V2 plan advances to recommended, it is the entire V2 plan that advances, covering all ATs in the plan. It doesn't advance until we have approvals from enough vendors to represent industry consensus (in the judgment of the community group).

Even though each vendor is reviewing the entire V2 plan, they probably don't care that much about how the commands of other vendors' products are mapped to assertions. That said, they do sometimes look at how the other products are tested, and if they observe inconsistencies, vendor A could question how the plan specifies that vendor B's product should be tested.

@mcking65 wrote:

sunsetAt (timestamp) - The time the test plan version was sunset.

@howard-e asked:

This was added as archivedAt. Would you prefer sunset to be the consistent term we use to refer to this state?

Thinking about this further, I think the term we should use in the working mode is "deprecated" or "superseded". This would more accurately reflect what is happening, V2 causes V1 to be deprecated. If we want to temporarily understand that the data model word for "deprecated" is "archived", that is OK. However, we should have an issue that tracks updating that data element name to avoid confusion and future programming errors.

I will comment on w3c/aria-at#950 to record a preference for deprecated over sunset.

@mcking65 wrote:

CurrentPhase (text) - An enumerated field which can have values of R&D Complete, Draft Review, Candidate Review, Recommended, and Sunset.

@howard-e asked:

This was added as phase. Additionally, I'm wondering if having a defined SUNSET phase is required. Do you think there's any chance in the future, that knowing the phase in which a TestPlanVersion was "sunsetted" may become relevant? Having the sunsetAt column being in a non-null state would imply that sunsetting has happened, to help support such a requirement.

Yes, I believe a deprecated phase is important. We could figure out the sequence because if a plan version is deprecated when in draft, it will not have a candidate phase start date, but it will have a deprecated date.
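The inference described here, working out how far a deprecated version got from which phase timestamps are non-null, can be sketched as follows. Column names follow the proposal; `deprecatedAt` is hypothetical shorthand for the archived/deprecated timestamp discussed above:

```python
def phase_when_deprecated(version):
    """Infer the phase a version was in when deprecated, by checking
    phase timestamps from latest to earliest (a later phase's start
    date implies the earlier phases were reached)."""
    if version.get("deprecatedAt") is None:
        return None  # not deprecated
    if version.get("recommendedPhaseStartedAt"):
        return "Recommended"
    if version.get("candidateReviewStartedAt"):
        return "Candidate Review"
    if version.get("draftReviewStartedAt"):
        return "Draft Review"
    return "R&D Complete"

# A version deprecated during draft review has a draft start date and a
# deprecated date, but no candidate phase start date.
v1 = {"draftReviewStartedAt": "2023-01-10", "deprecatedAt": "2023-03-01"}
```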

@mcking65 wrote:

completeAt (timestamp) - Recommend adding this to track the completion status of the report

@howard-e wrote:

We added a column called markedReadyDate for this reason, but this naming is preferred.

I am good with "CompleteAt". Note, however, that in the discussion of UI in #709 we discussed marking the report final. I believe @jugglinmike had cautioned against using "complete" for this purpose as it may be confused with "complete" in another context; I don't recall the context. I don't know how critical it is for the word used in the UI and the word used in the data model to be aligned in this case. I am fine with either direction if the team decides alignment is important:

A decision should be made as this term might also need to be defined in the working mode glossary.

mcking65 commented 1 year ago

@howard-e

I was just thinking of another way to explain how reports are not tied to the candidate phase.

We could have designed the candidate test review page without any reports. We could have said to the AT vendors, "here is a plan for how we will test to see which screen readers support the button pattern. Tell us if it is a good plan."

We could have shown them the list of tests, the assertions for each test, and the screen reader commands we would use when running each test. We would tell them that these tests apply to any browser.

Pretend for a moment that all screen readers have exactly the same set of commands. It would be kind of like a group of JavaScript engines that all support the same methods. Several vendors look at the same plan and tell us that we have specified the correct set of behaviors and ways of testing those behaviors. They are not looking at any reports; they are just looking at what we test and how we test.

Once enough vendors approve, we know we have a good plan. Now, we move to the recommended phase and start generating reports. Those reports are "approved" as accurate because the test plan was "approved".

Hopefully this helps.

Looking at V2 of the proposal, this is the only place where I see a problem with how the model represents the process.

I am inclined to suggest that there be a table called candidateReview with columns for:

Event could be an enumeration including:

Note that if there are multiple reps, one might start the review and another could finish or approve. Any could raise issues. Vispero currently has two reps with access.

This would allow for other types of events in the future.

The URI column could capture the URI of issues raised.