Clarify stage 3 entrance criteria

My original vision of stage 3 was aligned with that of TC39. There, it basically means "finished, pending editorial nit review---but since multiple implementations haven't happened yet, there's a reasonable chance that we'll discover something is broken, and need to fix the normative content". "Something is broken" is usually some small edge case, but very rarely could be something fundamental.

Examples of normative issues found during stage 3 in TC39 include:

For the explicit resource management proposal, it was discovered during implementation that certain operations were specced to require 3 awaits (i.e. three microtasks). An alternate specification strategy was found that only required 1. https://github.com/tc39/proposal-explicit-resource-management/pull/219
For the explicit resource management proposal, an obvious oversight was identified during stage 3: essentially, some boilerplate that all new JS object specs have was omitted, and then added back when somebody noticed. This is borderline editorial but technically normative. https://github.com/tc39/proposal-explicit-resource-management/pull/167
For JSON parsing with source text, a bug was found in the spec algorithm for parsing negative numbers, and fixed: https://github.com/tc39/proposal-json-parse-with-source/issues/44
For the source phase imports proposal, an error type was changed from ReferenceError to SyntaxError for consistency. https://github.com/tc39/proposal-source-phase-imports/issues/49
For the base64 decoding/encoding proposal, a new feature to omit padding was added after web developer feedback came through: https://github.com/tc39/proposal-arraybuffer-base64/issues/59. (This is a bit unusual IME.)
The Temporal proposal is very large and so has encountered many small bugs during stage 3 due to implementer feedback. Examples include: "DST disambiguation happened at the month/day boundary, but should have been ignored"; several edge-case rounding bugs, when combining multiple options in unanticipated ways; "TZDB corner case in calculating the start-of-day of March 31, 1919 in Ontario, Canada"; allowing weekOfYear and yearOfWeek to be optional when creating custom calendars; buggy results for since() and until() near the end of the month; etc. You can find samples with great explanations by going through various TC39 agendas (example) and Ctrl+Fing for the Temporal slides (example direct link).
The Temporal proposal got more serious implementer feedback that it was too large and this was causing implementation strain (e.g., too many objects added to new JS contexts). This resulted in a large-for-stage-3 scope reduction change to remove custom calendars and 1/3 of the convenience methods. Slide discussion.
The Shadow Realms proposal was promoted to stage 3, but the process of integrating it with the web platform found enough significant issues that the feature's current design was dubbed at risk, and it was demoted to stage 2. (These demotions are quite rare; I think it's happened 1-3 times ever.)

I think the TC39 standard is still a good one to apply to WHATWG stage 3. However, how that works in practice is affected by at least the following important differences between the TC39 process and ours:

The TC39 process involves much less early implementation experience than typical WHATWG features. In TC39, typically implementations hold off on even prototype code until stage 3. In the WHATWG, we usually see one implementer prototyping as early as stage 1, and definitely during stage 2.
The TC39 process requires two full implementations, ideally both shipping, to graduate from stage 3 to stage 4. In the WHATWG, we only require meeting the WHATWG working mode requirements, i.e. there must be two implementers "in support", there should be a prototype implementation, and there should be no strong objections.
In TC39, the test-writing process is done by different groups than the implementers. This has led to the introduction of "stage 2.7", if I understand correctly. Let's not discuss that today.

In practice, this means that in TC39 there are two long stages: stage 2, when the committee is both debating whether the problem is worth solving, and writing spec text that is as-good-as-possible without any implementer experience; and stage 3, where a lot of time is spent waiting for implementations and responding to implementer feedback.

I think if we keep the TC39 standard for stage 3 in the WHATWG, the differences between the groups would manifest in stage 2 being the long one at the WHATWG, and stage 3 being pretty short. Because during stage 2 we'd not only resolve the "do we want to do this" question, but also get all the feedback from the initial implementation. And we wouldn't need to wait for the second implementation to graduate from 3 to 4. So for the WHATWG, stage 3 would last roughly as long as it takes the editor to finish their review.

Note that in both the TC39 and WHATWG stages, the existence of stage 4 is basically a formality, because in both cases the feature is "finished" according to the process. So there is a sense in which stage 4 is pointless, and we could just as well denote "done with the stages process" by "no stage".

Responding to specific points:

"Complete specification text" and "The solution is complete and no further work is possible without implementation experience, significant usage and external feedback". If this is the entrance criteria and what the stage signifies, then it becomes unclear what is the difference between stage 3 and stage 4.

In my proposed vision, the difference is just that in stage 4, the spec is merged and has finished editor review. This editor review could be lengthy, especially if the feature is large and/or the contributor is new to WHATWG spec conventions, but it will usually be short.

It's also possible that editor review will identify normative changes, not just editorial nits. They'll usually be small, especially if a prototype implementation already exists. The kind of thing I'm thinking of is consistency suggestions, e.g. which types of error to throw, or property naming, or similar, which weren't caught by anyone earlier in the process.

Finally, our process allows for cases where there's no prototype implementation started until stage 3, in which case we might see larger changes. I'm not sure we'll see this much, as for small features that are easy to implement "in one go" people probably won't need the stages process, whereas for large features people will probably prefer to start a prototype implementation early. But I think it's reasonable to allow it.

"Full specification and comprehensive tests are completed". I suspect that the intention here is for the tests to exist, but not necessarily to not need any improvements based on stage 3 feedback and implementation experience. It would be nice to reword this, if that is the case (maybe "available" instead of "completed").

I agree some caveating here would be helpful, to express that tests may need to be revised in response to stage 3 feedback.

whatwg / meta

Clarify stage 3 entrance criteria #336