w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

Exposing Implementation Status #1468

Open fantasai opened 7 years ago

fantasai commented 7 years ago

There was a discussion in Tokyo about CanIUse panels that was supposed to migrate out into a conversation on how to better document and expose implementation status data. This thread is to start that conversation.

Some key points out of the discussion were

I think some of the key questions here are:

AmeliaBR commented 7 years ago

When the Web Platform Docs project was active, there was work done on importing MDN implementation data. I'm not sure whether it was a one-time import or an API, but I know it involved converting all the MDN tables into JSON, with some efforts at clean-up and standardization. I know the plan was that the WPD data would be available via an API. The code probably still exists in an abandoned GitHub repo, if someone wants to go looking for it.

The longer term goal of that project was to integrate full Web Platform Tests data into the support tables. But I don't think work on that got very far.

Also, some general thoughts from issues that came up during that project:

- "support" for a feature is not a very precise term. Different references use different levels of granularity.
  - CanIUse looks at major features as a whole. Many CanIUse tables equate to a complete CSS spec, or a large portion of it. The tables usually warn when key functionality is missing or there are major bugs, but they don't look at all the little edge cases and interactions. There's no easy way to query the data to find a specific sub-feature's support level.
  - MDN looks at individual objects in the language. For CSS, that's mostly individual properties. Differences in support for particular values of a property are noted in sub-tables. But again, you're not going to have a lot of data about edge cases and bugs.
  - Tree-walking and other ways of identifying whether language objects are recognized in the browser (e.g. whether the parser recognizes a CSS property/value, or whether the JS global environment has a particular object declared) won't tell you whether the functionality is implemented correctly and completely (see the sketch after this list).
  - Spec tests (e.g. WPT) are much more fine-grained, but the data can be much harder to interpret. What does an x% failure rate mean for practical developer use? To be really useful, you need to be able to map tests to spec features, identify which tests cover core functionality and which cover edge cases, and then produce summary statistics.
- Human-curated data (e.g. CanIUse and MDN) can get out of date or be incomplete, with no easy way to identify the problems except having another human review it carefully. Depending on who contributed the data, they may have made more or less effort to test edge cases, bugs, and interactions with other features.
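As a minimal sketch of the kind of recognition checks meant by the tree-walking item above (assuming a browser environment; the property, value, and global names are arbitrary examples, not a fixed test set):

```ts
// Recognition-only checks: they show that a name parses or exists,
// not that the feature behaves correctly or completely.
const gridValueParses = CSS.supports("display", "grid");          // CSS parser recognizes the value
const customPropsParse = CSS.supports("--x", "1");                // parser accepts custom properties
const hasIntersectionObserver = "IntersectionObserver" in window; // JS global is declared

console.log({ gridValueParses, customPropsParse, hasIntersectionObserver });
```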

SebastianZ commented 7 years ago

Note that MDN has already started turning its browser compatibility data into a JSON format. The project is maintained on GitHub:

https://github.com/mdn/browser-compat-data/

It is still in its early stages, though the goal is for it eventually to hold all of the implementation information available on MDN.

Sebastian
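For orientation, an entry in that repository looks roughly like the sketch below. This is a hand-written approximation of the browser-compat-data schema; the property name and version numbers are placeholders, not real data from the repo.

```ts
// Rough, illustrative shape of a browser-compat-data entry. Field names follow
// the project's documented schema (__compat, support, version_added, status);
// the property name and version numbers are placeholders, not real data.
const bcdEntrySketch = {
  css: {
    properties: {
      "some-property": {
        __compat: {
          support: {
            chrome:  { version_added: "1" },   // placeholder version string
            firefox: { version_added: "1" },   // placeholder version string
            safari:  { version_added: false }, // false = not supported
          },
          status: { experimental: false, standard_track: true, deprecated: false },
        },
      },
    },
  },
};
```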


gregwhitworth commented 7 years ago

Sorry for the delay in my response to this.

I didn't have the chance to make the meeting in Tokyo, but I'll lay out what I'd like to see here; I actually described it to @astearns a while back in Seattle. Many UAs are running the W3C web-platform-tests (WPT) suites in house, and as a WG we resolved to move our CSS tests to that repo. That is now complete thanks to @gsnedders' work.

I'd like the CSSWG test suite to be the trusted source on support. Because we have a consensus-driven model, this leaves no room for ambiguity or personal opinion (we have seen instances of both from various test-for-score sites). That said, it means the source is only as good as our test suite, so we need to hunker down and shore up the suites. I think we could start whitelisting suites that we consider stable as we review them and keep them up to date. Then CanIUse, MDN, etc. can apply whatever color coding they like based on the pass/fail rates of a suite (see the sketch below). Additionally, since we're moving to test-driven spec work, UAs will be able to tell more easily what they have and haven't implemented from a given spec, since they'll only need to look at the failing test cases (which obviously also helps with regression testing).

This is my personal desire, and I think one that will serve all UAs, but it does require us to get our house in order.
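As a rough illustration of how a consumer such as CanIUse or MDN might derive a status from suite results, here is a hypothetical summarizer. The TestResult shape and the labels are invented for this sketch; they are not an actual WPT or wpt.fyi API.

```ts
// Hypothetical result record: one entry per test, grouped by spec/suite directory.
interface TestResult {
  spec: string;                                  // e.g. "css-flexbox" (illustrative)
  status: "PASS" | "FAIL" | "TIMEOUT" | "ERROR";
}

// Compute a pass rate per spec; a consumer could then map rates onto whatever
// labels or color coding it wants.
function passRateBySpec(results: TestResult[]): Map<string, number> {
  const tally = new Map<string, { pass: number; total: number }>();
  for (const r of results) {
    const t = tally.get(r.spec) ?? { pass: 0, total: 0 };
    t.total += 1;
    if (r.status === "PASS") t.pass += 1;
    tally.set(r.spec, t);
  }
  const rates = new Map<string, number>();
  for (const [spec, t] of tally) rates.set(spec, t.pass / t.total);
  return rates;
}

// Deliberately coarse labels: anything between "all pass" and "none pass"
// is just "partial", which is about all a raw percentage can honestly say.
function label(rate: number): "supported" | "unsupported" | "partial" {
  if (rate === 1) return "supported";
  if (rate === 0) return "unsupported";
  return "partial";
}
```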

gsnedders commented 7 years ago

I'm strongly against @gregwhitworth's proposal: we've previously seen vendors' marketing departments start to control their contributions to testsuites (releasing thousands of very shallow tests of very little value to boost their pass percentage, very selectively releasing tests they pass and competitors fail, etc.). And it's already true in many specs that colour coding based on pass percentage is relatively meaningless: some specs have the majority of their tests covering edge cases, so caniuse/MDN/etc. would probably be willing to consider a low percentage as support, whereas others have a fairly shallow testsuite where you'd want a higher percentage bar.

Given that various people have proposed things along these lines, I've finally got around to writing https://github.com/w3c/web-platform-tests/pull/6543 to give us a proper place we can point to in future as to why we want to avoid anything that might lead to gamification of the testsuite.

gregwhitworth commented 7 years ago

@gsnedders That is definitely one way to look at it, but what I'm proposing is a solid test suite; shallow test cases !== a solid test suite. You can be against providing an API to CanIUse or similar benchmarking sites, but please don't tie the two together. I too am against gaming the system, and that is why I want WPT to be THE test suite (and I think most of us are moving in that direction). For that to be the case, none of us wants to be running numerous pointless tests; we want interop and to be fixing bugs that provide value to our users.

With regard to the benchmarks: they exist whether we want them to or not, and in most scenarios they are not backed by valid testing or by oversight from people who know the specs or the relative impact of features. So while yes, there is potential for a team to try to game the system, if WPT == the suite we run during development, there will be pushback against allowing frivolous tests in. I appreciate your concern, however. Please understand that my number one desire is a solid suite; second to that is providing accurate and unbiased support results (which is not the case today).

AmeliaBR commented 7 years ago

I agree that "percentage tests passed" is a very poor measure on a spec-wide basis, and isn't perfect even when divided up per-property or per-section. The only meaningful information from test percentage is "all tests pass", "no support", or "some support but it's either incomplete or buggy".

For spec-level support, you really need curated data, whether that is from a 3rd-party (canIuse, MDN) or whether it is compiled by working group members directly.

We're supposed to be keeping track of implementation status for the W3C process, anyway. It could be useful to annotate the spec in response: as implementations come in that would meet the W3C criteria, mark the relevant sections as "implemented in X rendering engine", with an "except for feature Q" note where required. The data would be based on the test suite results, but someone who is familiar with the tests would be converting that into human-readable summaries.

(And if this is done, of course it would be nice to use some sort of metadata format which can easily be exported as a data file for other websites/tools to use.)
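Purely as an illustration of what such exportable per-section metadata might look like (every name, engine, and status string here is hypothetical, not an existing format):

```ts
// Hypothetical, purely illustrative shape for per-section implementation
// annotations that could be embedded in a spec and exported as a data file.
interface SectionStatus {
  section: string;                         // spec section anchor, e.g. "#some-feature"
  engines: Record<string, {
    status: "implemented" | "partial" | "not-implemented";
    exceptions?: string[];                 // human-written "except for feature Q" notes
  }>;
}

const exampleSection: SectionStatus = {
  section: "#some-feature",
  engines: {
    EngineA: { status: "implemented" },
    EngineB: { status: "partial", exceptions: ["does not support the shorthand syntax"] },
  },
};
```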

gregwhitworth commented 7 years ago

Yeah, I think the percentage is just the start; it isn't the be-all and end-all. @gsnedders has a good point that you could pass 90% of the tests but fail a very important one. There is nothing stopping any site from overriding the results we provide, but they are a great foundation to build on. It's definitely better than people cherry-picking which technologies they want to investigate and score, at least IMO.

tabatkins commented 7 years ago

> it's already true in many specs that colour coding based on pass percentage is relatively meaningless: some specs have the majority of their tests covering edge cases, so caniuse/MDN/etc. would probably be willing to consider a low percentage as support, whereas others have a fairly shallow testsuite where you'd want a higher percentage bar.

Note that CanIUse is not based on percentages; it uses hand-managed statuses for each browser/feature.
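For reference, a CanIUse feature entry is hand-edited data along roughly these lines. This is an approximation of the shape of the public caniuse data dump; the feature, version numbers, and note text are placeholders.

```ts
// Approximate, illustrative shape of a hand-managed CanIUse feature entry.
// In the stats, "y" = supported, "a" = partial, "n" = no support; "#1" points
// to a numbered note, and an "x" suffix would mark prefixed support.
const canIUseEntrySketch = {
  title: "Some CSS feature",
  stats: {
    chrome:  { "1": "n", "2": "a #1", "3": "y" },
    firefox: { "1": "n", "2": "y" },
  },
  notes_by_num: { "1": "Partial support: placeholder note about a missing sub-feature." },
};
```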