whatwg/meta

Market research: would a mapping from APIs to tests be valuable? #77

Closed · foolip closed this 5 years ago

foolip commented 6 years ago

Hi all, this seems like a fair place to catch spec editors who are keen on testing things.

@mdittmer and I are brainstorming ideas, and we have one that seems promising. Roughly: automatically add use counters for everything, for local use, and from that derive:

  • Lookup of single API to which tests hit that API
  • An overview of all APIs and which tests, if any, hit each of them

We think that the former is neat but not important, and that the latter could be useful as a guide to what is totally untested and possibly undertested, by eyeballing it.

Would any of this be immediately useful for spec maintainers? For anyone else?

We could use the same stuff to figure out what tests are relevant to a page on MDN, but how to leverage that isn't obvious.

@tabatkins FYI that we're thinking about this, complementary to manual fine-grained test linkage using Bikeshed.

annevk commented 6 years ago

Lookup of single API to which tests hit that API

This is actually quite useful, as whenever we want to change something we need to have this list. I typically resort to grep.

domenic commented 6 years ago

The former is quite nice for knowing, when it's time to update the spec for a thing, which tests you should potentially update. I'm sure I've made spec changes that have broken existing tests I didn't know about.

Although, I guess this will be pretty useless for core DOM APIs that all tests use. E.g., tons of iframe APIs are going to be hit by all multi-global tests, even if they're testing multi-globals in the XHR API or something.
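
(A sketch of how that noise could be trimmed after the fact, assuming the mapping ends up as a dict from API identifier to the set of tests hitting it; the threshold is arbitrary:)

```python
def drop_ubiquitous(api_to_tests, total_tests, threshold=0.5):
    """Drop APIs hit by more than `threshold` of all tests, e.g. core DOM
    and iframe APIs that nearly every multi-global test touches."""
    return {
        api: tests
        for api, tests in api_to_tests.items()
        if len(tests) / total_tests <= threshold
    }
```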

foolip commented 6 years ago

Yep, some APIs will show up everywhere, but then those are probably not the kinds of APIs we like to change the most; document.body is going to stay as it is :)

For the "Lookup of single API to which tests hit that API" case, would this be for APIs with very grep-unfriendly names? I would definitely reach first for grep in a case like node.querySelectorAll(), but maybe useful for https://dom.spec.whatwg.org/#dom-elementcreationoptions-is and such?

How would the information have to be made available for it to be less work than using grep? Command line? Web property? Inline in the spec? JSON file?
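
For concreteness, a sketch of what the JSON-file option could feel like to query (the file name and key format here are made up, not a proposal):

```python
import json

# Assumed shape: {"dom-elementcreationoptions-is": ["path/to/test.html", ...], ...}
with open("api-to-tests.json") as f:
    api_to_tests = json.load(f)

# The grep-unfriendly case above: which tests hit ElementCreationOptions' is option?
for test in api_to_tests.get("dom-elementcreationoptions-is", []):
    print(test)
```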

rtoy commented 6 years ago

I like having counters for everything. But what do you mean by "for local use"? The counters don't work in the wild? But I like the automatic part, because I've learned that some of the counters we added for WebAudio don't actually count what we thought they did.

It would be great for testing purposes; for existing attributes in WebAudio, we have discovered that some weren't tested. Sometimes we forget to test the values completely, and other such things. Having anything automatic to tell us we're missing things would be totally awesome.

foolip commented 6 years ago

@rtoy, by "for local use", I mean that builds we use to collect this data would most likely be custom builds on a machine dedicated to collecting this data, not vanilla Chromium builds. This is because we would effectively have to add "use counters" for all APIs exposed using Web IDL, which would simply not fly for performance reasons.

So, very concretely, I think this would be:

  • Make a custom Chromium build that logs every API exposed via Web IDL as it is hit.
  • Run each test in web-platform-tests individually with that build, to associate a single test with a list of code paths exercised.
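
As a rough sketch of that per-test collection step (the binary name, flag, and log format below are hypothetical placeholders, not what an instrumented build actually produces):

```python
import json
import os
import subprocess

LOG = "api-hits.log"  # hypothetical: one Web IDL API identifier per line

def apis_hit_by(test_path):
    """Run a single test in the instrumented build and return the set of
    APIs its log says were exercised."""
    if os.path.exists(LOG):
        os.remove(LOG)
    subprocess.run(["instrumented_content_shell", "--run-web-test", test_path],
                   check=True)
    with open(LOG) as f:
        return {line.strip() for line in f if line.strip()}

def build_mapping(test_paths):
    """Invert the per-test results into an API -> tests mapping."""
    mapping = {}
    for test in test_paths:
        for api in apis_hit_by(test):
            mapping.setdefault(api, []).append(test)
    return mapping

if __name__ == "__main__":
    tests = ["dom/nodes/Node-childNodes.html"]  # placeholder for the full list
    with open("api-to-tests.json", "w") as f:
        json.dump(build_mapping(tests), f, indent=2, sort_keys=True)
```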

An older idea that's been floating around is to do full code coverage builds in the same way. That would be a superset of this, with much more data, but an even bigger undertaking. (The idea being that we could look at code we think is well covered and see whether there are actually gaping holes, i.e., going from 30% to 90% coverage, not from 95% to 99%, which I don't think is a good return for time spent.)

mdittmer commented 6 years ago
  • Run each test in web-platform-tests individually with that build, to associate a single test with a list of code paths exercised.

    Is there a neat mapping between tests and the current browser URL? If so, we might be able to avoid the significant overhead of setting up and tearing down the test harness for each test individually: include the URL in the custom logs, and map URLs to tests during log analysis.
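
If so, the log-analysis side could be roughly this sketch. The tab-separated log format is a made-up assumption; the real premise is that wpt serves each test at a URL whose path mirrors its path in the repo (modulo generated tests like .any.js):

```python
from urllib.parse import urlparse

def mapping_from_batched_log(log_path):
    """Build the API -> tests mapping from one big batched-run log.

    Assumed (hypothetical) log format: one '<document URL>\t<API id>' line
    per API hit; the URL's path stands in for the test identifier."""
    mapping = {}
    with open(log_path) as f:
        for line in f:
            url, api = line.rstrip("\n").split("\t")
            mapping.setdefault(api, set()).add(urlparse(url).path)
    return mapping
```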

Would that work?

foolip commented 6 years ago

I think there should be ways to avoid restarting the browser by logging the test URL at the right points, but it'd take some work to convince ourselves that there's no chance of raciness, i.e., that all the logging involved comes from a single thread and can't arrive out of order, which is generally not the case.

mdittmer commented 6 years ago

Can't the URL be captured as soon as there is intent-to-log, rather than when the log gets committed?

foolip commented 6 years ago

@mdittmer, I'm not sure exactly; the code paths to log would be spread across renderer processes because of OOPIF, and the current URL is ultimately something that changes in the browser process first and then reaches a new or reused renderer process, so having confidence in the non-raciness of any logging setup is hard. The most straightforward way is probably to start by running the tests one by one, and to validate any optimization by comparing against those results.
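
That validation could be as simple as diffing the two results; a sketch, assuming both runs are reduced to dicts from API to the set of tests hitting it:

```python
def compare_mappings(one_by_one, batched):
    """Report APIs whose test sets differ between the trusted one-by-one run
    and an optimized batched run; any difference hints at racy attribution."""
    for api in sorted(set(one_by_one) | set(batched)):
        a = one_by_one.get(api, set())
        b = batched.get(api, set())
        if a != b:
            print(api,
                  "missing from batched:", sorted(a - b),
                  "extra in batched:", sorted(b - a))
```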

foolip commented 6 years ago

@plehegar and @tabatkins both told me that this might be most immediately useful to bootstrap spec→test linking, and that makes sense. It seems like it'd also easily reveal bits that are totally untested.

domfarolino commented 6 years ago

In general, I really like this idea; thanks for posting. It sounds really useful!