web-platform-tests/wpt

What is webaudio/resources/audit.js? #12781

Open Ms2ger opened 6 years ago

Ms2ger commented 6 years ago

https://github.com/web-platform-tests/wpt/blob/8aa4853c2a65dd73c30fbe9f156771abfb8669e2/webaudio/resources/audit.js

It seems like it unnecessarily re-invents parts of testharness.js. Why was it introduced? What makes plain testharness.js insufficient? Can we solve its use cases without introducing a completely different harness?

CC @hoch @padenot @foolip @jgraham @gsnedders

foolip commented 6 years ago

I plead ignorance. https://github.com/web-platform-tests/wpt/pull/12606 is the first time I came across it, and I only touched it to unblock a Chromium export.

hoch commented 6 years ago

First, audit.js predates the adoption of testharness.js in the Chromium project. The Chrome WebAudio team has been using it for the last 4 years.

It offers various utilities for testing and verifying audio data, plus for managing asynchronous tasks cleanly. The majority of WebAudio tests in Chrome use OfflineAudioContext, and we really needed a good way of dealing with multiple asynchronous tests. Most importantly, we care deeply about detailed failure descriptions. The library has evolved around these needs, and we will continue to use it in future tests.
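
For illustration, a typical audit.js test is structured roughly like this (a sketch modeled on existing webaudio/ tests; the exact labels and matchers vary from test to test):

  // Sketch of an audit.js task: the runner sequences asynchronous tasks,
  // and should() produces a detailed description for each check.
  let audit = Audit.createTaskRunner();

  audit.define(
      {label: 'silent-output', description: 'Rendered output is silent'},
      (task, should) => {
        let context = new OfflineAudioContext(1, 128, 44100);
        context.startRendering()
            .then(buffer => {
              should(buffer.getChannelData(0), 'Rendered output')
                  .beConstantValueOf(0);
            })
            .then(() => task.done());
      });

  audit.run();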

If this has to be removed from the WPT suite for some reason, the Chrome WebAudio team will have to rewrite all the tests manually, which will take significant effort and time.

CC @rtoy

foolip commented 6 years ago

> If this has to be removed from the WPT suite for some reason, the Chrome WebAudio team will have to rewrite all the tests manually, which will take significant effort and time.

I'm confident that won't be necessary. If there is something about audit.js that is problematic, then fixing that would also involve fixing the tests. That's not to say that there is something problematic, but if you search for generate_tests you'll find a bunch of PRs to get rid of that method in various contexts; generate_tests remains until someone gets rid of its usage everywhere.

That being said, if there are things that are very convenient in audit.js that you find yourself missing when writing testharness.js tests outside of webaudio/, learning from that and perhaps copying some features into testharness.js might be an option.

rtoy commented 6 years ago

That's really good to hear. I don't have a lot of experience with plain testharness.js, but the best part about audit.js is the messages printed out for each test. With a bit of care, you can easily see what is being tested and whether it makes sense. In particular, when a test fails, we've tried to make the output really verbose so it's easy to see why the test failed. Nothing sucks more than debugging a test that just says FAIL. :-(

foolip commented 6 years ago

Having a look at https://wpt.fyi/results/webaudio/the-audio-api/the-audiobuffer-interface/audiobuffer.html, it looks like the generated subtest names include a lot of detail about the pass conditions. I can certainly see how that would come in handy in debugging.

The usual approach with testharness.js is to use more human-readable test names and to make the failure messages contain the required detail, although it's honestly easy to fail at that. Here's an example that I think is good, anyway: https://wpt.fyi/results/url/data-uri-fragment.html?complete=true
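
A minimal sketch of that idiom, for contrast (the detail lives in the assert message, not in the test name):

  // Stable, human-readable test name; failure detail in the message.
  test(() => {
    const url = new URL('data:text/plain,abc#frag');
    assert_equals(url.hash, '#frag',
        'url.hash should preserve the fragment');
  }, 'Parsing a data: URL keeps its fragment');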

jgraham commented 4 years ago

So the generated test names in audit.js-using tests are something of a problem, because they end up doing things like dumping arrays of floats into the test name, which is both hard for a human to read and not necessarily consistent on every run or platform. Test names are really part of the public API for testharness.js tests and are expected to be consistent between runs, platforms, and browsers as far as possible. One consequence is that the wpt.fyi output for these tests is messy and hard to read, e.g. [1], where we see lots of "MISSING" tests on implementations that are actually passing everything. Another is that the metadata files Gecko uses to record which tests pass and fail are a mess, ending up with entries that are hard to maintain.

I've proposed a patch at https://phabricator.services.mozilla.com/D89436 to make each task in the runner generate a single test, with all the asserts living under that test rather than generating individual tests. I strongly feel that some variant of this is needed to make these tests work well as wpt, but I'm very open to changes that aid developing with the tests. For example, I'm very happy for the tests to log to the console (we can add a lint exception) or into the test document, and for those logs to contain the extra information that was previously in the generated test names.
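
Hypothetically, a restructured task might look like this (a sketch, not the actual patch; the context parameters are made up):

  // One promise_test per task, with plain asserts inside it instead of
  // per-assert generated subtests; per-run data goes into the message.
  promise_test(async () => {
    const context = new OfflineAudioContext(1, 128, 44100);
    const buffer = await context.startRendering();
    const output = buffer.getChannelData(0);
    assert_equals(output.length, 128,
        `rendered length was ${output.length}`);
  }, 'Rendering produces the requested number of frames');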

[1] https://wpt.fyi/results/webaudio/the-audio-api/the-analysernode-interface/ctor-analyser.html?label=master&label=experimental&aligned&q=webaudio

hoch commented 4 years ago

@rtoy I would love to have your thoughts on this.

rtoy commented 4 years ago

I think it's possible to change the default not to print out any part of the array.

But as someone who writes a lot of tests, I've found seeing the array output as part of the test really helpful. It's easy to make mistakes where the test passes, but when you see the array output you can often tell that what you've tested isn't what you expected to test.

Tests are already hard to write and debug; I'd prefer to make it easier for tests to be written and debugged.

gsnedders commented 4 years ago

> I think it's possible to change the default not to print out any part of the array.

Nobody is saying you can't print it out, just that to print it out it should be part of the assertion description rather than the test name. The test names are, as @jgraham says, part of the public API, whereas assertion descriptions are purely there to assist with debugging, and no guarantee is given that they are stable between test runs or the same across browsers.
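
In code, the distinction looks like this (a sketch; computeOutput is a hypothetical helper):

  const expected = [0, 0.5, 1];  // reference data
  test(() => {
    const actual = computeOutput();  // hypothetical function under test
    // The per-run data goes into the assertion description...
    assert_array_equals(actual, expected,
        `output was [${actual}], expected [${expected}]`);
  }, 'Output matches the reference');  // ...not into this stable name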

stephenmcgruer commented 3 years ago

> Nobody is saying you can't print it out, just that to print it out it should be part of the assertion description rather than the test name.

I think the blocker for @rtoy and others is that a passing test doesn't contain any assertion outputs. As @rtoy said, they've found it easy to accidentally write a passing test that isn't testing what they expect. Per @foolip's comment in https://github.com/web-platform-tests/wpt/issues/10201#issuecomment-500771960:

> If you want extra information even when passing, I think what would fix it is additional per-subtest metadata that has no effect on passing or failing, but is shown on wpt.fyi and Chromium's baselines.

Without thinking it through very well, I wonder if we could achieve this today by extending things as follows:

  1. If test(...) returns a string value, that value is set as the test's message.
  2. If promise_test(...) returns a Promise that resolves to a string value, that value is set as the test's message.
  3. For async_test(...), change test.done() to take an optional message parameter that is set as the test's message.

Then (in high-level theory), the testharnessreport implementations could show or ignore this information as they like?
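
In code, the proposed extensions might be used like this (hypothetical API; none of this exists in testharness.js today):

  // Proposal 1: a returned string becomes the test's message.
  test(() => {
    assert_equals(1 + 1, 2);
    return 'checked 1 sample value';
  }, 'Addition works');

  // Proposal 2: a Promise resolving to a string becomes the message.
  promise_test(async () => {
    const context = new OfflineAudioContext(1, 128, 44100);
    const buffer = await context.startRendering();
    assert_equals(buffer.length, 128);
    return `rendered ${buffer.length} frames`;
  }, 'Rendering completes');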

jgraham commented 3 years ago

We're already storing each step that the test takes. We can also figure out which test is running for each assert, since only one test can be running at a time (no multithreading, etc.). We can therefore log all the steps that ran and all the asserts that they ran. We could store this and make it possible to view it as part of the output (probably constructing the actual message lazily for performance reasons, since we can have 10,000 subtests). That would solve the problem with no test changes required.

Alternative solution: teach wptrunner (with a command-line flag) to set log points on relevant testharness.js functions and use that to audit the test run. It's browser-specific, but it's also a pretty powerful technique, since it can give us inspectable objects in the console rather than string serializations.

rtoy commented 3 years ago

I've been working with @padenot to see what we can do about this, since he has a change to audit.js that he's interested in. No news yet on the results.

stephenmcgruer commented 3 years ago

Note that as of https://github.com/web-platform-tests/wpt/pull/27224, testharness.js will now show the list of passed asserts, and the LHS/RHS of each, in the visual output for a test. There are some caveats:

  1. It doesn't function for worker tests.
  2. Due to significant performance regressions, it is disabled in the Chromium CI.

If it seems like this feature could be useful in deprecating audit.js, then I think the above two problems are addressable (or we should at least try!). I'd be interested in folks' thoughts.

To give this a try:

$ ./wpt run --channel=dev chrome html/semantics/embedded-content/media-elements/track/track-element/track-element-src-change.html

jgraham commented 3 years ago

The worker thing is definitely a bug that we can fix. The fact that it's turned off in CI reflects that the set of asserts that run isn't part of the API: the intent of the feature is to let people verify that the test is doing what they expect and to help with debugging, not to change the overall design of testharness. As I understand it, that's the requirement that led to audit.js, but if there's additional functionality that would help here, I'd be really interested to understand the use cases.

rtoy commented 3 years ago

For some historical perspective: this was originally written to use Chrome's old jstest harness, which produced fairly verbose output, so we kind of matched that output. To update to the wpt testharness, we just changed the call to the jstest routine to a testharness assert. (I forget the exact details.)

I've always hated tests that say "foo passed" with no other information. What passed? What was actually tested? audit.js helps with that at the expense of more verbose output (because the should() method produces more verbose output). I've also run into an imported test that passed on Chrome but didn't actually test what was expected. So I think it's really important to provide details on what's being tested, so it's easier to tell whether the expected output makes sense. This is a hard tradeoff to balance.

jgraham commented 3 years ago

Providing more details on what was actually tested is what the new testharness.js feature is intended to do. It records all the asserts that are called, their location in the source, and the arguments they are called with.

The problem with audit.js is that it breaks the testharness API by putting variables into the test names. That makes it impossible to record expectations for specific tests or to track them over time. For that reason we see a lot of CI problems (e.g. unexpected results) caused by tests using audit.js. We'd really like to remove it entirely, but we obviously don't want to make it harder for people to write good tests; hence the work to replicate the benefits it provides without the harmful implementation.

rtoy commented 3 years ago

The tests shouldn't be flaky in general, at least those that use an OfflineAudioContext; they ought to be 100% repeatable, with identical results each time. This isn't always true, unfortunately. Chrome has found some implementation bugs because of this.

But yes, not having the test results in the test names would be a good thing. Providing meaningful names for tests is hard, especially when there are a lot of them. Then people (like me) get lazy and start naming tests "test1", "test2", ....

And regrettably, neither @padenot nor I have worked on this in quite some time. I'll ask him again about this.

Although you can't see it in wpt.fyi, audit.js can produce a lot of data when some expectation fails. For example, many tests expect the actual output array to match the reference array to within some tolerance. When this fails, parts of both arrays are printed out, along with the elements and values that exceed the tolerance the most. That's very helpful in figuring out what happened and whether it's just a small round-off error where we can tweak the thresholds a bit.
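
For example, the tolerance matcher is used roughly like this (a sketch; actualOutput and expectedOutput are assumed to be Float32Arrays in scope, and the option name follows existing webaudio/ tests):

  // On failure, audit.js reports slices of both arrays plus the
  // elements that exceed the threshold the most.
  should(actualOutput, 'Rendered output')
      .beCloseToArray(expectedOutput, {absoluteThreshold: 1e-6});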

jgraham commented 3 years ago

https://bugzilla.mozilla.org/show_bug.cgi?id=1660219 is a typical example of this causing problems in m-c; it just got reopened, so I need to look (again) at what's needed to fix the expectation data.

If we want some way to add additional logging to assert messages, or to imply it from the specific assert function, that seems like quite a reasonable thing to put into testharness.js itself.

rtoy commented 3 years ago

Sorry about that. But that test was last changed in July 2020, so I'm not sure what could have happened to make it fail now. A quick glance at the test suggests it is failing basically because the thresholds are too tight.

When writing new tests, I usually try to test with Firefox and make sure Firefox passes the test or adjust the thresholds until it does. Perhaps I forgot to do that for that test.

Reducing the SNR thresholds to something smaller would work for me if, indeed, something changed in Firefox to lower the measured SNR.

jgraham commented 3 years ago

Right, it might be possible to fix these tests on a case-by-case basis for Firefox; maybe @padenot could take a look at that. Fundamentally, the problem from my point of view is that the audit.js tests violate the API contract of testharness, and that breaks other tooling that makes assumptions like "the names of tests don't change per run". So even though we might be able to fix some of the specific issues, I'd really like to agree on a way forward that makes the webaudio tests conform to the web-platform-tests API without affecting the utility of the tests themselves.

rtoy commented 3 years ago

Yes, I agree.

However, if the values did change between versions, it's probably worth investigating why. Chrome has discovered some serious issues this way, such as V8 incorrectly changing some math routines, and we've also uncovered unexpected changes in ffmpeg. In those cases, there wasn't much that could be done except update the tests.

gsnedders commented 3 years ago

Note that WebKit simply removes these strings from the names; see https://github.com/WebKit/WebKit/blob/a3bf81f011f1a93c6a466f51eadcf1a4d221457f/LayoutTests/resources/testharnessreport.js#L93 from https://bugs.webkit.org/show_bug.cgi?id=216332.

jgraham commented 2 years ago

I think this is now causing https://bugzilla.mozilla.org/show_bug.cgi?id=1751300