Ensure tests are clean - Githubissues

yoavweiss commented 6 years ago

@toddreifsteck @igrigorik - Do you have a reference or more context on this one? https://lists.w3.org/Archives/Public/public-web-perf/2018Jan/0032.html doesn't really mention performance-timeline. Otherwise, https://wpt.fyi/results/performance-timeline?label=stable&aligned=true seems relatively clean (with some PO tests failing everywhere)

Do you remember what this was about?

npm1 commented 5 years ago

It looks like this more about choosing the correct commit to pin L2 on, right? And it should be one without the buffered flag.

npm1 commented 5 years ago

Looking at the results from https://wpt.fyi/results/performance-timeline?label=master&label=experimental where there is no 2 passing implementations:

idlharness tests:
- supportedEntryTypes is only green in Chrome.
- PerformanceMark detail is missing in all (this is from UserTming L3 so can be ignored).
po-observe-type.any.html (and worker version): only green in Chrome because it tests the |type| parameter for observe() and throwing the errors specified in the spec.
po-observe.any.html: Chrome is green, Firefox is missing throwing an error when |entryTypes| and |type| are not provided to an observe() call, Safari has that and more problems with exception-throwing.
po-observe.html: can be ignored because it is testing the buffered flag.

@yoavweiss @toddreifsteck @rniwa @mstange

hawkinsw commented 5 years ago

W.r.t. Firefox, we are tracking implementation progress in https://bugzilla.mozilla.org/show_bug.cgi?id=1539006.

As an FYI, we wrote patches today that take care of a majority of our failures. Obviously it will take some additional time to finalize those patches and fix the outstanding failures, but I wanted to let everyone know that we are making progress. Looking forward to having a positive update at/before the meeting on Friday.

@mstange @yoavweiss

npm1 commented 5 years ago

Thanks for the update, it's great to hear this!

hawkinsw commented 5 years ago

FF implementation is complete. We will have to wait on the resolution of #119 and #117, though.

hawkinsw commented 5 years ago

@yoavweiss @npm1 @mstange As of tomorrow morning, these fixes should all be in FF nightly.

yoavweiss commented 5 years ago

That's awesome, thank you! :)

hawkinsw commented 5 years ago

Feel free to check on the status via wpt.fyi. It looks like FF is now on equal footing with Chrome! @mstange, @npm1, @yoavweiss

npm1 commented 5 years ago

Thanks for the great work! One question, in https://wpt.fyi/results/performance-timeline?label=master&label=experimental I see a Firefox failure in case-sensitivity.any.html which I think was not there before. Is this a recent regression?

hawkinsw commented 5 years ago

We are seeing the same thing here. I am investigating! Sorry!

hawkinsw commented 5 years ago

Thanks for the great work! One question, in https://wpt.fyi/results/performance-timeline?label=master&label=experimental I see a Firefox failure in case-sensitivity.any.html which I think was not there before. Is this a recent regression?

Two of us have been attempting to recreate this locally and cannot seem to do so. Would it be possible for you to transfer the nightly artifact that you are using for testing that shows the error? If so, that would be fantastic.

Sorry to have to bother you for this.

npm1 commented 5 years ago

It seems to be fixed now... maybe it was a glitch with wpt.fyi... In any case, I don't know how to do that but @Hexcles might know.

hawkinsw commented 5 years ago

Thanks for the update. I noticed it was clean, too. However, we believe that it is an intermittent failure. So, until we know what's going on, I am going to keep investigating. We know when it started (mid-December) but not yet why.

In the meantime, with that exception, are you satisfied with the FF implementation?

On Wed, Apr 17, 2019 at 4:01 PM npm1 notifications@github.com wrote:

It seems to be fixed now... maybe it was a glitch with wpt.fyi... In any case, I don't know how to do that but @Hexcles https://github.com/Hexcles might know.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/w3c/performance-timeline/issues/100#issuecomment-484239264, or mute the thread https://github.com/notifications/unsubscribe-auth/ACCP2CW5ET2POVPPWN5ALYTPQ56XXANCNFSM4ER7NV5Q .

npm1 commented 5 years ago

Yep, regardless of that we have two passing implementations for all tests. I'll wait until after I have fixed the two related issues before closing this, as that will likely affect tests.

yoavweiss commented 5 years ago

@npm1 - are we there yet? :D

npm1 commented 5 years ago

Ok, I'll close this for now then. We're still waiting for the resolution of https://github.com/w3c/performance-timeline/issues/117 but we can keep track of that there. Firefox is only failing idlharness checks for UserTiming-related interfaces so we can consider 2 implementations (Chrome and Firefox) to be green.

npm1 commented 5 years ago

Reopening this because tests are still not clean. Let's take a look at tests that do not have two passing implementations from https://wpt.fyi/results/performance-timeline?label=experimental&label=master&aligned

Ignoring buffered-flag tests for now, they shouldn't be needed for L2.
Case-sensitivity: I was going to change the test to use marks but noticed it might not be the problem here. @rniwa getEntriesByType('RESOURCE') should not return any entries due to the uppercase, is there a bug for this in Safari? @hawkinsw performance.getEntriesByName(origin + "/resources/testharness.js") should return only 1 entry. While we could change the test to assert there's at least 1 entry, I'm confused as to why Firefox is sometimes returning a list of length 2, could you look into this or is there a Firefox bug?
Idlharness: does Safari plan on implementing PerformanceObserver's takeRecords()? The Firefox failure seems due to UserTimingL3, not sure if there's a way to decouple the harness so we don't test something out of the scope of this specification though... @Hexcles is it possible to use performance.mark without testing the IDL shape of PerformanceMark in this test?
supportedEntryTypes: per feedback from Mozilla we need to cache the result, is there any objection to the test that's failing in Firefox and Safari (PerformanceObserver.supportedEntryTypes === PerformanceObserver.supportedEntryTypes) or does this just require some implementation change?

hawkinsw commented 5 years ago

Reopening this because tests are still not clean. Let's take a look at tests that do not have two passing implementations from https://wpt.fyi/results/performance-timeline?label=experimental&label=master&aligned

* Ignoring buffered-flag tests for now, they shouldn't be needed for L2.

* Case-sensitivity: I was going to change the test to use marks but noticed it might not be the problem here. @rniwa getEntriesByType('RESOURCE') should not return any entries due to the uppercase, is there a bug for this in Safari? @hawkinsw performance.getEntriesByName(origin + "/resources/testharness.js") should return only 1 entry. While we could change the test to assert there's at least 1 entry, I'm confused as to why Firefox is sometimes returning a list of length 2, could you look into this or is there a Firefox bug?

Interestingly enough, I am working on this bug as we speak. I think that I have found the source of the problem and now I just need to write the fix. I don't know yet exactly how long it will take, but I am on it. I hope that's okay!

* Idlharness: does Safari plan on implementing PerformanceObserver's takeRecords()? The Firefox failure seems due to UserTimingL3, not sure if there's a way to decouple the harness so we don't test something out of the scope of this specification though... @Hexcles is it possible to use performance.mark without testing the IDL shape of PerformanceMark in this test?

* supportedEntryTypes: per feedback from Mozilla we need to cache the result, is there any objection to the test that's failing in Firefox and Safari (PerformanceObserver.supportedEntryTypes === PerformanceObserver.supportedEntryTypes) or does this just require some implementation change?

npm1 commented 4 years ago

Looks like Firefox fixed case sensitivity. I think the only test remaining to claim L2 'clean tests' is supportedEntryTypes since both Firefox and Safari fail to return the same object on two successive calls of the attribute getter.

yoavweiss commented 4 years ago

@bdekoz @rniwa @achristensen07 - friendly ping! :)

The list of non-passing tests is:

[x] po-disconnect-removes-observed-types
[x] po-observe-repeated-type
[ ] supportedEntryTypes

There are also some IDL harness failures, but I think those are a result of lack of support for User Timing L3.

Could y'all verify that the fact that the tests are not passing is an issue with your implementation (rather than the tests) and open issues accordingly? Beyond that, would be great if fixing those issues could be prioritized :)

yoavweiss commented 4 years ago

There are also some IDL harness failures, but I think those are a result of lack of support for User Timing L3.

@foolip simplified the IDL tests to stop them from spuriously failing 🎉

sefeng211 commented 3 years ago

Thanks for the ping. We are scheduling work sessions to address these failures.

yoavweiss commented 3 years ago

@sefeng211 - looks like the supportedEntryTypes test is the only one left. Could y'all concentrate some effort to paint it green? :)

sefeng211 commented 3 years ago

Yeah, we are trying it green! :) Although this one requires a bit more work.....

yoavweiss commented 3 years ago

@sefeng211 - any news on that front?

sefeng211 commented 3 years ago

Here's the tracking bug. It's caused by a webidl feature that we haven't implemented yet.

I don't see anyone is going to work on that soon, so this test will probably going to remain as a failure for a while. However, I wouldn't consider this is a blocker preventing us from moving the spec to the next stage.

yoavweiss commented 2 years ago

Given that this is not a blocker, I think we can close this.

w3c / performance-timeline

Ensure tests are clean #100