webmachinelearning / webnn

🧠 Web Neural Network API
https://www.w3.org/TR/webnn/

WPT tests tracker #265

Closed BruceDai closed 1 year ago

BruceDai commented 2 years ago

Thanks to @fdwr's great efforts in reviewing and @Honry's approvals and help, all of our WPT WebNN test PRs have landed. The previous blocker, syncing the updated WebNN IDL interfaces in https://github.com/web-platform-tests/wpt/pull/36908, was resolved by my PR fixing the CI failure.

There are now 432 WPT WebNN operation tests covering a total of 40 ops for the first-wave models, after the convTranspose2d tests in https://github.com/web-platform-tests/wpt/pull/38100 landed. These tests can be run on https://wpt.live/webnn/, e.g.:

Bruce is continuing to add tests for the remaining ops (#338), working closely with @fdwr.

WPT WebNN Tests:

1. WebNN API IDL Tests:

2. WebNN API JavaScript Tests (testharness.js) for operations tests:

anssiko commented 2 years ago

@BruceDai, thanks for your contributions to conformance testing. I added webnn-baseline to today's agenda, including a discussion of ULP tolerances, to unblock your work on this (I'm not expecting a presentation, just discussion). The webnn-baseline is identified as a CR requirement, so it is high priority.

@wchao1115 @huningxin your feedback is welcome in this issue to unblock this proposed work. Since we have a busy agenda today, we may need to defer to GH discussion.

BruceDai commented 2 years ago

I'm sorry for the late status report. We measured the ULP distance between the actual output of WebNN operations and the expected data (the baseline from WebNN-Baseline) on several different hardware devices, using the WebNN-Native DML and OpenVINO backends. With normal input data most results fall within a small ULP distance, while some special input data produce a large ULP distance. I'd like to propose the following ULP tolerances, which cover the majority of cases, to the WG.

@wchao1115 Please also take a look. I hope you can share the ULP tolerances you previously used for DML operations, thanks.

Op | Proposed ULP Tolerance
batchNormalization | 5
clamp | 0
concat | 0
conv2d | 2
add | 1
sub | 1
mul | 1
div | 2
max | 0
min | 0
pow | 3
abs | 0
ceil | 0
cos | 2
exp | 2
floor | 0
log | 3
neg | 0
sin | 2
tan | 4
gemm | 1
leakyRelu | 1
matmul | 1
averagepool2d | 2
maxpool2d | 0
relu | 0
reduceMax | 0
reduceMean | 0
reduceMin | 0
reduceProduct | 0
reduceSum | 0
reshape | 0
sigmoid | 2
slice | 0
softmax | 1
split | 0
squeeze | 0
tanh | 2
transpose | 0

I've first submitted a PR, https://github.com/web-platform-tests/wpt/pull/34287, adding tests for 8 operations (clamp / concat / relu / reshape / slice / split / squeeze / transpose) which have a 0 ULP distance between the actual output and the expected data/baseline.
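For readers unfamiliar with the metric, here is a minimal sketch of how a float32 ULP distance can be computed. It is an illustration in Python, not the code used in the WPT tests or in WebNN-Baseline; the function names are hypothetical.

```python
import struct


def float32_to_ordered_int(x: float) -> int:
    """Map a value, rounded to float32, to an integer so that adjacent
    float32 values differ by exactly 1."""
    # Round the Python float (a double) to float32 and reinterpret the bits
    # as a signed 32-bit integer. Values too large for float32 make
    # struct.pack raise OverflowError.
    bits = struct.unpack("<i", struct.pack("<f", x))[0]
    # Negative floats are stored as sign + magnitude, so mirror them to keep
    # the mapping monotonic across zero (-0.0 and +0.0 both map to 0).
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)


def ulp_distance(actual: float, expected: float) -> int:
    """Number of representable float32 steps between the two values."""
    return abs(float32_to_ordered_int(actual) - float32_to_ordered_int(expected))


def within_ulp(actual: float, expected: float, tolerance: int) -> bool:
    """True if the actual value is within `tolerance` ULP of the expected one."""
    return ulp_distance(actual, expected) <= tolerance
```

With this definition, a tolerance of 0 means the result must be bit-identical to the baseline (apart from the sign of zero), which is what the data movement ops above are expected to achieve.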

huningxin commented 2 years ago

As this is related to wpt, which is a CR blocker (#240), I propose labeling this issue with "cr". @anssiko

BruceDai commented 2 years ago

Link to https://github.com/webmachinelearning/webnn/issues/288

anssiko commented 2 years ago

[Piggybacking on this issue with a more generic w-p-t question.]

@BruceDai, could you give us an update on where we are in terms of test coverage for WebNN API w-p-t tests?

Our plan is to migrate the mocha tests to wpt/webnn to satisfy CR readiness criteria tracked in https://github.com/webmachinelearning/webnn/issues/240.

Looking at the relevant wpt PRs it looks like the migration is in progress.

Do you foresee other blockers besides ULP tolerances discussed in this issue? Thanks for your contributions to w-p-t!

BruceDai commented 2 years ago

Hi @anssiko, sorry for the late response due to the holidays.

The current WebNN API spec defines 56 operations. WebNN-Baseline has already implemented the 42 first-wave ops, and WebNN-Polyfill has implemented most of the operations (50/56, including the 42 first-wave ops). I'm starting to add operation-level tests, beginning with the 8 of the 42 first-wave ops listed above. Here's a table of the implemented tests; please have a look, thanks.

Operation | WebNN-Baseline | WebNN-Polyfill | WPT | First-wave operation?
batchNormalization | √ | √ | × | Yes
clamp | √ | √ | √ | Yes
concat | √ | √ | √ | Yes
conv2d | √ | √ | × | Yes
convTranspose2d | √(*) | √ | × | Yes
add | √ | √ | × | Yes
sub | √ | √ | × | Yes
mul | √ | √ | × | Yes
div | √ | √ | × | Yes
max | √ | √ | × | Yes
min | √ | √ | × | Yes
pow | √ | √ | × | Yes
abs | √ | √ | × | Yes
ceil | √ | √ | × | Yes
cos | √ | √ | × | Yes
exp | √ | √ | × | Yes
floor | √ | √ | × | Yes
log | √ | √ | × | Yes
neg | √ | √ | × | Yes
sin | √ | √ | × | Yes
tan | √ | √ | × | Yes
gemm | √ | √ | × | Yes
gru | √ | √ | × | Yes
gruCell | √ | √ | × | Yes
hardSigmoid | × | × | × | No
hardSwish | × | √ | × | No
instanceNormalization | × | √ | × | No
leakyRelu | √ | √ | × | Yes
matmul | √ | √ | × | Yes
linear | × | × | × | No
pad | × | √ | × | No
averagepool2d | √ | √ | × | Yes
maxpool2d | √ | √ | × | Yes
l2Pool2d | × | √ | × | No
reduceL1 | × | √ | × | No
reduceL2 | × | √ | × | No
reduceLogSum | × | × | × | No
reduceLogSumExp | × | √ | × | No
reduceMax | √ | √ | × | Yes
reduceMean | √ | √ | × | Yes
reduceMin | √ | √ | × | Yes
reduceProduct | √ | √ | × | Yes
reduceSum | √ | √ | × | Yes
reduceSumSquare | × | × | × | No
relu | √ | √ | √ | Yes
resample2d | × | √ | × | No
reshape | √ | √ | √ | Yes
sigmoid | √ | √ | × | Yes
slice | √ | √ | √ | Yes
softmax | √ | √ | × | Yes
softplus | × | × | × | No
softsign | × | × | × | No
split | √ | √ | √ | Yes
squeeze | √ | √ | √ | Yes
tanh | √ | √ | × | Yes
transpose | √ | √ | √ | Yes

Note:

In my opinion, there isn't any other blocker besides the ULP tolerances, which we're working on.

I plan to add the first-wave operation tests to the WPT project first, then add tests for the other operations, which are still being implemented in WebNN-Polyfill and WebNN-Baseline. Any suggestions are welcome, thanks.

anssiko commented 2 years ago

@BruceDai thank you for this update, your plan sounds good to me. Your wpt contributions play an important role in the CR readiness. Please bring any further blockers to the attention of the WG so we can help you address them in a timely manner.

anssiko commented 2 years ago

@BruceDai I'll make this a meta issue for WPT tests tracking and rename the issue to reflect that.

Please link the relevant issues and PRs into this meta issue to keep the WG informed of the progress (not everyone is watching the huge wpt repo). We'll review your test plan https://github.com/webmachinelearning/webnn/issues/265#issuecomment-1246622380 on our upcoming call. Thank you!

wchao1115 commented 2 years ago

@BruceDai We're close to producing an initial list of recommended ULP tolerance for the ops you're listing here. There will be some more explanation as to why we would recommend a certain tolerance value for certain ops in the list.

+= @fdwr.

fdwr commented 2 years ago

Hi BruceDai, here's the initial list...

Op | Old proposed ULP tolerance | float16 | float32 | Notes
batchNormalization | 5 | 6 ULP | 6 ULP | (a - mean) * scale / sqrt(variance + epsilon) + bias
clamp | 0 | 0 | 0 | if a > high then high elif a < low then low else a
concat | 0 | 0 | 0 |
conv2d | 2 | IEPOE*2 ULP | IEPOE*2 ULP | number of reduced input elements multiplied by filter and summed (a sliding dot product like pooling). So (Filter.Sizes.W * Filter.Sizes.H * (Input.Sizes.C / GroupCount)) * 2. // FilterSize.D too if 3D
add | 1 | 1 ULP | 1 ULP |
sub | 1 | 1 ULP | 1 ULP |
mul | 1 | 1 ULP | 1 ULP |
div | 2 | 2 ULP | 2 ULP | implementations may instead use x * (1/y), and so 1 for reciprocal and 1 for multiply
max | 0 | 0 | 0 |
min | 0 | 0 | 0 |
pow | 3 | 2 ULP | 32 ULP | May expand to expₑ(b * log(a)).
abs | 0 | 0 | 0 |
ceil | 0 | 0 | 0 |
cos | 2 | 1/512 ATOL or 1 ULP | 1/1024 ATOL |
div | 2 | 2 ULP | 2 ULP |
exp | 2 | 1 ULP | 32 ULP | ULP is typically very small (0 to 2), but negative values can yield larger deltas (e.g. exp(-36.7462921143) yields ULP ±27 on my machine). float16 is actually computed using float32 (so 1 ULP for final roundoff).
floor | 0 | 0 | 0 |
log | 3 | 1/1024 ATOL or 2 ULP | 1/1024 ATOL or 2 ULP |
neg | 0 | 0 | 0 |
sin | 2 | 1/512 ATOL or 1 ULP | 1/1024 ATOL | a little looser than GPU specs
tan | 4 | 1/512 ATOL or 1 ULP | 1/1024 ATOL |
gemm | 1 | IEPOE*2+3 ULP | IEPOE*2+3 ULP | (dot(a[i, …], b[.., j]) * alpha) + (beta * C). If no optional C input and alpha/beta are identity, use matmul tolerance
leakyRelu | 1 | 1 ULP | 1 ULP | if a >= 0 then a else a * alpha
matmul | 1 | IEPOE*2 ULP | IEPOE*2 ULP | dot(a[i, …], b[.., j])
averagepool2d | 2 | IEPOE+2 ULP | IEPOE+2 ULP | number of reduced element additions and a final division
maxpool2d | 0 | 0 | 0 |
relu | 0 | 0 | 0 | max(a, 0)
reduceMax | 0 | 0 | 0 |
reduceMean | 0 | IEPOE+2 ULP | IEPOE+2 ULP | number of reduced element additions and a final division
reduceMin | 0 | 0 | 0 |
reduceProduct | 0 | IEPOE ULP | IEPOE ULP | number of reduced multiplications
reduceSum | 0 | IEPOE ULP | IEPOE ULP | number of reduced additions
reshape | 0 | 0 | 0 |
sigmoid | 2 | 3 | 32+2 | 1 / (1 + expₑ(-a)); float16's exp is done as float32 (leaving a few ULP for roundoff)
slice | 0 | 0 | 0 |
softmax | 1 | IEPOE*3+3 ULP | IEPOE*3+3 ULP | expₑ(a - reducemax(A, axes)) / reducesum(expₑ(A - reducemax(A, axes)), axis); // equivalent expₑ(a) / sum(expₑ(A))
split | 0 | 0 | 0 |
squeeze | 0 | 0 | 0 |
tan | na | 1/512 ATOL or 1 ULP | 1/1024 ATOL | may expand to sin(radians) / cos(radians)
tanh | 2 | 1/512 ATOL or 1 ULP | 1/1024 ATOL |
transpose | 0 | 0 | 0 |

Let me know if you have any questions. 🧐

(UPDATE: More continued here: https://github.com/webmachinelearning/webnn/issues/338#issuecomment-1419652594)
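To make the IEPOE-based entries concrete, here is a rough sketch of how those tolerances could be derived from operand shapes. It reads the conv2d note as filter width times filter height times (input channels / group count), times 2. Treating IEPOE as the shared inner dimension for matmul/gemm and as the window size for averagepool2d is an assumption on my part, and all function names are hypothetical.

```python
def conv2d_ulp_tolerance(filter_width: int, filter_height: int,
                         input_channels: int, groups: int = 1) -> int:
    """IEPOE*2 ULP, per the conv2d row of the table above."""
    # Input elements reduced into each output element.
    iepoe = filter_width * filter_height * (input_channels // groups)
    return iepoe * 2


def matmul_ulp_tolerance(inner_dim: int) -> int:
    """IEPOE*2 ULP; assumes IEPOE is the shared inner dimension K."""
    return inner_dim * 2


def gemm_ulp_tolerance(inner_dim: int) -> int:
    """IEPOE*2+3 ULP; the extra 3 is taken directly from the gemm row."""
    return inner_dim * 2 + 3


def average_pool2d_ulp_tolerance(window_width: int, window_height: int) -> int:
    """IEPOE+2 ULP; assumes IEPOE is the number of elements in the window."""
    return window_width * window_height + 2


# Example: a 3x3 filter over 4 input channels split into 2 groups reduces
# 3 * 3 * 2 = 18 input elements per output element.
print(conv2d_ulp_tolerance(3, 3, 4, groups=2))
```

The point of scaling the tolerance with the reduction size is that each multiply-add can contribute its own rounding error, so a fixed ULP budget would be too strict for large filters and too loose for small ones.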

wchao1115 commented 2 years ago

Big thanks to @fdwr for your contribution. @BruceDai Please note that the proposed tolerances are all relative to an ideal baseline. In our WebML call earlier in the week, I believe we've agreed that the WPT test must be relative to a framework-agnostic reference implementation of WebNN.

I think we'll need a new repo under the webmachinelearning GitHub organization specifically to host the reference implementation for our WPT tests. @anssiko and @huningxin, do you have any objection to that? This is something we can help with too.

huningxin commented 2 years ago

Thanks much @fdwr, that's a significant contribution!

@wchao1115, I agree we should host the reference implementation that generates the ideal baseline results. I think that's the reason we created the webnn-baseline repo and implemented the first-wave ops. These ops are implemented in JavaScript using double-precision calculation and follow straightforward algorithms, such as conv2d.
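As an illustration of the "straightforward algorithm in double precision" idea, here is a minimal sketch in Python. It is not the webnn-baseline code: it assumes a single channel, no padding, and stride 1, and the helper names are hypothetical. The accumulation happens in double precision and only the final result is rounded to float32.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def conv2d_reference(image, kernel):
    """Naive single-channel 2-D convolution (no padding, stride 1).

    Like conv2d in ML frameworks, the kernel is not flipped. Python floats
    are doubles, so every multiply-add below runs in double precision.
    """
    out_height = len(image) - len(kernel) + 1
    out_width = len(image[0]) - len(kernel[0]) + 1
    output = []
    for y in range(out_height):
        row = []
        for x in range(out_width):
            acc = 0.0
            for ky, kernel_row in enumerate(kernel):
                for kx, weight in enumerate(kernel_row):
                    acc += image[y + ky][x + kx] * weight
            # Round once at the end to produce the float32 baseline value.
            row.append(to_float32(acc))
        output.append(row)
    return output


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
print(conv2d_reference(image, kernel))
```

Rounding only once keeps the baseline as close as possible to the ideal result, so the ULP tolerances measure the error of the implementation under test rather than the error of the reference.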

BruceDai commented 2 years ago

Thanks much @fdwr and @wchao1115 !

wchao1115 commented 2 years ago

@BruceDai It might be more time-efficient if we arrange a short 15-minute presentation at the next WG call to walk through this topic and take questions. @anssiko what do you think?

anssiko commented 2 years ago

@wchao1115 I'll put @fdwr on the agenda for our next 6 Oct call, working title "Recommended tolerances for WPT tests".

BruceDai commented 1 year ago

Thanks @fdwr! I updated my last PR, https://github.com/web-platform-tests/wpt/pull/34287, following the precision-metric suggestions above: I updated the existing float32 tests for the data movement ops, which use the ULP metric, and added float32 tests for tanh, which use the ATOL metric, and for gemm, which use the IEPOE metric. This PR is under review. The other float32 tests for the remaining first-wave ops, in https://github.com/web-platform-tests/wpt/pull/36202, are being updated with new test data (float64 inputs + float32 baseline) and precision metrics.
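A rough sketch of how a test helper might dispatch on the two comparison styles mentioned here (ULP and ATOL) is shown below. This is an illustration in Python rather than the testharness.js code in the PR; the function names and the "ULP"/"ATOL" string labels are hypothetical. IEPOE-based checks would use the ULP branch with a tolerance computed from the operand shapes.

```python
import struct


def _ordered_float32(x: float) -> int:
    """Monotonic integer mapping of float32 values (adjacent floats differ by 1)."""
    bits = struct.unpack("<i", struct.pack("<f", x))[0]
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)


def is_close(actual: float, expected: float, metric: str, tolerance: float) -> bool:
    """Compare a result against the baseline using the named metric."""
    if metric == "ULP":
        # Distance in representable float32 steps.
        return abs(_ordered_float32(actual) - _ordered_float32(expected)) <= tolerance
    if metric == "ATOL":
        # Plain absolute difference.
        return abs(actual - expected) <= tolerance
    raise ValueError(f"unknown metric: {metric}")


# Data movement ops such as reshape are expected to be exact (0 ULP).
print(is_close(1.5, 1.5, "ULP", 0))
# tanh float32 is listed above with an absolute tolerance of 1/1024.
print(is_close(0.7616, 0.7615941559557649, "ATOL", 1 / 1024))
```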

BruceDai commented 1 year ago

Feng discussed with @fdwr moving the test data into separate JSON files, which would make the tests easier to maintain later. PR https://github.com/web-platform-tests/wpt/pull/36782 has now been submitted for review, and other tests will be added soon.

anssiko commented 1 year ago

@BruceDai thanks for your continued work on WebNN WPT. Can you help answer the following questions:

I'm trying to identify opportunities to broaden our WPT contributor base. I'm aware of participants who are eager to get our remaining CR tasks completed and may be able to help in various capacities.

BruceDai commented 1 year ago

The open issues cover the rest of the ops, which are not yet implemented in WebNN-Baseline. Once they're fixed, we can leverage these pure JavaScript implementations to generate baseline test data for contributing op tests to wpt.

  • Any specific open issues you'd like to bring to the next WG meeting for discussion?

Currently I have no open issues about tests; I'm still focusing on refining and adding first-wave operation tests to wpt.

Since I've been refining the test JSON files in the wpt WebNN test PRs according to feedback, the open WebNN-Baseline PRs are also being updated; once they're finished I'll ask @huningxin and @fdwr to help review. Experts and engineers are welcome to join in implementation and review.

I'm trying to identify opportunities to broaden our WPT contributor base. I'm aware of participants who are eager to get our remaining CR tasks completed and may be able to help in various capacities.

Thanks @anssiko. Looking forward to more contributors; I hope we complete the CR tasks ASAP :)

BruceDai commented 1 year ago

@anssiko I updated the top comment, please take a look, thanks.

BTW, may we close this issue and track on #338? Thanks.

anssiko commented 1 year ago

@BruceDai @fdwr, others, with your continued contributions we are able to not just meet but exceed the test coverage expectations for the Candidate Recommendation maturity level. Thanks for your contributions and congratulations on reaching this major wpt milestone! This is pioneering work for wpt due to the domain-specific requirements of this API.

I'll close this tracker now, and we'll continue tracking the remaining work in #338, focusing on the two remaining ops.