Closed BruceDai closed 1 year ago
@BruceDai, thanks for your contributions to conformance testing. I added webnn-baseline to today's agenda including discussion on ULP tolerances to unblock your work on this (I'm not expecting presentation, just discussion). The webnn-baseline is identified as a CR requirement, so high priority.
@wchao1115 @huningxin your feedback is welcome in this issue to unblock this proposed work. Since we have a busy agenda today, we may need to defer to GH discussion.
I'm sorry to report status late. Testing ULP tolerances between the actual output of WebNN operations and the expected data/baseline from WebNN-Baseline on several different HW devices with the WebNN-Native DML and OpenVINO backends, we observed that ULP distances are small for the majority of cases with normal input data, with some large ULP distances for certain special input data. I'd like to propose the following majority-case ULP tolerances to the WG.
@wchao1115 Please also take a look, and I hope you can share your previous ULP tolerances for DML operations, thanks.
Op | Propose ULP Tolerance |
---|---|
batchNormalization | 5 |
clamp | 0 |
concat | 0 |
conv2d | 2 |
add | 1 |
sub | 1 |
mul | 1 |
div | 2 |
max | 0 |
min | 0 |
pow | 3 |
abs | 0 |
ceil | 0 |
cos | 2 |
exp | 2 |
floor | 0 |
log | 3 |
neg | 0 |
sin | 2 |
tan | 4 |
gemm | 1 |
leakyRelu | 1 |
matmul | 1 |
averagepool2d | 2 |
maxpool2d | 0 |
relu | 0 |
reduceMax | 0 |
reduceMean | 0 |
reduceMin | 0 |
reduceProduct | 0 |
reduceSum | 0 |
reshape | 0 |
sigmoid | 2 |
slice | 0 |
softmax | 1 |
split | 0 |
squeeze | 0 |
tanh | 2 |
transpose | 0 |
I've first submitted a PR https://github.com/web-platform-tests/wpt/pull/34287 adding tests for 8 operations (clamp / concat / relu / reshape / slice / split / squeeze / transpose) which have 0 ULP distance between actual output and expected data/baseline.
As this is related to wpt which is cr blocker #240, I propose to label this issue with "cr". @anssiko
[Piggy-backing on this issue with a more generic w-p-t question.]
@BruceDai, could you give us an update on where we are in terms of test coverage for WebNN API w-p-t tests?
Our plan is to migrate the mocha tests to wpt/webnn to satisfy CR readiness criteria tracked in https://github.com/webmachinelearning/webnn/issues/240.
Looking at the relevant wpt PRs it looks like the migration is in progress.
Do you foresee other blockers besides ULP tolerances discussed in this issue? Thanks for your contributions to w-p-t!
Hi @anssiko, sorry for late response due to the holidays.
The current WebNN API spec defines 56 operations. WebNN-Baseline has already implemented the 42 first-wave ops, and WebNN-Polyfill has implemented most of them (50/56, including the 42 first-wave ops). I'm starting to add operation-level tests beyond the 8/42 first-wave ops listed above. Here's a table of implemented tests, please have a look, thanks.
Operations \ tests | WebNN-Baseline | WebNN-Polyfill | WPT | Note (Is first wave operation?) |
---|---|---|---|---|
batchNormalization | ✓ | ✓ | ✗ | Yes |
clamp | ✓ | ✓ | ✓ | Yes |
concat | ✓ | ✓ | ✓ | Yes |
conv2d | ✓ | ✓ | ✗ | Yes |
convTranspose2d | ✓(*) | ✓ | ✗ | Yes |
add | ✓ | ✓ | ✗ | Yes |
sub | ✓ | ✓ | ✗ | Yes |
mul | ✓ | ✓ | ✗ | Yes |
div | ✓ | ✓ | ✗ | Yes |
max | ✓ | ✓ | ✗ | Yes |
min | ✓ | ✓ | ✗ | Yes |
pow | ✓ | ✓ | ✗ | Yes |
abs | ✓ | ✓ | ✗ | Yes |
ceil | ✓ | ✓ | ✗ | Yes |
cos | ✓ | ✓ | ✗ | Yes |
exp | ✓ | ✓ | ✗ | Yes |
floor | ✓ | ✓ | ✗ | Yes |
log | ✓ | ✓ | ✗ | Yes |
neg | ✓ | ✓ | ✗ | Yes |
sin | ✓ | ✓ | ✗ | Yes |
tan | ✓ | ✓ | ✗ | Yes |
gemm | ✓ | ✓ | ✗ | Yes |
gru | ✓ | ✓ | ✗ | Yes |
gruCell | ✓ | ✓ | ✗ | Yes |
hardSigmoid | ✗ | ✗ | ✗ | No |
hardSwish | ✗ | ✓ | ✗ | No |
instanceNormalization | ✗ | ✓ | ✗ | No |
leakyRelu | ✓ | ✓ | ✗ | Yes |
matmul | ✓ | ✓ | ✗ | Yes |
linear | ✗ | ✗ | ✗ | No |
pad | ✗ | ✓ | ✗ | No |
averagepool2d | ✓ | ✓ | ✗ | Yes |
maxpool2d | ✓ | ✓ | ✗ | Yes |
l2Pool2d | ✗ | ✓ | ✗ | No |
reduceL1 | ✗ | ✓ | ✗ | No |
reduceL2 | ✗ | ✓ | ✗ | No |
reduceLogSum | ✗ | ✗ | ✗ | No |
reduceLogSumExp | ✗ | ✓ | ✗ | No |
reduceMax | ✓ | ✓ | ✗ | Yes |
reduceMean | ✓ | ✓ | ✗ | Yes |
reduceMin | ✓ | ✓ | ✗ | Yes |
reduceProduct | ✓ | ✓ | ✗ | Yes |
reduceSum | ✓ | ✓ | ✗ | Yes |
reduceSumSquare | ✗ | ✗ | ✗ | No |
relu | ✓ | ✓ | ✓ | Yes |
resample2d | ✗ | ✓ | ✗ | No |
reshape | ✓ | ✓ | ✓ | Yes |
sigmoid | ✓ | ✓ | ✗ | Yes |
slice | ✓ | ✓ | ✓ | Yes |
softmax | ✓ | ✓ | ✗ | Yes |
softplus | ✗ | ✗ | ✗ | No |
softsign | ✗ | ✗ | ✗ | No |
split | ✓ | ✓ | ✓ | Yes |
squeeze | ✓ | ✓ | ✓ | Yes |
tanh | ✓ | ✓ | ✗ | Yes |
transpose | ✓ | ✓ | ✓ | Yes |
Note:
- ✓ in column WPT means that we've already added tests for this operation to WPT with a submitted PR.
- ✗ in column WPT together with Yes in the last column means that we've locally migrated this first-wave op's tests from WebNN-Polyfill to the WPT WebNN tests, with submission pending on the ULP tolerance decision.
- (*) `convTranspose2d` was split from `conv2d`, so WebNN-Baseline can implicitly support `convTranspose2d` by invoking `conv2d` with some options. I'll submit a PR adding a `convTranspose2d` implementation and updating the relevant tests so that WebNN-Baseline clearly supports `convTranspose2d`.
In my opinion, there isn't any other blocker except the ULP tolerances, which we're working on.
I plan to first add the first-wave operation tests to the WPT project, then add tests for the other operations that are still being implemented in WebNN-Polyfill and WebNN-Baseline. Any suggestions are welcome, thanks.
@BruceDai thank you for this update, your plan sounds good to me. Your wpt contributions play an important role in the CR readiness. Please bring any further blockers to the attention of the WG so we can help you address them in a timely manner.
@BruceDai I'll make this a meta issue for WPT tests tracking and rename the issue to reflect that.
Please link the relevant issues and PRs into this meta issue to keep the WG informed of the progress (not everyone is watching the huge wpt repo). We'll review your test plan https://github.com/webmachinelearning/webnn/issues/265#issuecomment-1246622380 on our upcoming call. Thank you!
@BruceDai We're close to producing an initial list of recommended ULP tolerances for the ops you're listing here. There will be some more explanation as to why we recommend a certain tolerance value for certain ops in the list.
+= @fdwr.
Hi BruceDai, here's the initial list...

For the `linear` operator, you can still randomly generate the `input`, `scale`, and `bias` parameters, but ensure scale and bias have consistent signs (both positive or both negative, or else subtraction of nearly equal numbers will eventually bite you in some random permutation). For tangent, avoid querying too close to the repeating asymptotes of 1/4π and 3/4π.

Several operators reduce many input elements per output element (see "`IEPOE`" below), whether it's along a reduction axis like reduceSum and gemm, or a sliding window like conv and averagePool, and so the upper limit for error depends on the parameters, not just a single hard-coded tolerance value. Beware you might witness a very low error running some of these operators and think the precision of the underlying computation is very good, but this is a lie, a false comfort due to round-to-nearest-even's wonderful tendency to balance out error. You could sum 100 random numbers and get an actual value only a few ULP off from the expected value in the common case, but then you will eventually encounter some outliers that are pretty far off, because the error variance is still wider, and the worst case is broader (broader than, say, summing 10 numbers). Expectedly, the number of lossy math operations also contributes, not just the number of inputs, and the values below are not as tight as they could be in practice, but it's about setting a reasonable upper limit.

Some functions' error is centered within a range around the expected value, bounded by an absolute tolerance (`ATOL`), while others' varies proportional to the magnitude of the signal, bounded by a percentage/relative tolerance (`RTOL`) of the expected value. Similarly, graphing the error of software math functions will in some cases show error centered some range around the expected value (like with sine and cos, which are often implemented via lookup tables with linear interpolation) and in other cases show error proportional to the magnitude of the input (like with convolution and multiplication). In computers, rather than use relative percentages (RTOL), we can instead use the bitwise delta between values to measure the unit in the last place (`ULP`, which you are already familiar with). For ATOL, it's just `actual <= expected + atol && actual >= expected - atol`.

Op | Old Proposed ULP Tolerance | float16 | float32 | notes |
---|---|---|---|---|
batchNormalization | 5 | 6 ULP | 6 ULP | (a - mean) * scale / sqrt(variance + epsilon) + bias |
clamp | 0 | 0 | 0 | if a > high then high elif a < low then low else a |
concat | 0 | 0 | 0 | |
conv2d | 2 | IEPOE*2 ULP | IEPOE*2 ULP | number of reduced input elements multiplied by filter and summed (a sliding dot product like pooling). So `(Filter.Sizes.W * Filter.Sizes.H * (Input.Sizes.C / GroupCount)) * 2`. // FilterSize.D too if 3D |
add | 1 | 1 ULP | 1 ULP | |
sub | 1 | 1 ULP | 1 ULP | |
mul | 1 | 1 ULP | 1 ULP | |
div | 2 | 2 ULP | 2 ULP | implementations may instead use x * (1/y), and so 1 for reciprocal and 1 for multiply |
max | 0 | 0 | 0 | |
min | 0 | 0 | 0 | |
pow | 3 | 2 ULP | 32 ULP | May expand to `exp(b * log(a))`. |
abs | 0 | 0 | 0 | |
ceil | 0 | 0 | 0 | |
cos | 2 | 1/512 ATOL or 1 ULP | 1/1024 ATOL | |
exp | 2 | 1 ULP | 32 ULP | ULP is typically very small (0 to 2), but negative values can yield larger deltas (e.g. exp(-36.7462921143) yields ULP ±27 on my machine). float16 is actually computed using float32 (so 1 ULP for final roundoff). |
floor | 0 | 0 | 0 | |
log | 3 | 1/1024 ATOL or 2 ULP | 1/1024 ATOL or 2 ULP | |
neg | 0 | 0 | 0 | |
sin | 2 | 1/512 ATOL or 1 ULP | 1/1024 ATOL | a little looser than GPU specs |
tan | 4 | 1/512 ATOL or 1 ULP | 1/1024 ATOL | |
gemm | 1 | IEPOE*2+3 ULP | IEPOE*2+3 ULP | `(dot(a[i, …], b[…, j]) * alpha) + (beta * C)`. If there is no optional C input and alpha/beta are identity, use the matmul tolerance |
leakyRelu | 1 | 1 ULP | 1 ULP | if a >= 0 then a else a * alpha |
matmul | 1 | IEPOE*2 ULP | IEPOE*2 ULP | `dot(a[i, …], b[…, j])` |
averagepool2d | 2 | IEPOE+2 ULP | IEPOE+2 ULP | number of reduced element additions and a final division |
maxpool2d | 0 | 0 | 0 | |
relu | 0 | 0 | 0 | max(a, 0) |
reduceMax | 0 | 0 | 0 | |
reduceMean | 0 | IEPOE+2 ULP | IEPOE+2 ULP | number of reduced element additions and a final division |
reduceMin | 0 | 0 | 0 | |
reduceProduct | 0 | IEPOE ULP | IEPOE ULP | number of reduced multiplications |
reduceSum | 0 | IEPOE ULP | IEPOE ULP | number of reduced additions |
reshape | 0 | 0 | 0 | |
sigmoid | 2 | 3 | 32+2 | `1 / (1 + exp(-a))`; float16's exp is done as float32 (leaving a few ULP for roundoff) |
slice | 0 | 0 | 0 | |
softmax | 1 | IEPOE*3+3 ULP | IEPOE*3+3 ULP | `exp(a - reducemax(A, axes)) / reducesum(exp(A - reducemax(A, axes)), axis)`; // equivalent to `exp(a) / sum(exp(A))` |
split | 0 | 0 | 0 | |
squeeze | 0 | 0 | 0 | |
tan | na | 1/512 ATOL or 1 ULP | 1/1024 ATOL | may expand to sin(radians) / cos(radians) |
tanh | 2 | 1/512 ATOL or 1 ULP | 1/1024 ATOL | |
transpose | 0 | 0 | 0 |
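For the IEPOE-scaled rows in the table above, the per-output reduction size follows from the operator parameters. Here is a minimal sketch under my own reading of the table's conv2d note (the function names are illustrative, not from any spec or repo):

```javascript
// IEPOE: input elements per output element, i.e. how many input values are
// reduced into a single output value. The tolerance rows above scale with it.

// conv2d: each output element is a sliding dot product over
// filterH * filterW * (inputChannels / groupCount) input elements.
function conv2dIEPOE(filterHeight, filterWidth, inputChannels, groupCount) {
  return filterHeight * filterWidth * (inputChannels / groupCount);
}

// Reduction ops (reduceSum, reduceProduct, ...): the product of the
// sizes of the reduced axes.
function reduceIEPOE(inputShape, axes) {
  return axes.reduce((count, axis) => count * inputShape[axis], 1);
}

// Example: a 3x3 filter over 16 input channels with no grouping gives
// IEPOE = 3 * 3 * 16 = 144, so the float32 conv2d row (IEPOE*2 ULP)
// would allow 288 ULP for that configuration.
const iepoe = conv2dIEPOE(3, 3, 16, 1);
const conv2dToleranceUlp = iepoe * 2;
```

The point of parameterizing the tolerance this way is that a 7x7 convolution over 512 channels legitimately accumulates far more rounding error than a 1x1 over 3 channels, so no single constant fits both.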
- ATOL: `expected` within `[actual - atol, actual + atol]`
- RTOL: `expected` within `[actual * (1 - rtol), actual * (1 + rtol)]`
- ULP: `expected.asRawBits` within `[actual.asRawBits - ulp, actual.asRawBits + ulp]`

Let me know if you have any questions.
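A minimal float32 sketch of these two checks (the function names are illustrative; real WPT harness code may differ):

```javascript
// Map a float32's bit pattern onto a monotonic integer scale, so that
// adjacent representable floats map to adjacent integers. Negative floats
// are sign-magnitude in IEEE 754, so they need remapping below zero.
function float32ToOrderedBits(value) {
  const f32 = new Float32Array(1);
  const i32 = new Int32Array(f32.buffer); // shares the same 4 bytes
  f32[0] = value;
  const bits = i32[0];
  return bits < 0 ? -2147483648 - bits : bits;
}

// ULP distance: how many representable float32 values separate a and b.
function ulpDistance32(a, b) {
  return Math.abs(float32ToOrderedBits(a) - float32ToOrderedBits(b));
}

// ATOL check, exactly as written above.
function withinAtol(actual, expected, atol) {
  return actual <= expected + atol && actual >= expected - atol;
}
```

Note the ordered-bits mapping also makes `ulpDistance32(-0, +0)` come out as 0, which is usually what a conformance test wants.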
(UPDATE: More continued here: https://github.com/webmachinelearning/webnn/issues/338#issuecomment-1419652594)
Big thanks to @fdwr for your contribution. @BruceDai Please note that the proposed tolerances are all relative to an ideal baseline. In our WebML call earlier in the week, I believe we've agreed that the WPT test must be relative to a framework-agnostic reference implementation of WebNN.
I think we'll need a new repo under the webmachinelearning GitHub organization specifically to host the reference implementation for our WPT tests. @anssiko and @huningxin, do you have any objection to that? This is something we can help with too.
Thanks much @fdwr , that's a significant contribution!
@wchao1115 , I agree we should host the reference implementation that generates the ideal baseline results. I think that's the reason we created the webnn-baseline repo and implemented the first-wave ops. These ops are implemented in JavaScript with double-precision calculation and follow straightforward algorithms, such as conv2d.
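To illustrate the "straightforward algorithm in double precision" style (this is a hypothetical 1-D sketch for illustration, not the actual webnn-baseline conv2d code):

```javascript
// Naive valid-mode 1-D convolution (really cross-correlation, as ML
// frameworks define conv). JavaScript numbers are float64, so every
// multiply and add here happens in double precision, making the result
// a suitable "ideal" baseline to compare float32 outputs against.
function conv1d(input, filter) {
  const outputLength = input.length - filter.length + 1;
  const output = new Array(outputLength);
  for (let i = 0; i < outputLength; ++i) {
    let sum = 0;
    for (let j = 0; j < filter.length; ++j) {
      sum += input[i + j] * filter[j];
    }
    output[i] = sum;
  }
  return output;
}
```

The value of keeping the reference this literal is that correctness is auditable by eye; performance does not matter for generating expected test data.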
Thanks much @fdwr and @wchao1115 !
Some questions:

1. What's the algorithm for "`IEPOE`"? It would be very helpful for implementing "`IEPOE`" in JavaScript for the WPT tests if there's an algorithm for it.
2. What are the concrete `ATOL` values for float32 and float16? You mentioned `RTOL`, while `actual <= expected + atol && actual >= expected - atol` misses `RTOL`; should it be `actual <= rtol * expected + atol && actual >= rtol * expected - atol`? And if so, what's the concrete value for `RTOL`?
3. For the `exp` op, I had some observations on #288; it seems that a fixed ULP tolerance value doesn't apply to the `exp` op. @fdwr PTAL, thanks.
4. For ops having a fused activation option (`batchNormalization` / `conv2d` / `convTranspose2d`), what's the ULP tolerance if they use the fused activation option? For example, for `conv2d` fusing `sigmoid` activation, what's the ULP tolerance for the float32 case: should it still follow the `IEPOE*2 ULP` tolerance of `conv2d`, or the `3 ULP` of `sigmoid`?

@BruceDai It might be more time-efficient if we arranged a short 15-minute presentation at the next WG call to walk through and do Q&A on this topic. @anssiko what do you think?
@wchao1115 I'll put @fdwr on the agenda for our next 6 Oct call, working title "Recommended tolerances for WPT tests".
Thanks @fdwr !
I updated the previous PR https://github.com/web-platform-tests/wpt/pull/34287 following the precision-metrics suggestions above: I updated the existing data-movement op float32 tests which use the `ULP` metric, and added tanh op float32 tests which use the `ATOL` metric and gemm op float32 tests which use the `IEPOE` metric. This PR is under review.
The other float32 tests for the remaining first-wave ops in https://github.com/web-platform-tests/wpt/pull/36202 are being updated with new test data (float64 inputs + float32 baseline) and precision metrics.
Feng discussed with @fdwr moving the test data into separate JSON files, which would make the tests easier to maintain later. PR https://github.com/web-platform-tests/wpt/pull/36782 has now been submitted for review; the other tests will be added soon.
@BruceDai thanks for your continued work on WebNN WPT. Can you help answer the following questions:
I'm trying to identify opportunities to broaden our WPT contributor base. I'm aware of participants who are eager to get our remaining CR tasks completed and may be able to help in various capacities.
- What is the estimated test coverage (roughly) once we have addressed the open issues documented in https://github.com/webmachinelearning/webnn-baseline/issues
The open issues cover the rest of the ops that are unimplemented in WebNN-Baseline. Once they're fixed, we can leverage these pure JavaScript implementations to generate baseline test data for contributing op tests to wpt.
- Any specific open issues you'd like to bring to the next WG meeting for discussion?
Currently I have no open issues about tests; I'm still focusing on refining and adding first-wave operation tests to wpt.
- Any open PRs https://github.com/webmachinelearning/webnn-baseline/pulls that'd require special attention from the WG participants other than @huningxin @fdwr who are already looped in.
Since I've been refining the test JSON files of the wpt WebNN test PRs according to feedback, the open PRs of WebNN-Baseline are also being updated; once finished, I'll ask @huningxin and @fdwr to help review. Experts and engineers are welcome to join in on implementation and review.
I'm trying to identify opportunities to broaden our WPT contributor base. I'm aware of participants who are eager to get our remaining CR tasks completed and may be able to help in various capacities.
Thanks @anssiko. Looking forward to more contributors; I hope we can finish the CR tasks ASAP :)
@anssiko I updated first top comment, please take a look, thanks.
BTW, may we close this issue and track on #338? Thanks.
@BruceDai @fdwr, and others: with your continued contributions we are able to not just meet but exceed the test coverage expectations for the Candidate Recommendation maturity level. Thanks for your contributions and congratulations on reaching this major wpt milestone! This is pioneering work for wpt due to the domain-specific requirements of this API.
I'll close this tracker now, and we'll continue tracking the remaining work in #338, focusing on the two remaining ops.
Thanks to @fdwr's great reviewing efforts and @Honry's approvals and help, our WPT WebNN test PRs have all landed, after the previous blocker of syncing the updated WebNN IDL interfaces in https://github.com/web-platform-tests/wpt/pull/36908 was resolved by my PR fixing the CI failure.
There are now 432 WPT WebNN operation tests covering 40 ops in total for the first-wave models, after the convTranspose2d tests https://github.com/web-platform-tests/wpt/pull/38100 landed. We can run these tests on https://wpt.live/webnn/, e.g.:
Bruce is continuing to add tests for the remaining ops (#338), working closely with @fdwr.
WPT WebNN Tests:
1. WebNN API IDL Tests:
2. WebNN API JavaScript Tests (testharness.js) for operations tests: