BruceDai opened 2 years ago
@huningxin PTAL, thanks.
This is great input for this week's WG discussion on conformance testing. Thanks much @BruceDai .
/cc @anssiko @wchao1115 @dontcallmedom
@BruceDai Can you please explain what you mean by "max ULP distance" and how you plan to use it?
Are we looking to use the result from the WebNN-native on CPU as our baseline?
> Are we looking to use the result from the WebNN-native on CPU as our baseline?
Bruce is using the result of the WebNN-polyfill CPU backend as the baseline. WebNN-polyfill CPU is based on the TF.js CPU backend, which uses JavaScript numbers to calculate kernels, so I suppose the results should have double precision. /cc @pyu10055
Regarding the current WebNN-native CPU backends (say OpenVINO CPU, XNNPACK and oneDNN), I understand they are single-precision and might not meet the baseline requirement.
@wchao1115 The max ULP distance is the maximum of the per-element ULP distances between the actual output and the baseline.
Here's a sample of the `pow` op tested on the WebGL backend.
```js
// use random data as input
const input = [0.33435354, 0.57139647, 0.03689031];
const exponent = 30;

// use the result of the CPU backend (tfjs-backend-cpu based) as the baseline
const baseline = [
  5.323259448666113e-15,
  5.106538125687621e-8,
  1.0229478789571165e-43];

// actual output = pow(input, 30)
const actualOutput = [
  5.323248437237799e-15,
  5.1065363493307814e-8,
  0.0];
```
The ULP distances between the actual output and the baseline:
ULP distance between 5.323248437237799e-15 and 5.323259448666113e-15 is 26
ULP distance between 5.1065363493307814e-8 and 5.106538125687621e-8 is 5
ULP distance between 0.0 and 1.0229478789571165e-43 is 73
Among these three ULP distance numbers (26, 5, 73), the current max ULP distance is 73.
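For reference, a minimal sketch (not the actual test code) of how such a per-element ULP distance can be computed, assuming it is the absolute difference of the raw float32 bit patterns; this reproduces the distances above:

```js
// ULP distance as the absolute difference of the raw float32 bit patterns.
// Note: this equals the number of representable float32 steps between the
// two values only when they share a sign; across zero it measures the
// bit-pattern gap (see the relu/-0.0 discussion further down).
function ulpDistance(actual, expected) {
  const f32 = new Float32Array([actual, expected]);
  const u32 = new Uint32Array(f32.buffer); // reinterpret the same bytes
  return Math.abs(u32[0] - u32[1]);
}

ulpDistance(5.323248437237799e-15, 5.323259448666113e-15); // 26
ulpDistance(5.1065363493307814e-8, 5.106538125687621e-8);  // 5
ulpDistance(0.0, 1.0229478789571165e-43);                  // 73
```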
There's a problem that the max ULP distance would change with different random input data. What's the strategy for defining an acceptable ULP distance?
@BruceDai Your ULP values seem high. For reference, the `pow` operator in the DirectML GPU conformance test has a ULP tolerance of 0 (exact) for single-precision compare, 2 for half-precision (float16) via single-precision compute compare, and 4 for half-precision via half-precision compute compare.
@huningxin Are you sure that the baseline result here is from a pure double-precision compute on the CPU? If the baseline is indeed from a double-precision result, then you will need to truncate it down to a single-precision value before comparing it with the single-precision result from the WebGL backend. The two inputs to the `CompareUlp` function must be of the same type.
I'll be happy to add a DirectML column with our ULP values to your table above if it helps. Note that you'll need at least two tables, one for float32 and another for float16 results. DirectML further differentiates the float16 results into two modes: the result on a float16 tensor from a float32 calculation, and the result on a float16 tensor from a float16 calculation. (We actually break the latter category down further into float16 multiplication with float16 accumulation vs. float16 multiplication with float32 accumulation, but let's leave that detail aside for now.)
> Are you sure that the baseline result here is from a pure double-precision compute on the CPU?
I believe so, because AFAIK JavaScript performs arithmetic in double precision, and the tfjs-backend-cpu kernels are implemented in JavaScript.
> If the baseline is indeed from a double precision result, then you will need to truncate it down to a single-precision value before comparing it with the single-precision result from the WebGL backend.
I suppose this is also true, because the double-precision results are stored back into a `Float32Array` before being compared with other single-precision results, e.g., from the WebGL backend.
We probably could compute the baseline in JavaScript along with the test cases (as part of the w-p-t). For Bruce's `pow` example, the baseline could be computed simply by `Math.pow`. The code sketch could be:
```js
const input = [0.33435354, 0.57139647, 0.03689031];
const exponent = 30;

// Compute the double-precision baseline
const baseline = input.map(x => Math.pow(x, exponent));
// baseline = [5.323261130422279e-15, 5.1065382759817323e-8, 1.0171128528373136e-43]

// Truncate the double-precision baseline to single precision
const baselineInFloat32 = new Float32Array(baseline);
// baselineInFloat32 = [5.323261142732008e-15, 5.106538125687621e-8, 1.0229478789571165e-43]

// Then do ULP comparison with the results of WebNN pow
```
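Putting the pieces together, a hypothetical end-to-end check might look like the following (`assertUlpClose` is an illustrative name, not an existing test helper; it reuses the `ulpDistance` sketch from earlier):

```js
// Compare the actual WebNN output against the float32-truncated baseline,
// element by element, failing when any element exceeds the ULP tolerance.
function assertUlpClose(actualOutput, baselineInFloat32, toleranceUlp) {
  for (let i = 0; i < baselineInFloat32.length; ++i) {
    const distance = ulpDistance(actualOutput[i], baselineInFloat32[i]);
    if (distance > toleranceUlp) {
      throw new Error(
          `element ${i}: ULP distance ${distance} exceeds ${toleranceUlp}`);
    }
  }
}

// e.g., with a 2-ULP criterion:
assertUlpClose(actualOutput, baselineInFloat32, 2);
```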
This is an extremely simplified example. The baselines of other, more complex ops would require more effort to implement the compute kernel; as a reference, the tf.js conv2d JS kernel is ~150 LOC (a simplified sketch follows the quotes below). The effort might be worthwhile, because this could help us establish a baseline that meets the requirements raised in the WebML WG Teleconference – 2 Dec 2021, like
by @wchao1115:
> - all computation is done in double precision
> - we don't want any intermediate casting
> - an open source ref, anyone can look at the code and be confident
by @dontcallmedom:
> the codebase should be easy to review, not too many layers of abstraction
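To give a feel for the scale of that effort, here is a minimal sketch (not the tf.js kernel) of a double-precision conv2d reference for the simplest case: single input channel, single filter, stride 1, no padding. All arithmetic stays in JavaScript numbers (float64), with no intermediate casting:

```js
// A simplified double-precision conv2d reference (sketch only).
// input: row-major [inH, inW], filter: row-major [fH, fW].
function conv2dReference(input, inH, inW, filter, fH, fW) {
  const outH = inH - fH + 1;
  const outW = inW - fW + 1;
  const output = new Array(outH * outW);
  for (let y = 0; y < outH; ++y) {
    for (let x = 0; x < outW; ++x) {
      let acc = 0; // float64 accumulator, no intermediate casting
      for (let ky = 0; ky < fH; ++ky) {
        for (let kx = 0; kx < fW; ++kx) {
          acc += input[(y + ky) * inW + (x + kx)] * filter[ky * fW + kx];
        }
      }
      output[y * outW + x] = acc;
    }
  }
  return output; // truncate to Float32Array only at comparison time
}
```

The real kernel would also need strides, padding, dilations and channel/batch dimensions, which is where the ~150 LOC go.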
Any thoughts?
this would match what I have in mind, indeed.
FWIW, while maintaining it in WPT is an option, I don't think we need to make this a requirement; at the end of the day, what is needed in WPT is only the results of the computation, not the computation code itself.
In particular, given the amount of WPT-specific infrastructure in that repo, we might be better served by a lighter-weight dedicated repo to build and audit the baseline.
I developed an experimental double-precision baseline implementation of the element-wise binary ops, referring to the https://github.com/tensorflow/tfjs code.
Here're result screenshots of running the float32 binary tests with the WebNN-native backends (DML-GPU and OpenVINO-CPU) under a criterion of 2 ULP distance for each element-wise binary op.
There are 22 binary tests: 16 pass and 6 fail on the DML backend, and 17 pass and 5 fail on the OpenVINO backend.
Observed that the output behaviors of the `pow` op are not the same on these two backends.
figure-1: testing with the DirectML backend (GPU)
figure-2: testing with the OpenVINO backend (CPU)
@wchao1115 @huningxin @dontcallmedom PTAL, thanks.
Since the CPU backend uses double precision, I collected the max ULP distance for WebNN ops using the result of the CPU backend (tfjs-backend-cpu based) as the baseline with PR #139 - ULP comparison, on three devices.
Here're some observations:
`relu`(negative number) on Device 1 & 2 + WebGL backend is -0.0; its ULP distance to the baseline 0.0 is 2147483648.
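That 2147483648 (2^31) falls out of a raw-bit comparison: -0.0 and 0.0 compare equal as numbers but differ in the sign bit. A small sketch, assuming the bit-diff `ulpDistance` above; the `canonicalize` helper is a hypothetical mitigation, not something the current tests do:

```js
// Inspect the raw float32 bit patterns of the two zeros.
const bits = v => new Uint32Array(new Float32Array([v]).buffer)[0];
bits(0.0).toString(16);  // "0"
bits(-0.0).toString(16); // "80000000" -> |0x80000000 - 0| = 2147483648

// One possible mitigation: fold -0.0 into 0.0 before comparing.
const canonicalize = v => (v === 0 ? 0 : v);
```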
Open:
`relu` operation with negative input: some devices compute -0.0 while the expected baseline on the CPU backend is 0.0. The ULP distance between -0.0 and the baseline 0.0 is 2147483648, while with non-negative input the max distance is 0. How do we decide the acceptable ULP distance for the `relu` op?
Distance details: