FL33TW00D opened this issue 4 months ago
@philpax How would you get ground truth in the browser? Any good ideas?
Unpopular opinion: write the tests in Python, use https://github.com/microsoft/playwright-python to call the JS/WASM code and collect the results, then compare them against PyTorch.
> This gives us 7 combinations we need to fuzz all functionality on.
You may also need to consider AMD/NVIDIA/Intel graphics cards for Windows/Linux, x86 vs Apple Silicon for macOS, and mobile support. Yeah, this gets to be pretty painful pretty quickly :sob:
> @philpax How would you get ground truth in the browser? Any good ideas?
Hmm... yeah, I think you'd want to capture ground truth data with PyTorch on the "host" and then check against that. It'll be pretty annoying because of the sheer amount of data, but you could generate that on the fly or just compare the outputs.
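On the "sheer amount of data" point, one option (an assumption sketched here, not something agreed in the thread) is to derive test inputs from a seed with a deterministic PRNG. The host and the wasm32 build then regenerate identical inputs, so only the seed and the expected outputs need to be shipped. `XorShift64` and `gen_inputs` are hypothetical helpers using Marsaglia's xorshift64 (13, 7, 17) scheme:

```rust
/// Minimal xorshift64 PRNG (hypothetical helper, not from any crate).
/// Integer ops and the u32 -> f32 conversion are deterministic in IEEE 754,
/// so the same seed yields the same values natively and in wasm32.
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must never start from an all-zero state.
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        Self { state }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// f32 on a 2^-24 grid in [-1.0, 1.0), built from the top 24 bits.
    fn next_f32(&mut self) -> f32 {
        let bits = (self.next_u64() >> 40) as u32; // 24 random bits
        (bits as f32 / (1u32 << 24) as f32) * 2.0 - 1.0
    }
}

/// Generate `len` reproducible input values for the given seed.
fn gen_inputs(seed: u64, len: usize) -> Vec<f32> {
    let mut rng = XorShift64::new(seed);
    (0..len).map(|_| rng.next_f32()).collect()
}
```

With this scheme a test case could carry a seed instead of its inputs. Whether that is worth it depends on how large the inputs actually get.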
> Unpopular opinion: write the tests in Python, use https://github.com/microsoft/playwright-python to call the JS/WASM code and collect the results, then compare them against PyTorch.
This also sounds pretty reasonable to me. You could also do the same thing from Rust, but it might be easier to drive them from Python because you could use PyTorch directly. (I think you're already doing some kind of PyTorch orchestration from Rust for your existing tests, though?)
Proposing a new test suite that allows operation tests to run in the browser and verifies results across the following degrees of freedom (DOF):
E.g. Add, macOS, Intel, 1e-3, Q8_0 (operation, OS, hardware, tolerance, data type)
Invoke: TestGen::generate_unary(op, tol, dt)
Result:
{
  "Add": {
    "inputs": [{
      "value": [0.1, 0.2, 0.3],
      "dt": "Q8"
    }],
    "outputs": [{
      "value": [0.2, 0.3, 0.4],
      "dt": "Q8"
    }],
    "atol": 1e-3,
    "rtol": 1e-3
  }
}
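#[cfg(target_arch = "wasm32")]
use wasm_bindgen_test::wasm_bindgen_test;

#[cfg_attr(target_arch = "wasm32", wasm_bindgen_test)]
pub fn test_add() {
    // Ground truth is embedded at compile time, so the browser never needs PyTorch.
    // Assumes `WebTest` derives `serde::Deserialize` and matches the layout of add.json.
    let test_case: WebTest = serde_json::from_slice(include_bytes!("add.json")).unwrap();
    ...
}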
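Comparing a browser result against a test case's `outputs` needs a rule for combining `atol` and `rtol`. The issue does not specify one; this minimal sketch assumes the same element-wise rule as `torch.allclose`, |actual - expected| <= atol + rtol * |expected|. `all_close` is a hypothetical helper name:

```rust
/// Element-wise tolerance check, assuming the `torch.allclose` rule:
/// |actual - expected| <= atol + rtol * |expected|.
/// Returns the first failing index so cross-browser failures are easy to read.
fn all_close(actual: &[f32], expected: &[f32], atol: f32, rtol: f32) -> Result<(), String> {
    if actual.len() != expected.len() {
        return Err(format!(
            "length mismatch: got {}, expected {}",
            actual.len(),
            expected.len()
        ));
    }
    for (i, (&a, &e)) in actual.iter().zip(expected).enumerate() {
        let diff = (a - e).abs();
        // Written as `!(diff <= tol)` so a NaN output counts as a failure.
        if !(diff <= atol + rtol * e.abs()) {
            return Err(format!("index {i}: got {a}, expected {e} (|diff| = {diff})"));
        }
    }
    Ok(())
}
```

For example, `all_close(&[0.2004, 0.2998, 0.4], &[0.2, 0.3, 0.4], 1e-3, 1e-3)` passes, while comparing `0.25` against `0.2` with the same tolerances fails.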
As more and more browsers ship WebGPU, there may be minor discrepancies between implementations. This may cause us significant delays and issues if not addressed.
So, what we need is a test suite like no other. It must fuzz all functionality in all possible deployment settings.
Browsers: Chrome, Safari, Firefox
OS: Windows, macOS, Linux
Since Safari only runs on macOS, this gives us 7 combinations (3 + 3 + 1) on which we need to fuzz all functionality.
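The count can be checked by enumerating the matrix. This small sketch assumes Safari is only targeted on macOS (the only reading under which the count is 7). `test_matrix` is a hypothetical helper, not existing code:

```rust
/// Browsers and operating systems listed in the issue.
const BROWSERS: [&str; 3] = ["Chrome", "Safari", "Firefox"];
const OSES: [&str; 3] = ["Windows", "macOS", "Linux"];

/// Returns every (browser, OS) pair the suite would need to cover.
fn test_matrix() -> Vec<(&'static str, &'static str)> {
    let mut pairs = Vec::new();
    for &browser in BROWSERS.iter() {
        for &os in OSES.iter() {
            // Assumption: Safari ships only on macOS, so skip the other OSes.
            if browser == "Safari" && os != "macOS" {
                continue;
            }
            pairs.push((browser, os));
        }
    }
    pairs
}
```

Adding the GPU vendors or Apple Silicon versus x86 split mentioned in the thread would multiply this list further, which is why generating the matrix beats maintaining it by hand.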
We do not currently run operation tests in the browser because they rely on PyTorch for ground truth. This must be resolved by using pre-generated ground-truth data (or some other great idea). The browser suite will complement our property-based testing, which runs locally and uses PyTorch as ground truth.