huggingface / ratchet

A cross-platform browser ML framework.
https://ratchet.sh
MIT License

Operation testing suite #235

Open FL33TW00D opened 4 months ago

FL33TW00D commented 4 months ago

As more and more browsers ship WebGPU, there may be minor discrepancies between implementations. This may cause us significant delays and issues if not addressed.

So, what we need is a test suite like no other. It must fuzz all functionality in all possible deployment settings.

Browsers: Chrome, Safari, Firefox. OSes: Windows, macOS, Linux (Safari on macOS only).

This gives us 7 combinations we need to fuzz all functionality on.
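
For reference, a rough sketch of where the 7 comes from, assuming Safari is only tested on macOS while Chrome and Firefox are tested on all three OSes. The `Browser` and `Os` enums and the `test_matrix` helper are illustrative names, not part of ratchet.

```rust
// Hypothetical names for illustration; none of this is ratchet API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Browser {
    Chrome,
    Safari,
    Firefox,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Os {
    Windows,
    MacOs,
    Linux,
}

/// Assumption: Safari is only available on macOS; the other browsers run everywhere.
fn is_supported(browser: Browser, os: Os) -> bool {
    !(browser == Browser::Safari && os != Os::MacOs)
}

/// Enumerate every supported (browser, OS) pair.
fn test_matrix() -> Vec<(Browser, Os)> {
    let browsers = [Browser::Chrome, Browser::Safari, Browser::Firefox];
    let oses = [Os::Windows, Os::MacOs, Os::Linux];
    let mut matrix = Vec::new();
    for &browser in &browsers {
        for &os in &oses {
            if is_supported(browser, os) {
                matrix.push((browser, os));
            }
        }
    }
    matrix
}

fn main() {
    for (browser, os) in test_matrix() {
        println!("{:?} on {:?}", browser, os);
    }
}
```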

We do not currently run operation tests in the browser because they rely on PyTorch for ground truth. This must be resolved by using pre-generated ground truth data (or some other great idea).

This will be done in conjunction with our property-based testing, which runs locally and uses PyTorch as ground truth.

FL33TW00D commented 4 months ago

@philpax how would you get ground truth in the browser? any good ideas?

sigma-andex commented 4 months ago

Unpopular opinion: Have the tests in Python, use https://github.com/microsoft/playwright-python to call the JS/WASM code and get results, then compare to PyTorch.

philpax commented 4 months ago

> This gives us 7 combinations we need to fuzz all functionality on.

You may also need to consider AMD/NVIDIA/Intel graphics cards for Windows/Linux, x86 vs Apple Silicon for macOS, and mobile support. Yeah, this gets to be pretty painful pretty quickly :sob:

> @philpax how would you get ground truth in the browser? any good ideas?

Hmm... yeah, I think you'd want to capture ground truth data with PyTorch on the "host" and then check against that. It'll be pretty annoying because of the sheer amount of data, but you could generate that on the fly or just compare the outputs.

> Unpopular opinion: Have the tests in Python, use https://github.com/microsoft/playwright-python to call the JS/WASM code and get results, then compare to PyTorch.

This also sounds pretty reasonable to me. You could also do the same thing from Rust, but it might be easier to drive them from Python because you could use PyTorch directly. (I think you're already doing some kind of PyTorch orchestration from Rust for your existing tests, though?)

FL33TW00D commented 4 months ago

Proposal

I propose a new test suite that runs operation tests in the browser and validates results across the following degrees of freedom (DOF):

  1. Operation
  2. OS
  3. GPU Vendor
  4. Tolerance
  5. DType

E.g. Add, macOS, Intel, 1e-3, Q8_0
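
To get a feel for how quickly these axes multiply, here is a minimal sketch that enumerates their cross product. `TestPoint` and `enumerate_points` are hypothetical names, the axis values are placeholders, and invalid pairings (such as a GPU vendor that does not ship on a given OS) are not filtered out.

```rust
/// One point in the proposed test space. All names here are hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct TestPoint {
    op: &'static str,
    os: &'static str,
    gpu_vendor: &'static str,
    tolerance: f32,
    dtype: &'static str,
}

/// Build the full cross product of the five axes.
/// No filtering of impossible combinations is attempted in this sketch.
fn enumerate_points(
    ops: &[&'static str],
    oses: &[&'static str],
    vendors: &[&'static str],
    tolerances: &[f32],
    dtypes: &[&'static str],
) -> Vec<TestPoint> {
    let mut points = Vec::new();
    for &op in ops {
        for &os in oses {
            for &gpu_vendor in vendors {
                for &tolerance in tolerances {
                    for &dtype in dtypes {
                        points.push(TestPoint { op, os, gpu_vendor, tolerance, dtype });
                    }
                }
            }
        }
    }
    points
}

fn main() {
    // Placeholder axis values; the real lists would come from the framework.
    let points = enumerate_points(
        &["Add", "Mul"],
        &["Windows", "macOS", "Linux"],
        &["AMD", "NVIDIA", "Intel"],
        &[1e-3],
        &["F32", "Q8_0"],
    );
    println!("{} test points", points.len());
    println!("first: {:?}", points[0]);
}
```

Even with two operations, one tolerance, and two dtypes, the count grows multiplicatively, which is why generating cases rather than hand-writing them seems attractive.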

Invoke: TestGen::generate_unary(op, tol, dt)

Result:

"Add": {
    "inputs": [{
        "value": [0.1, 0.2, 0.3],
        "dt": "Q8"
    }],
    "outputs": [{
        "value": [0.2, 0.3, 0.4],
        "dt": "Q8"
    }],
    "atol": 1e-3,
    "rtol": 1e-3
}

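#[cfg(target_arch = "wasm32")]
use wasm_bindgen_test::wasm_bindgen_test;

#[cfg_attr(target_arch = "wasm32", wasm_bindgen_test)]
pub fn test_add() {
    // Embed the pre-generated ground truth at compile time and parse it.
    // WebTest is assumed to implement serde::Deserialize.
    let test_case: WebTest = serde_json::from_slice(include_bytes!("add.json")).unwrap();
    ...
}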
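
One way the comparison step against the stored outputs could look, as a sketch only. `all_close` is a hypothetical helper that follows the convention used by `torch.allclose`, i.e. `|actual - expected| <= atol + rtol * |expected|` for every element. The expected values and tolerances are taken from the JSON above; the "actual" values are made up.

```rust
/// Hypothetical helper: element-wise closeness check in the style of torch.allclose.
/// Returns false on a length mismatch or if any element is out of tolerance.
/// NaN values never compare as close.
fn all_close(actual: &[f32], expected: &[f32], atol: f32, rtol: f32) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected)
            .all(|(a, e)| (a - e).abs() <= atol + rtol * e.abs())
}

fn main() {
    // Expected outputs and tolerances as stored in the pre-generated test case.
    let expected = [0.2_f32, 0.3, 0.4];
    let (atol, rtol) = (1e-3_f32, 1e-3_f32);

    // Made-up values standing in for what the browser run would return.
    let actual = [0.2004_f32, 0.2999, 0.4001];

    println!("within tolerance: {}", all_close(&actual, &expected, atol, rtol));
}
```

Keeping `atol` and `rtol` in the generated file means the browser side needs no knowledge of how the ground truth was produced.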