Fix grouped convolutions with 3x3 kernels

Followup to https://github.com/webonnx/wonnx/pull/124, fixes the remaining bug.

The bug was caused by the 3x3 convolution optimization also triggering for grouped convolutions. That optimization adds padding to the kernel in preparation for conv_kernel_3x3.wgsl, but grouped convolutions don't use that shader, so the default conv shader ended up using the wrong kernel data.
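To illustrate why a padded kernel breaks a shader that expects the unpadded layout, here is a minimal Rust sketch. The padding scheme (each 3-element row padded to 4 floats) and both helper names are assumptions made for illustration only; the actual layout used for conv_kernel_3x3.wgsl may differ.

```rust
// Hypothetical padding scheme (an assumption, not necessarily wonnx's):
// every 3-element kernel row is padded with one zero to 4 floats.
fn pad_rows_3_to_4(kernel: &[f32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(kernel.len() / 3 * 4);
    for row in kernel.chunks(3) {
        out.extend_from_slice(row);
        out.push(0.0);
    }
    out
}

// Hypothetical reader that assumes the unpadded layout:
// 9 contiguous weights per 3x3 kernel.
fn read_unpadded(buf: &[f32], kernel_index: usize) -> &[f32] {
    &buf[kernel_index * 9..kernel_index * 9 + 9]
}

fn main() {
    // Two 3x3 kernels with weights 1..=18.
    let kernels: Vec<f32> = (1..=18).map(|v| v as f32).collect();
    let padded = pad_rows_3_to_4(&kernels);

    // Reading the padded buffer with the unpadded stride picks up the
    // padding zeros and drifts out of alignment with the real weights.
    println!("expected: {:?}", read_unpadded(&kernels, 0));
    println!("actual:   {:?}", read_unpadded(&padded, 0));
}
```

With this assumed layout, the reader sees padding zeros in place of weights, and the misalignment grows with every kernel, which is the kind of mismatch the fix avoids by not padding for grouped convolutions.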

Also includes a fix for the out-of-bounds access check: a u32 can never be < 0, so that test never triggered. The comparison has to be done with an i32. This wasn't causing any issues as far as I can tell, though.
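The same pitfall can be shown in Rust, whose u32 and i32 behave like their WGSL counterparts here. This is only a sketch: the function names and the index formula are hypothetical and are not taken from the shader.

```rust
// Hypothetical index computation done entirely in u32: subtracting the
// padding wraps around instead of going negative.
fn input_index_unsigned(out_pos: u32, k: u32, pad: u32) -> u32 {
    (out_pos + k).wrapping_sub(pad)
}

// Same computation in i32, where a lower-bound check is meaningful.
fn in_bounds_signed(out_pos: u32, k: u32, pad: u32, size: u32) -> bool {
    let idx = (out_pos + k) as i32 - pad as i32;
    idx >= 0 && idx < size as i32
}

fn main() {
    // One step to the left of position 0: the u32 result is a huge
    // positive number, so a `< 0` test on it can never be true.
    let wrapped = input_index_unsigned(0, 0, 1);
    println!("u32 index: {}", wrapped);

    // The signed version correctly reports the access as out of bounds.
    println!("in bounds (i32): {}", in_bounds_signed(0, 0, 1, 8));
}
```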

Fix grouped convolutions with 3x3 kernels #157