webonnx / wonnx

A WebGPU-accelerated ONNX inference run-time written 100% in Rust, ready for native and the web
Other
1.61k stars 59 forks source link

Implement `Shape` operator #130

Closed LoipesMas closed 1 year ago

LoipesMas commented 1 year ago

Hello! I gave Shape operator a try. I pretty much got it working, however there are some small issues:

  1. tests (run with env OP_TESTED=shape pytest tests/test_specific_op.py) almost pass, but they fail because of different types: it expects int64 while our output is float32. Not sure how to fix that (output array is i32 and I even tried setting scalar_type to I64, but this doesn't help)
  2. shader is actually const, since everything is determined at shader compile time (at template render to be precise). this might be an issue (e.g. for models with dynamic inference size). but I don't know how this could be solved.
  3. minor thing: because of the point above, we don't actually use input_0 in the shader code. but if we don't use it, then it gets removed and then bindings don't check out. I solved this by binding it to unused variable, but there could be a prettier way.

P.S. I'll cleanup the code once the other issues are solved ;]

haixuanTao commented 1 year ago

Thanks for contributing to wonnx!

I want to spend more time on it, but life is pretty crazy at the moment.

  1. Yeah, typing in something other than f32 is still pretty experimental.
  2. Yeah, I genuinely don't know if we will ever have true dynamic sizes. I think that the best we could do is pad input to a fixed length. So having the shader as a const is fine, in my opinion.
  3. I see. Ultimately, I guess that in the case of a static inference with no dynamic shape. This computation step might be removed by some IR compilers. I think that your solution of binding variables to unused variables is fine for now.
LoipesMas commented 1 year ago

I genuinely don't know if we will ever have true dynamic sizes

That's a bummer. But, if I understand correctly, we can have "dynamic sizes" between compilations, right? So if we had a video in one resolution, we would compile a session for that resolution and for another video with different resolution we would compile again and have different sizes. And resolution doesn't change between frames, so it should work, correct?

pixelspark commented 1 year ago

I genuinely don't know if we will ever have true dynamic sizes

That's a bummer. But, if I understand correctly, we can have "dynamic sizes" between compilations, right? So if we had a video in one resolution, we would compile a session for that resolution and for another video with different resolution we would compile again and have different sizes. And resolution doesn't change between frames, so it should work, correct?

Well, yes, if you implement replacing dynamic dimensions with static ones in a model and then infer remaining dimension info (basically re-implement parts of onnxsimplifier). It should not be too hard and could be a useful addition even when only partially implemented.

pixelspark commented 1 year ago

@LoipesMas you may be very happy with the work done here: https://github.com/webonnx/wonnx/pull/141 :-)

pixelspark commented 1 year ago

So the Shape operator was already constant-folded during shape inference (in wonnx-preprocessing) but as of https://github.com/webonnx/wonnx/pull/151/commits/4427198885fef04a65b5f9c16638b61d62435c6d (see #151) it is also implemented at runtime (the optimizer will replace it with a static initializer). Will close this PR after I have added the necessary tests to #151.

pixelspark commented 1 year ago

Implemented as of https://github.com/webonnx/wonnx/commit/62be58bff31216ce4f65a503a1d04182ae6dfa48