Closed jallum closed 2 years ago
Oh, this is a bug: AxonOnnx does not handle cases where the input configuration uses channels: :last. This means we will have to update all pooling, convolution, and normalization operations to handle channels-last. Let me do some research to see how we can get this information from ONNX or infer it from the input. Also, go birds!
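For context, Axon's convolution and pooling layers already accept a channels option, so the fix is mostly about plumbing that option through during deserialization. A minimal sketch of what channels-last layers look like on the Axon side (the input name and shapes here are illustrative, not taken from the actual AxonOnnx code):

```elixir
# NHWC (channels-last) input: {batch, height, width, channels}
model =
  Axon.input("input", shape: {nil, 224, 224, 3})
  |> Axon.conv(32, kernel_size: 3, channels: :last)
  |> Axon.max_pool(kernel_size: 2, channels: :last)
```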
I'd be happy to put in some time to get this working if you don't mind explaining what you have in mind. (I applied to the Slack channel, but am waiting for approval.)
So I re-exported my model using a flag that is supposed to rearrange the inputs, like so:
python -m tf2onnx.convert --saved-model ./imagenet_mobilenet_v2_100_224_feature_vector_5 --opset 15 --output imagenet_mobilenet_v2_100_224_feature_vector_5.onnx --inputs-as-nchw inputs
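To sanity-check the re-export, the converted model can be imported the same way as before (assuming AxonOnnx.import returns the model and its parameters; the path matches the --output above):

```elixir
{model, params} =
  AxonOnnx.import("imagenet_mobilenet_v2_100_224_feature_vector_5.onnx")
```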
...and I changed the image-loading code in my livebook around, like so:
{:ok, image} =
  cards
  |> Enum.random()
  |> Pixels.read_file()

input =
  image.data
  |> Nx.from_binary({:u, 8})
  |> Nx.reshape({image.height, image.width, 4}, names: [:height, :width, :channels]) # RGBA
  |> Nx.slice([0, 0, 0], [224, 224, 3]) # Discard the alpha channel
  |> Nx.transpose(axes: [:channels, :height, :width]) # <--- added this (HWC -> CHW)
  |> Nx.divide(255.0)
...rerun, and I get the exact same error -- so the problem might be something else?
Upon further investigation, it looks like the deserialization for "Conv" isn't handling depthwise convolutions properly. Thoughts?
Ahhh, I did not realize those were supposed to be depthwise convolutions --- so what you can try: if the ONNX groups option is not equal to 1, you can make it an Axon.depthwise_conv. ...though I'm still a little surprised the original conv is failing, because it is setting feature_group_size.
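A rough sketch of what that deserializer change could look like (the function name and argument layout here are illustrative, not the actual AxonOnnx internals):

```elixir
# Dispatch an ONNX Conv node: when the group count is greater than 1,
# treat it as a depthwise convolution; otherwise build a regular conv.
defp conv_to_axon(input, out_channels, group, opts) do
  if group > 1 do
    # channel_multiplier = output channels produced per input channel
    Axon.depthwise_conv(input, div(out_channels, group), opts)
  else
    Axon.conv(input, out_channels, opts)
  end
end
```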
Yep, depthwise conv is used quite a bit in this kind of model. Anyway, I made the simplistic change to using Axon.depthwise_conv when groups > 1, and after seven minutes(!) of runtime, no crash -- I get an answer. I'm wondering if any of the other parameters need to be tweaked? In any case, there's a PR up for this, and we can move the discussion there.
I'll compare the answer I get with a reference in Python and see if it's in the right ballpark. If so, then it's just a matter of making it run faster -- it should be subsecond for this model, not minutes.
@jallum Are you specifying compiler: EXLA in Axon.predict? Additionally, there was a recent update to the Axon compiler which should speed up inference times significantly; try deleting AxonOnnx's mix.lock, re-run mix deps.get, and see if inference is indeed faster.
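For reference, a minimal sketch of running inference with the EXLA compiler (the model, params, and input bindings are assumed from the earlier snippets):

```elixir
# JIT-compile the predict function with EXLA instead of the default
# pure-Elixir backend, which is what brings inference down to subsecond
output = Axon.predict(model, params, input, compiler: EXLA)
```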
No, I was not -- now that I have, it's subsecond, as expected. (I was already on the latest Axon.) Woo!
Fixed on master.
While attempting to run predict() on an image, I encountered the following ArgumentError:
Here's how the model (imagenet_mobilenet_v2_100_224_feature_vector_5.zip) was decoded:
The model seems to decode properly: