aarmstrong78 opened 4 years ago
Actually, I get the same error just trying to run `distilbert-onnx-coreml.py` unchanged. I had to update the `target_ios` param to `minimum_ios_deployment_target` to get the script to run, so perhaps the script isn't compatible with the current version of coremltools?
Quantizing layer 783
Quantizing layer 798
Quantizing layer 809
Quantizing layer 824
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-0d553d8b37ca> in <module>
     36
     37 pred_coreml = mlmodel.predict(
---> 38     {"input_ids": input_ids.astype(np.float32)}, useCPUOnly=True
     39 )
     40

~/anaconda3/lib/python3.7/site-packages/coremltools/models/model.py in predict(self, data, useCPUOnly, **kwargs)
    328
    329         if self.__proxy__:
--> 330             return self.__proxy__.predict(data, useCPUOnly)
    331         else:
    332             if _macos_version() < (10, 13):

RuntimeError: {
    NSLocalizedDescription = "Error computing NN outputs.";
}
```
That’s very possible.
What versions of XCode and Mac OS are you running?
Do you get the same error on an iOS device?
MacOS 10.15.2 and Xcode Version 11.3 (11C29).
Running on an Xcode simulator I got the same error on my custom model, with a bit more detail: "Cannot squeeze a dimension whose value is not 1", with the relevant dimension being 64. So somehow the shape was wrong by the time it reached that operation.
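For anyone puzzled by the message itself: squeeze is only defined for axes of size 1, and NumPy shows the same failure condition directly (a small sketch of the semantics, not the Core ML code path):

```python
import numpy as np

# Squeeze removes axes of size 1; asking it to squeeze a size-64 axis
# fails, which is the same condition Core ML reports as
# "Cannot squeeze a dimension whose value is not 1".
x = np.zeros((1, 64))

print(np.squeeze(x, axis=0).shape)  # (64,) -- axis 0 has size 1, so this is fine

try:
    np.squeeze(x, axis=1)           # axis 1 has size 64 -> ValueError
except ValueError as e:
    print("squeeze failed:", e)
```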
In the interim I've moved on to translating the model from a TF version directly to coreml using tfcoreml.
@aarmstrong78 Did you ever solve this problem? I have the same error message ("Cannot squeeze a dimension whose value is not 1") from BERT running in CoreML (via ONNX).
Hi @jbmaxwell, no I didn't. After a lot of effort I managed to get the TensorFlow version of BERT to convert to CoreML instead.
Just as an update: the model I was using above was based on the pytorch-pretrained-BERT repo, so I tried the huggingface DistilBERT model, but I get exactly the same error. (It's looking like the problem is coming from ONNX.)
Another update on my `Cannot squeeze` error: I'm training BertForMaskedLM, which, looking at the code, shouldn't have any `squeeze` calls in it. BertForQuestionAnswering, however, does call `squeeze`. So I'm guessing that, for whatever reason, the model being saved (or converted by ONNX) is the BertForQuestionAnswering one. Any obvious reason why that might be?
If you can share the mlmodel file with the error, I can take a look.
@hollance That would be extremely helpful, thanks! bert-test-256_FP16.mlmodel.zip
The error from CoreML (in Xcode) is:
```
2020-02-27 10:05:54.749888-0800 Spliqs[2968:1145298] [espresso] [Espresso::handle_ex_plan] exception=Espresso exception: "Invalid state": Cannot squeeze a dimension whose value is not 1: shape[1]=256 status=-5
2020-02-27 10:05:54.750039-0800 Spliqs[2968:1145298] [coreml] Error computing NN outputs -5
Error running prediction: Error Domain=com.apple.CoreML Code=0 "Error computing NN outputs." UserInfo={NSLocalizedDescription=Error computing NN outputs.}
```
Just to note: I ran that through my training script for a couple of epochs, as a test.
Run this script to fix the issue:
```python
import coremltools
import numpy as np

mlmodel = coremltools.models.MLModel("bert-test-256_FP16.mlmodel")
spec = mlmodel._spec

spec.neuralNetwork.layers[9].activation.linear.alpha = 1   # whereNonZero
spec.neuralNetwork.layers[11].activation.linear.alpha = 1  # squeeze

new_model = coremltools.models.MLModel(spec)
new_model.save("w00t.mlmodel")

new_model.predict({"input_ids": np.zeros((1, 256), dtype=np.int32)})
```
The issue was the squeeze and whereNonZero layers at the beginning of the model. This script replaces them with harmless linear activation layers.
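Why the swap is harmless: Core ML's linear activation computes `alpha * x + beta`, so with `alpha = 1` (and `beta = 0`) it's the identity function. A quick NumPy sketch of that math (an illustration only, not Core ML itself):

```python
import numpy as np

def linear_activation(x, alpha=1.0, beta=0.0):
    # Core ML's "linear" activation: f(x) = alpha * x + beta.
    return alpha * x + beta

x = np.arange(6, dtype=np.float32).reshape(2, 3)

# With alpha=1 and beta=0 the tensor passes through unchanged, which is
# why replacing the offending layers this way doesn't corrupt the data.
print(np.array_equal(linear_activation(x), x))  # True
```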
Thanks so much, I'll give this a try!
Do you understand why the squeeze was there in the first place? Searching modeling_distilbert.py only reveals an explicit `.squeeze` call in BertForQuestionAnswering (I was only using MLM). Reading your CoreML Survival Techniques, it suggests that occasionally TensorFlow will insert layers that aren't obviously necessary. Do you suspect that's what's happened here?
ps - I naively tried removing the `squeeze` entirely, which (obviously enough, I suppose) just caused a shape error at the `broadcastTo`... ;-) I'll keep this trick of replacing ops with benign layers in mind in future.
It probably got inserted in the ONNX conversion step. There are most likely a whole bunch of other layers that don't really need to be in there. :-D
Wow, okay. It really is the Wild West! It would be great if Apple would throw a little more of its multi-billion-dollar weight behind this process. It's generally pretty painful getting from PyTorch to CoreML, in my experience.
Thanks again.
To be fair, @bhushan23 from onnx-coreml has helped us a lot so far :)
Absolutely! ONNX is invaluable, since presumably this wouldn't be possible at all without it. But by taking a proprietary approach, Apple must surely have realized there would be a significant demand on resources/human-hours trying to stay current in such a rapidly evolving field. Anyway, I don't mean to complain; it's just that it can be tricky enough getting a model working, only to face another significant (and sometimes insurmountable) hurdle in trying to get it converted to an mlmodel.
So after testing the model on iOS I noticed that the outputs weren't the same. Trying to debug the problem led me to try `onnxruntime`, to verify whether the ONNX model gave the same results as PyTorch. However, `onnxruntime` failed with a type error on `float` for the `Equal` operator. With some help (from ONNX/OnnxRuntime), it was discovered that the models converted from ONNX were indicating opset v9 (in Netron), which didn't accept floats for `Equal`. By manually setting `opset_version=11` in the ONNX export call, I was able to get it to verify correctly in `onnxruntime`, but now I get an "operator not available" error for `Range` when converting to CoreML. Very strange. The `Range` operator appears to have been added in opset 11; I made no other changes to the model itself. Not sure what the path forward might be. I'm on PyTorch 1.4, CUDA 10.1, and the latest release versions of ONNX, coremltools, and onnx-coreml.
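For context, ONNX's `Range` operator (introduced in opset 11) just materializes an arithmetic sequence, equivalent to NumPy's `arange`. A sketch of its semantics (NumPy stand-in, not the ONNX runtime):

```python
import numpy as np

def onnx_range(start, limit, delta):
    # ONNX Range(start, limit, delta) produces the same values as np.arange:
    # start, start+delta, ... up to (but excluding) limit.
    return np.arange(start, limit, delta)

print(onnx_range(0, 5, 1))    # [0 1 2 3 4]
print(onnx_range(10, 4, -2))  # [10  8  6]
```

The op itself is trivial; the blocker is simply that a converter targeting opset 10 has no mapping for it.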
ONNX and Core ML do not have 100% the same features, so it's possible your ONNX model contains an operator that is not supported by Core ML. Sometimes you can work around this by removing the offending operator from the ONNX model by hand.
Okay, understood. Just out of curiosity, I decided to try modifying `distilbert-onnx-coreml.py` from the `swift-coreml-transformers` repo by adding `opset_version=11` to the export call, and it does add the `Range` operator to the model (just as it does in my case). So it seems that something about the model with opset 11 is being misinterpreted (or interpreted differently) by ONNX.
It seems clear that I'm blocked for now on DistilBERT, since a working ("working" as in giving accurate results in `onnxruntime`) ONNX conversion requires opset v11, and onnx-coreml doesn't yet support v11.
So I'm trying to get regular BERT running for now. I trained the model and `onnxruntime` is happy with the output (i.e., verified against PyTorch, ONNX opset v10). The export did require the same `squeeze` fix you posted above, but the output in Xcode is still incorrect when compared to PyTorch. Is it possible the "fix" has added some error?
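One way to quantify "incorrect" here (assuming both outputs can be dumped as NumPy arrays; the values below are made up for illustration) is a tolerance check rather than exact equality, since FP16 conversion alone introduces small differences:

```python
import numpy as np

# Hypothetical logits dumped from the PyTorch run and the Core ML run.
pytorch_out = np.array([0.120, -1.300, 0.980])
coreml_out = np.array([0.121, -1.298, 0.979])

# Element-wise comparison with a tolerance loose enough for FP16 weights;
# a genuinely broken graph typically shows differences orders of magnitude larger.
max_abs_diff = np.max(np.abs(pytorch_out - coreml_out))
print("max abs diff:", max_abs_diff)
print("close enough:", np.allclose(pytorch_out, coreml_out, atol=1e-2))
```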
I can start a new issue, if that's better.
I'm not sure whether this repo is still being maintained, since the last commit seems to be several months ago, but I did manage to get distilbert-onnx-coreml.py to convert a fine-tuned DistilBERT model using opset 10 that seems to give decent results. It basically involves iterating through the model and manually modifying the squeeze layers to use the axis with dimension 1. I can open a pull request to modify the script if that's something people would want. Just figured I'd throw this up here in case anyone has had this same issue, because I didn't find any answer when googling around.
Yeah, I've been noticing the silence of the onnx/conversion-related repos recently, which is pretty easily explained by the new PyTorch support in CoreML 4. That's also kind of incomplete, in my (limited) experience—e.g., no leaky_relu conversion—but very promising. (And obviously it's very new!) I'd be interested in distilbert, for sure, but I think waiting a little longer on CoreML 4 is worthwhile.
@calderma We are not proactively maintaining this repo, but if you have a fix for the current version of CoreML or related tools, by all means, please submit it.
Hello, I decided I'm just going to post an example code snippet, since I don't think it's a fix that would be uniform for every use case, but people should be able to modify it to their needs. Using this after converting the model via pytorch->onnx->coreml in the distilbert-onnx-coreml.py script, I was able to get it to run on device and get similar results to the PyTorch model. If it doesn't work for you, post in this issue and I can try to help.
```python
# The motivation behind this is to iterate through a fine-tuned DistilBERT model
# and fix the squeeze layers, which for some reason try to squeeze along the
# incorrect dimension. That throws an error similar to:
#   Espresso exception: "Invalid state": Cannot squeeze a dimension whose value is not 1
# The coreml pytorch conversion does not work out of the box either, in my experience,
# so this is a way to get a fine-tuned DistilBERT QA model to run on an iOS device.
# Note that you should run the torch.onnx.export command with the opset_version flag
# set to less than 11. I tested it and it works on opset=9 and opset=10.
import coremltools

mlmodel = coremltools.models.MLModel($YOUR_FINETUNED_MODEL_HERE)
spec = mlmodel._spec

# Iterate through the network layers and identify the squeeze layers.
layers_to_change = []
for i, layer in enumerate(spec.neuralNetwork.layers):
    if "Squeeze" in layer.name:
        layers_to_change.append(i)

# Change the axes to squeeze along the 0 axis, which should be 1-dimensional
# in the converted model.
for x in layers_to_change:
    del spec.neuralNetwork.layers[x].squeeze.axes[:]
    spec.neuralNetwork.layers[x].squeeze.axes.extend([0])

new_model = coremltools.models.MLModel(spec)
new_model.save($YOUR_MODEL_PATCHED)
```
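A note on the `del ...axes[:]` / `extend` pattern in the script above: protobuf repeated fields generally can't be replaced by plain assignment, so they're cleared and refilled in place. A plain Python list illustrates the same mutation pattern (the real object is a protobuf repeated-field container, not a list):

```python
# The converter emitted squeeze axes such as [1]; the fix clears the
# field in place and sets it to [0], mirroring the two calls above.
axes = [1]          # stand-in for spec.neuralNetwork.layers[x].squeeze.axes
del axes[:]         # clear in place (repeated fields reject `msg.axes = [...]`)
axes.extend([0])    # squeeze along axis 0 instead
print(axes)         # [0]
```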
I am also facing the same issue with a MobileDet model:
```
2020-09-16 11:56:12.168355+0530 ObjectDetection-CoreML[9517:1345027] [coreml] Failure in -executePlan:error:.
2020-09-16 11:56:12.168554+0530 ObjectDetection-CoreML[9517:1345027] Finalizing CVPixelBuffer 0x28255c0a0 while lock count is 1.
2020-09-16 11:56:12.218048+0530 ObjectDetection-CoreML[9517:1345027] [espresso] [Espresso::handle_ex_plan] exception=Espresso exception: "Invalid state": scatter_nd_kernel: In TF_SCATTER mode, invalid shape of UPDATES tensor. status=-5
```
Was anyone able to find a solution for this? Thanks in advance.
Hi,
I used distilbert-onnx-coreml.py to convert a custom PyTorch BertForSequenceClassification model to CoreML. The conversion finishes without error.
However I can't use the resulting CoreML model for prediction. The following code fails:
Note, my input dim is 64:
When I try to substitute my model into the DistilBERT demo app, I get the following error in Xcode when predicting:
The only hint that something might have gone wrong in the onnx->coreml conversion is a note about a deleted node; however, I'm struggling to work out whether this is just a red herring:
Are there any particular BERT layers that need custom conversion to coreml? Any suggestions on further debugging?
Thanks.