aarmstrong78 opened 4 years ago
Actually, I get the same error just trying to run `distilbert-onnx-coreml.py` unchanged. I had to update the `target_ios` param to `minimum_ios_deployment_target` to get the script to run, so perhaps the script isn't compatible with the current version of coremltools?
Quantizing layer 783
Quantizing layer 798
Quantizing layer 809
Quantizing layer 824
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-0d553d8b37ca> in <module>
     36
     37 pred_coreml = mlmodel.predict(
---> 38     {"input_ids": input_ids.astype(np.float32)}, useCPUOnly=True
     39 )
     40

~/anaconda3/lib/python3.7/site-packages/coremltools/models/model.py in predict(self, data, useCPUOnly, **kwargs)
    328
    329         if self.__proxy__:
--> 330             return self.__proxy__.predict(data, useCPUOnly)
    331         else:
    332             if _macos_version() < (10, 13):

RuntimeError: {
    NSLocalizedDescription = "Error computing NN outputs.";
}
```
That’s very possible.
What versions of XCode and Mac OS are you running?
Do you get the same error on an iOS device?
MacOS 10.15.2 and Xcode Version 11.3 (11C29).
Running on an Xcode simulator I got the same error on my custom model, with a bit more detail: "Cannot squeeze a dimension whose value is not 1", with the relevant dimension being 64. So somehow the shape was wrong by the time it reached that operation.
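For anyone puzzled by the message itself: squeeze is only defined for axes of size 1, and NumPy shows the same failure condition directly (a small sketch of the semantics, not the Core ML code path):

```python
import numpy as np

# Squeeze removes axes of size 1; asking it to squeeze a size-64 axis
# fails, which is the same condition Core ML reports as
# "Cannot squeeze a dimension whose value is not 1".
x = np.zeros((1, 64))

print(np.squeeze(x, axis=0).shape)  # (64,) -- axis 0 has size 1, so this is fine

try:
    np.squeeze(x, axis=1)           # axis 1 has size 64 -> ValueError
except ValueError as e:
    print("squeeze failed:", e)
```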
In the interim I've moved on to translating the model from a TF version directly to coreml using tfcoreml.
@aarmstrong78 Did you ever solve this problem? I have the same error message ("Cannot squeeze a dimension whose value is not 1") from BERT running in CoreML (via ONNX).
Hi @jbmaxwell, no I didn't. After a lot of effort I managed to get the TensorFlow version of BERT to convert to CoreML instead.
Just as an update: the model I was using above was based on the pytorch-pretrained-BERT repo, so I tried the huggingface DistilBERT model, but I get exactly the same error. (It's looking like the problem is coming from ONNX.)
Another update on my `Cannot squeeze` error: I'm training BertForMaskedLM, which, looking at the code, shouldn't have any `squeeze` calls in it. BertForQuestionAnswering, however, does call `squeeze`. So I'm guessing that, for whatever reason, the model being saved (or converted by ONNX) is the BertForQuestionAnswering one. Any obvious reason why that might be?
If you can share the mlmodel file with the error, I can take a look.
@hollance That would be extremely helpful, thanks! bert-test-256_FP16.mlmodel.zip
The error from CoreML (in Xcode) is:
```
2020-02-27 10:05:54.749888-0800 Spliqs[2968:1145298] [espresso] [Espresso::handle_ex_plan] exception=Espresso exception: "Invalid state": Cannot squeeze a dimension whose value is not 1: shape[1]=256 status=-5
2020-02-27 10:05:54.750039-0800 Spliqs[2968:1145298] [coreml] Error computing NN outputs -5
Error running prediction: Error Domain=com.apple.CoreML Code=0 "Error computing NN outputs." UserInfo={NSLocalizedDescription=Error computing NN outputs.}
```
Just to note: I ran that through my training script for a couple of epochs, as a test.
Run this script to fix the issue:
```python
import coremltools
import numpy as np

mlmodel = coremltools.models.MLModel("bert-test-256_FP16.mlmodel")
spec = mlmodel._spec

spec.neuralNetwork.layers[9].activation.linear.alpha = 1   # whereNonZero
spec.neuralNetwork.layers[11].activation.linear.alpha = 1  # squeeze

new_model = coremltools.models.MLModel(spec)
new_model.save("w00t.mlmodel")

new_model.predict({"input_ids": np.zeros((1, 256), dtype=np.int32)})
```
The issue was the squeeze and whereNonZero layers at the beginning of the model. This script replaces them with harmless linear activation layers.
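Why the swap is harmless: Core ML's linear activation computes `alpha * x + beta`, so with `alpha = 1` (and `beta = 0`) it's the identity function. A quick NumPy sketch of that math (an illustration only, not Core ML itself):

```python
import numpy as np

def linear_activation(x, alpha=1.0, beta=0.0):
    # Core ML's "linear" activation: f(x) = alpha * x + beta.
    return alpha * x + beta

x = np.arange(6, dtype=np.float32).reshape(2, 3)

# With alpha=1 and beta=0 the tensor passes through unchanged, which is
# why replacing the offending layers this way doesn't corrupt the data.
print(np.array_equal(linear_activation(x), x))  # True
```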
Thanks so much, I'll give this a try!
Do you understand why the squeeze was there in the first place? Searching modeling_distilbert.py only reveals an explicit `.squeeze` call in BertForQuestionAnswering (I was only using MLM). Reading your CoreML Survival Techniques, it suggests that occasionally TensorFlow will insert layers that aren't obviously necessary. Do you suspect that's what's happened here?
ps - I naively tried removing the `squeeze` entirely, which (obviously enough, I suppose) just caused a shape error at the `broadcastTo`... ;-) I'll keep this trick of replacing ops with benign layers in mind in future.
It probably got inserted in the ONNX conversion step. There are most likely a whole bunch of other layers that don't really need to be in there. :-D
Wow, okay. It really is the Wild West! It would be great if Apple would throw a little more of its multi-billion-dollar weight behind this process. It's generally pretty painful getting from PyTorch to CoreML, in my experience.
Thanks again.
To be fair, @bhushan23 from onnx-coreml has helped us a lot so far :)
Absolutely! ONNX is invaluable, since presumably this wouldn't be possible at all without it. But by taking a proprietary approach, Apple must surely have realized there would be a significant demand on resources/human-hours trying to stay current in such a rapidly evolving field. Anyway, I don't mean to complain; it's just that it can be tricky enough getting a model working, only to face another significant (and sometimes insurmountable) hurdle in trying to get it converted to an mlmodel.
So after testing the model on iOS I noticed that the outputs weren't the same. Trying to debug the problem led me to try `onnxruntime`, to verify whether the ONNX model gave the same results as PyTorch. However, `onnxruntime` failed with a type error on `float` for the `Equal` operator. With some help (from ONNX/OnnxRuntime), it was discovered that the models converted from ONNX were indicating opset v9 (in Netron), which didn't accept floats for `Equal`. By manually setting `opset_version=11` in the ONNX export call, I was able to get it to verify correctly in `onnxruntime`, but now I get an "operator not available" error for `Range` when converting to CoreML. Very strange. The `Range` operator appears to have been added in opset 11; I made no other changes to the model itself. Not sure what the path forward might be. I'm on PyTorch 1.4, CUDA 10.1, and the latest release versions of ONNX, coremltools, and onnx-coreml.
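For context, ONNX's `Range` operator (introduced in opset 11) just materializes an arithmetic sequence, equivalent to NumPy's `arange`. A sketch of its semantics (NumPy stand-in, not the ONNX runtime):

```python
import numpy as np

def onnx_range(start, limit, delta):
    # ONNX Range(start, limit, delta) produces the same values as np.arange:
    # start, start+delta, ... up to (but excluding) limit.
    return np.arange(start, limit, delta)

print(onnx_range(0, 5, 1))    # [0 1 2 3 4]
print(onnx_range(10, 4, -2))  # [10  8  6]
```

The op itself is trivial; the blocker is simply that a converter targeting opset 10 has no mapping for it.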
ONNX and Core ML do not have 100% the same features, so it's possible your ONNX model contains an operator that is not supported by Core ML. Sometimes you can work around this by removing the offending operator from the ONNX model by hand.
Okay, understood. Just out of curiosity, I decided to try modifying `distilbert-onnx-coreml.py` from the `swift-coreml-transformers` repo by adding `opset_version=11` to the export call, and it does add the `Range` operator to the model (just as it does in my case). So it seems that something about the model with opset 11 is being misinterpreted (or interpreted differently) by ONNX.
It seems clear that I'm blocked for now on DistilBERT, since a working ("working" as in giving accurate results in `onnxruntime`) ONNX conversion requires opset v11, and onnx-coreml doesn't yet support v11.
So I'm trying to get regular BERT running for now. I trained the model and `onnxruntime` is happy with the output (i.e., verified against PyTorch, ONNX opset v10). The export did require the same `squeeze` fix you posted above, but the output in Xcode is still incorrect when compared to PyTorch. Is it possible the "fix" has added some error?
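One way to quantify "incorrect" here (assuming both outputs can be dumped as NumPy arrays; the values below are made up for illustration) is a tolerance check rather than exact equality, since FP16 conversion alone introduces small differences:

```python
import numpy as np

# Hypothetical logits dumped from the PyTorch run and the Core ML run.
pytorch_out = np.array([0.120, -1.300, 0.980])
coreml_out = np.array([0.121, -1.298, 0.979])

# Element-wise comparison with a tolerance loose enough for FP16 weights;
# a genuinely broken graph typically shows differences orders of magnitude larger.
max_abs_diff = np.max(np.abs(pytorch_out - coreml_out))
print("max abs diff:", max_abs_diff)
print("close enough:", np.allclose(pytorch_out, coreml_out, atol=1e-2))
```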
I can start a new issue, if that's better.
I'm not sure whether this repo is still being maintained, since the last commit seems to be several months ago, but I did manage to get distilbert-onnx-coreml.py to convert a fine-tuned DistilBERT model using opset 10 that seems to give decent results. It basically involves iterating through the model and manually modifying the squeeze layers to use the axis with dimension 1. I can open a pull request to modify the script if that's something people would want. Just figured I'd throw this up here in case anyone has had this same issue, because I didn't find any answer when googling around.
Yeah, I've been noticing the silence of the onnx/conversion-related repos recently, which is pretty easily explained by the new PyTorch support in CoreML 4. That's also kind of incomplete, in my (limited) experience—e.g., no leaky_relu conversion—but very promising. (And obviously it's very new!) I'd be interested in distilbert, for sure, but I think waiting a little longer on CoreML 4 is worthwhile.
@calderma We are not proactively maintaining this repo, but if you have a fix for the current version of CoreML or related tools, by all means, please submit it.
Hello, I decided I'm just going to post an example code snippet, since I don't think it's a fix that would be uniform for every use case, but people should be able to modify it to their needs. Using this after converting the model via pytorch->onnx->coreml in the distilbert-onnx-coreml.py script, I was able to get it to run on device and get similar results to the PyTorch model. If it doesn't work for you, post in this issue and I can try to help.
```python
# The motivation behind this is to iterate through a fine-tuned DistilBERT model
# and fix the squeeze layers, which for some reason try to squeeze along the
# incorrect dimension. That throws an error similar to:
#   Espresso exception: "Invalid state": Cannot squeeze a dimension whose value is not 1
# The coreml pytorch conversion does not work out of the box either, in my experience,
# so this is a way to get a fine-tuned DistilBERT QA model to run on an iOS device.
# Note that you should run the torch.onnx.export command with the opset_version flag
# set to less than 11. I tested it and it works on opset=9 and opset=10.
import coremltools

mlmodel = coremltools.models.MLModel($YOUR_FINETUNED_MODEL_HERE)
spec = mlmodel._spec

# Iterate through the network layers and identify the squeeze layers.
layers_to_change = []
for i, layer in enumerate(spec.neuralNetwork.layers):
    if "Squeeze" in layer.name:
        layers_to_change.append(i)

# Change the axes to squeeze along the 0 axis, which should be 1-dimensional
# in the converted model.
for x in layers_to_change:
    del spec.neuralNetwork.layers[x].squeeze.axes[:]
    spec.neuralNetwork.layers[x].squeeze.axes.extend([0])

new_model = coremltools.models.MLModel(spec)
new_model.save($YOUR_MODEL_PATCHED)
```
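A note on the `del ...axes[:]` / `extend` pattern in the script above: protobuf repeated fields generally can't be replaced by plain assignment, so they're cleared and refilled in place. A plain Python list illustrates the same mutation pattern (the real object is a protobuf repeated-field container, not a list):

```python
# The converter emitted squeeze axes such as [1]; the fix clears the
# field in place and sets it to [0], mirroring the two calls above.
axes = [1]          # stand-in for spec.neuralNetwork.layers[x].squeeze.axes
del axes[:]         # clear in place (repeated fields reject `msg.axes = [...]`)
axes.extend([0])    # squeeze along axis 0 instead
print(axes)         # [0]
```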
I am also facing the same issue with a MobileDet model:
```
2020-09-16 11:56:12.168355+0530 ObjectDetection-CoreML[9517:1345027] [coreml] Failure in -executePlan:error:.
2020-09-16 11:56:12.168554+0530 ObjectDetection-CoreML[9517:1345027] Finalizing CVPixelBuffer 0x28255c0a0 while lock count is 1.
2020-09-16 11:56:12.218048+0530 ObjectDetection-CoreML[9517:1345027] [espresso] [Espresso::handle_ex_plan] exception=Espresso exception: "Invalid state": scatter_nd_kernel: In TF_SCATTER mode, invalid shape of UPDATES tensor. status=-5
```
Was anyone able to find a solution for this? Thanks in advance.
Hi,
I used distilbert-onnx-coreml.py to convert a custom PyTorch BertForSequenceClassification model to CoreML. The conversion finishes without error.
However I can't use the resulting CoreML model for prediction. The following code fails:
Note, my input dim is 64:
When I try to substitute my model into the DistilBERT demo app, I get the following error in Xcode when predicting:
The only hint that something might have gone wrong in the onnx->coreml conversion is a note about a deleted node; however, I'm struggling to work out whether this is just a red herring:
Are there any particular BERT layers that need custom conversion to coreml? Any suggestions on further debugging?
Thanks.