Error while converting ONNX export of DistilBERT

julien-c commented 4 years ago

I'm trying to convert an ONNX export of DistilBERT to CoreML, using the following code:

import torch
from onnx_coreml import convert
from pytorch_transformers.modeling_distilbert import DistilBertForQuestionAnswering

model = DistilBertForQuestionAnswering.from_pretrained(
    "distilbert-base-uncased-distilled-squad", torchscript=True
)
model.eval()

# The dummy input fixes the exported sequence length at 128 tokens.
torch.onnx.export(
    model,
    torch.ones(1, 128, dtype=torch.long),
    "distilbert-squad-128.onnx",
    verbose=True,
    input_names=["input_ids"],
    output_names=["start_scores", "end_scores"],
)

mlmodel = convert(model="distilbert-squad-128.onnx", target_ios="13")

I've hand-converted this model to CoreML before (as well as GPT-2, in this repo), but for ease of use and scalability to future models I would like to use onnx-coreml in a more seamless way.

I'm running into a few different roadblocks:

- NotImplementedError: Unsupported ONNX ops of type: Where, coming from torch.masked_fill_. Will this operation be supported at some point? In the meantime, I can probably work around it by rewriting my PyTorch code with an equivalent construct.
- I can't seem to convert an nn.Softmax(dim=-1) layer. Replacing it with nn.functional.softmax seems to work.
- In some cases, the conversion of torch.erf (used in gelu) fails. TODO: check why, and why torch.nn.functional.gelu is not converted to ONNX.
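One construct that avoids the Where op is to express masked_fill arithmetically. Below is a minimal NumPy sketch, assuming a boolean mask that is True where scores should be hidden. The helper names and the -1e9 constant are illustrative choices, not DistilBERT's code. The same expression should carry over to PyTorch tensors (using `mask.to(scores.dtype)` in place of `astype`):

```python
import numpy as np

# Finite stand-in for -inf (an assumption): 0 * -inf would give NaN in the
# arithmetic form, while exp(-1e9 - max) still underflows to exactly 0.0.
NEG = -1e9


def masked_fill_where(scores: np.ndarray, mask: np.ndarray, value: float = NEG) -> np.ndarray:
    """Reference: put `value` wherever mask is True (masked_fill semantics)."""
    return np.where(mask, value, scores)


def masked_fill_arith(scores: np.ndarray, mask: np.ndarray, value: float = NEG) -> np.ndarray:
    """Same result using only Sub/Mul/Add, so no Where node is needed."""
    keep = 1.0 - mask.astype(scores.dtype)  # 1.0 = keep the score, 0.0 = masked
    return scores * keep + (1.0 - keep) * value


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


# Toy attention scores: batch of 2, 4 queries x 4 keys; the last two keys are padding.
scores = np.random.default_rng(0).standard_normal((2, 4, 4)).astype(np.float32)
mask = np.array([False, False, True, True])  # broadcasts over the key axis

probs = softmax(masked_fill_arith(scores, mask))
print(probs[0, 0])  # the two masked entries are 0.0
```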

Here's the ONNX file: https://s3.amazonaws.com/models.huggingface.co/bert/distilbert-squad-128.onnx

Help would be super appreciated!

aseemw commented 4 years ago

These are most likely bugs in the converter, since the operators you mentioned (where, masked_fill, softmaxND, and gelu with three different modes) are all supported in the CoreML spec.
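For reference, here are the three GELU variants, assuming these are the modes meant above (exact, tanh approximation and sigmoid approximation, which to my knowledge are the names of Core ML's GELU layer modes). This small standard-library sketch compares them and shows why the exact form depends on erf:

```python
import math


def gelu_exact(x: float) -> float:
    # x * Phi(x), with the standard normal CDF Phi written via erf
    # (presumably what appears as an Erf node in the exported graph)
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # tanh approximation of GELU
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def gelu_sigmoid(x: float) -> float:
    # x * sigmoid(1.702 x); sigmoid(z) is computed as 0.5 * (1 + tanh(z / 2))
    # so that large negative x cannot overflow math.exp
    return x * 0.5 * (1.0 + math.tanh(0.851 * x))


xs = [i / 10 for i in range(-50, 51)]
for name, f in [("tanh", gelu_tanh), ("sigmoid", gelu_sigmoid)]:
    worst = max(abs(f(x) - gelu_exact(x)) for x in xs)
    print(f"{name}: max abs error vs exact on [-5, 5] = {worst:.4f}")
```

The tanh form tracks the exact curve much more closely than the sigmoid form, which is worth keeping in mind if a converter falls back to an approximation.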

What difference does it make to the ONNX graph when nn.Softmax(dim=-1) is replaced by nn.functional.softmax?

bhushan23 commented 4 years ago

@julien-c We have added support for the Where op in #487. Regarding the Softmax layer: we use the old Softmax layer by default because of an overflow issue. The old Softmax layer is rank-dependent, so handling it properly is blocked on ONNX shape inference.

However, if we use custom_conversion_functions to map Softmax to the new SoftmaxND layer, the model converts with good SNR and PSNR scores:

Start Scores: SNR 104.39693320792118, PSNR 86.34187544571695
End Scores: SNR 103.61101916379764, PSNR 86.12284546374713
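These figures are in decibels. Assuming they come from `_compute_SNR` in the validation script, with $x$ the Core ML scores, $y$ the PyTorch scores and $N$ the number of scores (the script divides by `len(...)`, which equals $N$ only for 1-D arrays), the quantities are:

```latex
$$
\sigma_{\text{noise}}^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - y_i)^2 + 10^{-7}, \qquad
\mathrm{SNR} = 10 \log_{10} \frac{\frac{1}{N}\sum_{i=1}^{N} y_i^2}{\sigma_{\text{noise}}^2}, \qquad
\mathrm{PSNR} = 10 \log_{10} \frac{\max_i y_i^2}{\sigma_{\text{noise}}^2}
$$
```

As a worked reading, an SNR of about 104.4 dB means the mean signal power is roughly $10^{10.44} \approx 2.8 \times 10^{10}$ times the mean squared error.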

Please use the following script to convert the model:

from pytorch_transformers.modeling_distilbert import DistilBertForQuestionAnswering
from onnx_coreml import convert
import torch
import numpy as np

model = DistilBertForQuestionAnswering.from_pretrained(
    "distilbert-base-uncased-distilled-squad", torchscript=True
)
torch.save(model, './distilbert.pt')  # the validation script reloads this file
model.eval()

torch.onnx.export(
    model,
    torch.ones(1, 128, dtype=torch.long),
    "distilbert-squad-128.onnx",
    verbose=True,
    input_names=["input_ids"],
    output_names=["start_scores", "end_scores"],
)

def _convert_softmax(builder, node, graph, err):
    '''
    convert to CoreML SoftMax ND Layer:
    https://github.com/apple/coremltools/blob/655b3be5cc0d42c3c4fa49f0f0e4a93a26b3e492/mlmodel/format/NeuralNetwork.proto#3547
    '''
    axis = node.attrs.get('axis', 1)
    builder.add_softmax_nd(
        name=node.name,
        input_name=node.inputs[0],
        output_name=node.outputs[0] + ('_softmax' if node.op_type == 'LogSoftmax' else ''),
        axis=axis
    )
    if node.op_type == 'LogSoftmax':
        builder.add_unary(
            name=node.name+'_log',
            input_name=node.outputs[0]+'_softmax',
            output_name=node.outputs[0],
            mode='log'
        )

mlmodel = convert(model="./distilbert-squad-128.onnx", target_ios="13",
                  custom_conversion_functions={'Softmax':_convert_softmax})
mlmodel.save('./converted.mlmodel')
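What the custom function asks Core ML for is a softmax along one axis of a tensor of any rank. The NumPy sketch below illustrates those semantics (it is not Core ML's implementation) and contrasts them with ONNX Softmax before opset 13, which flattens every dimension from `axis` onward. The two coincide when the softmax runs over the last dimension, as with DistilBERT's `dim=-1`, which is presumably why passing `axis` straight through to `add_softmax_nd` works here. Whether an export uses an opset below 13 is an assumption to check for other models:

```python
import numpy as np


def softmax_nd(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Softmax over a single axis of an array of any rank."""
    # Subtracting the per-slice maximum does not change the result but keeps
    # exp() from overflowing on large logits.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def log_softmax_nd(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Same composition _convert_softmax uses for LogSoftmax: softmax, then log.
    return np.log(softmax_nd(x, axis=axis))


def onnx_legacy_softmax(x: np.ndarray, axis: int = 1) -> np.ndarray:
    # ONNX Softmax before opset 13: coerce the input to 2-D at `axis`
    # (leading dims become rows, trailing dims are flattened into columns),
    # then apply softmax across each row.
    axis = axis % x.ndim
    rows = int(np.prod(x.shape[:axis]))
    return softmax_nd(x.reshape(rows, -1), axis=1).reshape(x.shape)


# Toy attention scores shaped (batch, heads, query, key).
x = np.random.default_rng(0).standard_normal((1, 2, 3, 4))

# Over the last axis the two definitions agree...
print(np.allclose(onnx_legacy_softmax(x, axis=3), softmax_nd(x, axis=-1)))  # True
# ...but over an inner axis they do not.
print(np.allclose(onnx_legacy_softmax(x, axis=1), softmax_nd(x, axis=1)))  # False
```

Composing log after softmax mirrors the custom function, but a fused form (`z - log(sum(exp(z)))`) avoids `log(0)` when probabilities underflow.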

I validated the model's accuracy as follows:

import onnx
import onnxruntime as rt
import coremltools
import torch
import numpy as np

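def _compute_SNR(x, y):
    # x: Core ML output, y: PyTorch reference.
    # len() counts only the first axis (1 for the (1, 128) PyTorch scores), so
    # these are sums rather than per-element means; SNR is unaffected because
    # both terms share the divisor, but PSNR is not.
    noise = x - y
    noise_var = np.sum(noise ** 2) / len(noise) + 1e-7
    signal_energy = np.sum(y ** 2) / len(y)
    max_signal_energy = np.amax(y ** 2)
    SNR = 10 * np.log10(signal_energy / noise_var)
    PSNR = 10 * np.log10(max_signal_energy / noise_var)
    return SNR, PSNR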

spec = coremltools.utils.load_spec('./converted.mlmodel')
mlmodel = coremltools.models.MLModel(spec, useCPUOnly=True)

input_ids = np.random.randint(0, high=1000, size=(1, 128))
# Core ML receives the token ids as float32 values.
input_dict = {'input_ids': input_ids.astype(np.float32)}

pred_coreml = mlmodel.predict(input_dict, useCPUOnly=True)

model = torch.load('distilbert.pt')  # written via torch.save in the conversion script
with torch.no_grad():
    pred_pt = model(torch.from_numpy(input_ids).long())

pt_out = {}
pt_out['start_scores'] = pred_pt[0].detach().numpy()
pt_out['end_scores'] = pred_pt[1].detach().numpy()

snr, psnr = _compute_SNR(pred_coreml['start_scores'], pt_out['start_scores'])
print('Start Scores: SNR {}, PSNR {}'.format(snr, psnr))
snr, psnr = _compute_SNR(pred_coreml['end_scores'], pt_out['end_scores'])
print('End Scores: SNR {}, PSNR {}'.format(snr, psnr))
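SNR and PSNR measure how close the raw scores are. For extractive QA, a complementary check is whether both backends pick the same answer span. A minimal NumPy sketch follows. `best_span` and its `max_answer_len=30` default are hypothetical choices, not part of the script above or of any library:

```python
import numpy as np


def best_span(start_scores, end_scores, max_answer_len=30):
    """Return the (start, end) token pair maximizing start + end score,
    subject to start <= end < start + max_answer_len."""
    start = np.asarray(start_scores, dtype=np.float64).reshape(-1)
    end = np.asarray(end_scores, dtype=np.float64).reshape(-1)
    n = start.shape[0]
    pair = start[:, None] + end[None, :]  # pair[s, e] = start[s] + end[e]
    s_idx, e_idx = np.indices((n, n))
    valid = (e_idx >= s_idx) & (e_idx - s_idx < max_answer_len)
    pair = np.where(valid, pair, -np.inf)  # rule out reversed or overlong spans
    s, e = np.unravel_index(np.argmax(pair), pair.shape)
    return int(s), int(e)


# Toy scores: the highest end score (index 0) comes before the best start,
# so it must be skipped.
print(best_span([0.0, 0.0, 5.0, 0.0], [4.0, 0.0, 0.0, 1.0]))  # (2, 3)
```

With the variables from the validation script, the comparison would be `best_span(pred_coreml['start_scores'], pred_coreml['end_scores']) == best_span(pt_out['start_scores'], pt_out['end_scores'])`, ideally repeated over several random inputs.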

@julien-c could you please try the above script with top of tree (onnx-coreml built from source)?

julien-c commented 4 years ago

Thank you @bhushan23, it works! I've pushed the models to our repo (with credits) and tweeted a link to the release: https://twitter.com/julien_c/status/1181615276439330816

We'll integrate the model inside our demo Squad app later today.

julien-c commented 4 years ago

Just updated our demo app to use onnx-coreml converted DistilBERT 🎉

Inference on device is ~35% faster ⚡️

Merged PR is: https://github.com/huggingface/swift-coreml-transformers/pull/13

bhushan23 commented 4 years ago

Awesome!! @julien-c could you please give more details? Faster than TF Lite?

julien-c commented 4 years ago

@bhushan23 DistilBERT is ~35% faster than (full) BERT on device (while keeping 97% of the accuracy on Squad)

onnx/onnx-coreml, issue #478