huggingface / swift-coreml-transformers

Swift Core ML 3 implementations of GPT-2, DistilGPT-2, BERT, and DistilBERT for Question answering. Other Transformers coming soon!
Apache License 2.0

Core ML model based on transformer for translation purposes #21

Closed: harrylyf closed this issue 3 years ago

harrylyf commented 3 years ago

Hi,

I am wondering if it would be possible for you to add a transformer-based model for translation purposes, in Core ML format, to this repository. I had a trained PyTorch model but failed to convert it to ONNX. Are there any other existing translation models out there that can be easily converted to the .mlmodel format?

Thanks, Yufan

julien-c commented 3 years ago

It's been a while since I last used CoreML, but maybe @vincentfpgarcia has more recent info on conversion from PyTorch to CoreML.

Supposedly you don't have to pivot through ONNX anymore:

https://coremltools.readme.io/docs/pytorch-conversion

julien-c commented 3 years ago

Starting with coremltools 4.0, you can convert your model trained in PyTorch to the Core ML format directly, without requiring an explicit step to save the PyTorch model in ONNX format. Converting the model directly is recommended.

vincentfpgarcia commented 3 years ago

I haven't updated this page in a while, sorry about that. @julien-c is right: with coremltools 4.0, you can now convert your PyTorch trace directly to CoreML, and it's very straightforward. AFAIK, the only restriction is that coremltools 4.0 does not generate models compatible with iOS 12. So, if your app still supports iOS 12, I think ONNX is still the way to go. I'll update this page very soon to give you the procedure for coremltools 4.0.
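In the meantime, the procedure is roughly as follows (a minimal sketch, assuming a traceable torch.nn.Module; the torchvision MobileNetV2 here is just a stand-in for your own model):

import torch
import torchvision
import coremltools as ct

# Any traceable torch.nn.Module works; MobileNetV2 is only an example.
model = torchvision.models.mobilenet_v2(pretrained=True)
model.eval()

# 1. Trace the model to TorchScript with a fixed example input.
example_input = torch.rand(1, 3, 224, 224)
traced_model = torch.jit.trace(model, example_input)

# 2. Convert the trace directly to Core ML (no intermediate ONNX step).
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="input", shape=(1, 3, 224, 224))],
)
mlmodel.save("MobileNetV2.mlmodel")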

harrylyf commented 3 years ago

Thank you for your response! I trained the transformer model based on this and was trying to test it on iOS devices. The checkpoint I saved, following the original git repository, is named 'trained_param.chkpt'. When I try to convert it with the following code:

import torch
import coremltools as ct
from transformer.Models import Transformer

def load_model(model_path, device):
    checkpoint = torch.load(model_path, map_location=device)
    model_opt = checkpoint['settings']

    model = Transformer(
        model_opt.src_vocab_size,
        model_opt.trg_vocab_size,
        model_opt.src_pad_idx,
        model_opt.trg_pad_idx,
        trg_emb_prj_weight_sharing=model_opt.proj_share_weight,
        emb_src_trg_weight_sharing=model_opt.embs_share_weight,
        d_k=model_opt.d_k,
        d_v=model_opt.d_v,
        d_model=model_opt.d_model,
        d_word_vec=model_opt.d_word_vec,
        d_inner=model_opt.d_inner_hid,
        n_layers=model_opt.n_layers,
        n_head=model_opt.n_head,
        dropout=model_opt.dropout).to(device)

    model.load_state_dict(checkpoint['model'])
    print('[Info] Trained model state loaded.')
    return model

model = load_model('trained_pingan_param.pt', 'cpu')

mlmodel = ct.convert(
    model,
    source='pytorch',
    inputs=[ct.TensorType(name='input_name', shape=(1, 120))]
)

The conversion fails with the following error message:

TypeError: @model must either be a PyTorch .pt or .pth file or a TorchScript object, received: <class 'transformer.Models.Transformer'>

I tried renaming the model file to '.pt', but the same error message remains. Do you have any suggestions?

harrylyf commented 3 years ago

In addition, I also tried your t5-base model, but I'm not sure how to initialize it. What I have done so far:

  1. downloaded the .h5 file from here
  2. tried to load it as mlmodel = ct.convert('tf_model.h5', source="tensorflow"), but it failed with ValueError: No model found in config file.
  3. figured I need to initialize the model first, load_weights from 'tf_model.h5', and then call ct.convert() (a rough sketch of what I had in mind is below). I have installed the transformers package but am not sure what the simplest way is to get the model and convert it to an mlmodel. Am I missing something here? Thanks in advance!
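For context, this is roughly the loading step I had in mind (an untested sketch; I'm guessing that TFT5ForConditionalGeneration is the right class, and I don't know whether the T5 graph converts cleanly at all):

import coremltools as ct
from transformers import TFT5ForConditionalGeneration

# Initialize the Keras model and load the pretrained weights, instead of
# passing the bare .h5 file to ct.convert. from_pretrained can also point
# at a local directory containing config.json and tf_model.h5.
model = TFT5ForConditionalGeneration.from_pretrained('t5-base')

# TFT5ForConditionalGeneration is a tf.keras.Model, so it can be handed to
# the unified converter directly; whether the conversion itself succeeds
# is a separate question.
mlmodel = ct.convert(model, source='tensorflow')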

hollance commented 3 years ago

You can't just pass the PyTorch model object to ct.convert; you first need to run torch.jit.trace() on it. See also: https://coremltools.readme.io/docs/model-tracing
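Applied to your load_model above, it would look roughly like this (an untested sketch; I'm assuming the Transformer's forward() takes a source and a target token sequence, as in the original repo, that (1, 120) is your sequence shape, and that the input names are just placeholders):

import numpy as np
import torch
import coremltools as ct

model = load_model('trained_pingan_param.pt', 'cpu')
model.eval()

# Example inputs for tracing: the values don't matter, the shapes and dtypes do.
example_src = torch.randint(0, 100, (1, 120), dtype=torch.int64)
example_trg = torch.randint(0, 100, (1, 120), dtype=torch.int64)

# Trace the model to TorchScript first...
traced_model = torch.jit.trace(model, (example_src, example_trg))

# ...then hand the trace (not the nn.Module) to the converter.
mlmodel = ct.convert(
    traced_model,
    inputs=[
        ct.TensorType(name='src_seq', shape=(1, 120), dtype=np.int32),
        ct.TensorType(name='trg_seq', shape=(1, 120), dtype=np.int32),
    ],
)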

harrylyf commented 3 years ago

Ah, I see. Thanks!