huggingface / exporters

Export Hugging Face models to Core ML and TensorFlow Lite
Apache License 2.0

GPTBigCode Support? #34

Closed: JustinMeans closed this issue 1 year ago

JustinMeans commented 1 year ago

Out of sheer curiosity I tried to export bigcode/starcoder to Core ML and got the following error after downloading the weights: "gpt_bigcode is not supported yet. Only ['bart', 'beit', 'bert', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'convnext', 'ctrl', 'cvt', 'data2vec', 'distilbert', 'ernie', 'gpt2', 'gpt_neo', 'levit', 'm2m_100', 'marian', 'mobilebert', 'mobilevit', 'mvp', 'pegasus', 'plbart', 'roberta', 'roformer', 'segformer', 'splinter', 'squeezebert', 't5', 'vit', 'yolos']"
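For context, an export like this is typically kicked off with the exporters CLI, roughly as follows (a sketch based on the repo's README; the output directory name is arbitrary):

```bash
python -m exporters.coreml --model=bigcode/starcoder exported/
```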

I understand GPTBigCode is an optimized GPT-2 model with support for Multi-Query Attention: https://huggingface.co/docs/transformers/model_doc/gpt_bigcode

Python isn't my strong suit, but I just wanted to flag this here. Would running StarCoder on Core ML even be feasible, or is it too large?
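As a rough illustration of what Multi-Query Attention changes (a toy sketch, not the actual transformers implementation): every query head shares a single key/value head, so K and V drop their per-head dimension and simply broadcast against Q.

```python
import torch

# Toy shapes: batch 1, 12 query heads, sequence length 16, head dim 64.
batch, n_heads, seq, head_dim = 1, 12, 16, 64
q = torch.randn(batch, n_heads, seq, head_dim)

# Multi-head attention: one key tensor per head.
k_mha = torch.randn(batch, n_heads, seq, head_dim)
# Multi-query attention: a single key tensor shared by all query heads.
k_mqa = torch.randn(batch, 1, seq, head_dim)

scores_mha = q @ k_mha.transpose(-1, -2)  # (1, 12, 16, 16)
scores_mqa = q @ k_mqa.transpose(-1, -2)  # broadcasts to (1, 12, 16, 16)
```

The shared K/V is what shrinks the KV cache relative to GPT-2; the rest of the block is essentially unchanged.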

JustinMeans commented 1 year ago

I attempted to patch `features.py` by adding the following entry (just copying the same spec as GPT2) and got pretty far through the conversion process, which ran for around an hour:

```python
"gpt_bigcode": supported_features_mapping(
    "feature-extraction",
    "feature-extraction-with-past",
    "text-generation",
    # "text-generation-with-past",
    "text-classification",
    "token-classification",
    coreml_config_cls="models.gpt2.GPT2CoreMLConfig",
),
```

However, toward the end of the conversion, the script failed with the following error:

```
Some weights of the model checkpoint at bigcode/starcoder were not used when initializing GPTBigCodeModel: ['lm_head.weight']
- This IS expected if you are initializing GPTBigCodeModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTBigCodeModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 1.12.0
Overriding 1 configuration item(s)
    - use_cache -> False
/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:573: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if batch_size <= 0:
Skipping token_type_ids input
Patching PyTorch conversion 'full' with <function GPT2CoreMLConfig.patch_pytorch_ops.<locals>._fill at 0x7fd7ea3915e0>
Converting PyTorch Frontend ==> MIL Ops:   2%|█▏                                                                           | 32/1976 [00:00<00:13, 139.17 ops/s]
Traceback (most recent call last):
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/justinmeans/Documents/JMLLC/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
    main()
  File "/Users/justinmeans/Documents/JMLLC/exporters/src/exporters/coreml/__main__.py", line 166, in main
    convert_model(
  File "/Users/justinmeans/Documents/JMLLC/exporters/src/exporters/coreml/__main__.py", line 45, in convert_model
    mlmodel = export(
  File "/Users/justinmeans/Documents/JMLLC/exporters/src/exporters/coreml/convert.py", line 680, in export
    return export_pytorch(preprocessor, model, config, quantize, compute_units)
  File "/Users/justinmeans/Documents/JMLLC/exporters/src/exporters/coreml/convert.py", line 553, in export_pytorch
    mlmodel = ct.convert(
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/_converters_entry.py", line 492, in convert
    mlmodel = mil_convert(
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 285, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 108, in __call__
    return load(*args, **kwargs)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 63, in load
    return _perform_torch_convert(converter, debug)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 102, in _perform_torch_convert
    prog = converter.convert()
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 284, in convert
    convert_nodes(self.context, self.graph)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 88, in convert_nodes
    add_op(context, node)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4130, in masked_fill
    res = mb.select(cond=mask, a=value, b=x, name=node.name)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/mil/ops/registry.py", line 182, in add_op
    return cls._add_op(op_cls_to_add, **kwargs)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/mil/builder.py", line 166, in _add_op
    new_op = op_cls(**kwargs)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/mil/operation.py", line 187, in __init__
    self._validate_and_set_inputs(input_kv)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/mil/operation.py", line 496, in _validate_and_set_inputs
    self.input_spec.validate_inputs(self.name, self.op_type, input_kvs)
  File "/Users/justinmeans/opt/anaconda3/lib/python3.9/site-packages/coremltools/converters/mil/mil/input_type.py", line 137, in validate_inputs
    raise ValueError(msg)
ValueError: In op, of type select, named input.1, the named input `b` must have the same data type as the named input `a`. However, b has dtype int32 whereas a has dtype fp32.
```

Will keep investigating; perhaps it's a PyTorch / Python version issue.
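For reference, a minimal sketch of the shape of that failure (a hypothetical standalone repro, not the converter's actual inputs): eager PyTorch happily promotes mixed dtypes, but the MIL `select` op that coremltools lowers `masked_fill`/`where` to requires both branches to have the same dtype.

```python
import torch

mask = torch.tensor([[True, False], [False, True]])
x_int = torch.zeros(2, 2, dtype=torch.int32)  # would become `b` (int32) in `select`
fill = torch.tensor(-1e4)                     # would become `a` (fp32) in `select`

y = torch.where(mask, fill, x_int)            # eager mode promotes the result
print(y.dtype)                                # torch.float32

# A workaround along these lines (casting so both branches agree up front)
# avoids the mismatch when the graph is lowered:
y_ok = torch.where(mask, fill, x_int.to(torch.float32))
```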

pcuenca commented 1 year ago

I got the same problem; I don't think it's a versioning issue. Looking into it :)

hollance commented 1 year ago

Isn't this model going to be way too big to fit into a protobuf file (max size 2 GB)?

pcuenca commented 1 year ago

Apparently the 2 GB limitation is resolved on macOS somehow; see for example this section: https://github.com/apple/ml-stable-diffusion#-converting-models-to-core-ml

I've tested a couple of large language models and they seem to work on macOS too.
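For anyone curious what that looks like in practice, here is a minimal sketch (a toy model, not starcoder) of converting to the ML Program backend, which stores weights as a separate blob inside the .mlpackage bundle rather than in the protobuf graph itself:

```python
import torch
import coremltools as ct

# Tiny stand-in model; exporters traces the real model the same way.
model = torch.nn.Linear(128, 128).eval()
example = torch.randn(1, 128)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape)],
    convert_to="mlprogram",  # ML Program: weights live outside the proto graph
)
mlmodel.save("Toy.mlpackage")  # a directory bundle, not a single .mlmodel file
```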

pcuenca commented 1 year ago

Fixed by #45. Note that you currently need `transformers` installed from main and `coremltools` 7.0b1.