I ran the first command provided (as a sanity check of my setup, since I usually see very high output errors for larger models like LLMs) and I get an output validation error.
I made sure my Python environment was clean by running pip freeze > uninstall.txt && pip uninstall -r uninstall.txt, followed by pip install -e . && pip install torch==2.2.0, but I'm still getting the validation error at the end:
python -m exporters.coreml --model=distilbert-base-uncased exported/distilbert-main.mlpackage
Using framework PyTorch: 2.2.0
/Users/ryan/miniconda3/envs/exporters/lib/python3.11/site-packages/transformers/modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
/Users/ryan/miniconda3/envs/exporters/lib/python3.11/site-packages/transformers/models/distilbert/modeling_distilbert.py:230: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask, torch.tensor(torch.finfo(scores.dtype).min)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/285 [00:00<?, ? ops/s]Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops: 100%|████████████████████████████████████████████████████████████▊| 284/285 [00:00<00:00, 9783.28 ops/s]
Running MIL frontend_pytorch pipeline: 100%|█████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 474.24 passes/s]
Running MIL default pipeline: 0%| | 0/78 [00:00<?, ? passes/s]/Users/ryan/miniconda3/envs/exporters/lib/python3.11/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:266: UserWarning: Output, '466', of the source model, has been renamed to 'var_466' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline: 50%|████████████████████████████████████ | 39/78 [00:00<00:00, 386.60 passes/s]/Users/ryan/miniconda3/envs/exporters/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
return input_var.val.astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|█████████████████████████████████████████████████████████████████████████| 78/78 [00:01<00:00, 66.33 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 613.88 passes/s]
Validating Core ML model...
-[✓] Core ML model output names match reference model ({'last_hidden_state'})
- Validating Core ML model output "last_hidden_state":
-[✓] (1, 128, 768) matches (1, 128, 768)
-[x] values not close enough (atol: 0.0001)
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/ryan/git/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
main()
File "/Users/ryan/git/exporters/src/exporters/coreml/__main__.py", line 166, in main
convert_model(
File "/Users/ryan/git/exporters/src/exporters/coreml/__main__.py", line 70, in convert_model
validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
File "/Users/ryan/git/exporters/src/exporters/coreml/validate.py", line 220, in validate_model_outputs
raise ValueError(
ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: 0.013694286346435547
Is this expected? Can anyone else confirm they get the same result from the first provided model? I'm wondering whether the model no longer works out of the box, or whether there is something wrong with my setup.
I have an M3 Max MacBook Pro with 36 GB of RAM. I can open the generated model in the Xcode performance tab, which could be a sign that the conversion worked, but I'm not sure about that large an error/difference.
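For reference, here is a minimal sketch of how I could measure the discrepancy outside the exporter's validator (the Core ML input names input_ids and attention_mask are assumptions on my part based on the exporter defaults; the output name last_hidden_state comes from the validation log above):

import numpy as np
import torch
import coremltools as ct
from transformers import AutoModel, AutoTokenizer

# Reference PyTorch model and tokenizer (same checkpoint used for the export)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased").eval()

# Pad to the fixed sequence length the exported model expects (128 per the log)
inputs = tokenizer("Core ML export sanity check.", padding="max_length", max_length=128, return_tensors="np")

with torch.no_grad():
    ref = model(**{k: torch.tensor(v) for k, v in inputs.items()}).last_hidden_state.numpy()

# Exported Core ML model; the input feature names below are assumed, not confirmed
mlmodel = ct.models.MLModel("exported/distilbert-main.mlpackage")
pred = mlmodel.predict({
    "input_ids": inputs["input_ids"].astype(np.int32),
    "attention_mask": inputs["attention_mask"].astype(np.int32),
})

print("max abs diff:", np.abs(ref - pred["last_hidden_state"]).max())

If the difference really is in the ~0.01 range, I could also re-run the export with a looser tolerance (the traceback shows args.atol being passed to validate_model_outputs, so the CLI presumably exposes an --atol option), but I'd first like to know whether that magnitude of error is normal for this model.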