xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

Segmentation fault while converting Bert-base-uncased with README command #856

Open MaxFrax opened 1 month ago

MaxFrax commented 1 month ago

Description

When I run the conversion script with the command python -m scripts.convert --quantize --model_id bert-base-uncased, I get a segmentation fault.

(default) ➜  transformers.js-main python -m scripts.convert --quantize --model_id bert-base-uncased
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
/Users/maxfrax/opt/anaconda3/envs/default/lib/python3.11/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
Framework not specified. Using pt to export to ONNX.
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Automatic task detection to fill-mask (possible synonyms are: masked-lm).
[1]    6178 segmentation fault  python -m scripts.convert --quantize --model_id bert-base-uncased

Do any of you have any pointers that could help pinpoint the root cause?
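One way to narrow it down (a sketch using Python's standard-library faulthandler module, which prints the Python-level traceback of the thread that crashes) is to enable it before the heavy native libraries are imported:

```python
# Sketch: dump the Python-level traceback when a native crash (SIGSEGV) occurs.
# faulthandler is in the standard library (Python 3.3+), so nothing to install.
import faulthandler

faulthandler.enable()  # installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL
assert faulthandler.is_enabled()

# Equivalent without editing any code: run the script with the -X flag, e.g.
#   python -X faulthandler -m scripts.convert --quantize --model_id bert-base-uncased
# On a segfault, the dumped traceback shows which call (e.g. inside torch or
# onnxruntime) the process was in when it crashed.
```

The traceback won't show the native stack frames, but it usually identifies which library call triggered the crash, which makes it much easier to search for or report upstream.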

Reproduction

  1. Download the repository as a zip
  2. Extract the folder and move into it from the terminal
  3. Run the README command python -m scripts.convert --quantize --model_id bert-base-uncased
xenova commented 1 month ago

Hi there 👋 Can you try in an environment like Google Colab? This might be an issue with your system/environment. Also, if you only want to use that model, you can use my pre-converted one here: https://huggingface.co/Xenova/bert-base-uncased

MaxFrax commented 1 month ago

@xenova Thank you for sharing your pre-converted model. I was attempting the conversion myself to verify my environment.

I tested the conversion script on Azure using a Standard_NC4as_T4_v3 instance (4 cores, 28 GB RAM, 176 GB disk, T4 GPU), and it worked perfectly.

I should probably set up a fresh environment on my MacBook M1 to provide easy-to-reproduce steps, but given the machine's age, it might not be worth the effort.
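For what it's worth, the Intel MKL warnings in the log above hint that the Anaconda environment may be an x86_64 build running under Rosetta 2 on the M1, which is a plausible source of native crashes (an assumption, not confirmed by the thread). A quick sketch to check which architecture the interpreter actually runs as:

```python
# Sketch: report the architecture the current Python interpreter runs as.
# On Apple silicon, 'arm64' means a native build; 'x86_64' means the
# interpreter is being translated by Rosetta 2 (common with older
# Anaconda installers, and consistent with MKL/SSE warnings appearing at all).
import platform

arch = platform.machine()
print(f"Interpreter architecture: {arch}")
```

If this prints x86_64 on an M1, recreating the environment from a native arm64 Miniconda/Miniforge installer would be the first thing to try.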