huggingface / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

The scripts/convert.py script fails for a few reasons #850

Open · lancejpollard opened 3 months ago

lancejpollard commented 3 months ago

System Info

Environment/Platform

Description

I am trying to run the model locally, since running a remote model in Node.js doesn't appear to work.

First, I followed https://github.com/xenova/transformers.js/blob/main/scripts/convert.py (which is linked from the README):

$ python3 -m pip install -r requirements.txt
Collecting transformers==4.33.2 (from transformers[torch]==4.33.2->-r requirements.txt (line 1))
  Using cached transformers-4.33.2-py3-none-any.whl.metadata (119 kB)
ERROR: Could not find a version that satisfies the requirement onnxruntime<1.16.0 (from versions: 1.17.0, 1.17.1, 1.17.3, 1.18.0, 1.18.1)
ERROR: No matching distribution found for onnxruntime<1.16.0

So the pinned onnxruntime<1.16.0 does not seem to be installable from pip; judging by the versions it does offer (1.17.0 and later), only newer releases have wheels for my Python (3.12, per the venv paths below).

Can you update that script?
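
A possible workaround rather than a fix (untested, and it assumes a python3.11 interpreter is installed): onnxruntime 1.15.x only ships wheels up to Python 3.11, so the pinned requirements should still install under an older interpreter:

$ python3.11 -m venv venv    # onnxruntime<1.16.0 has no Python 3.12 wheels
$ . venv/bin/activate
$ python3 -m pip install -r requirements.txt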

Second, I tried installing the latest versions of everything instead, by replacing requirements.txt with:

transformers
onnxruntime
optimum
tqdm
onnx

But after I ran this:

$ python3 -m pip install -r requirements.txt
... successful installation stuff...
$ python3 -m convert --quantize --task summarization --model_id bart-large-cnn

I got an error:

TypeError: quantize_dynamic() got an unexpected keyword argument 'optimize_model'

Full stack trace:

Framework not specified. Using pt to export the model.
The task `text2text-generation` was manually specified, and past key values will not be reused in the decoding. if needed, please pass `--task text2text-generation-with-past` to export using the past key values.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.
Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
Non-default generation parameters: {'max_length': 142, 'min_length': 56, 'early_stopping': True, 'num_beams': 4, 'length_penalty': 2.0, 'no_repeat_ngram_size': 3, 'forced_bos_token_id': 0, 'forced_eos_token_id': 2}

***** Exporting submodel 1/2: BartEncoder *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
    - use_cache -> False
./venv/lib/python3.12/site-packages/transformers/models/bart/modeling_bart.py:247: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
./venv/lib/python3.12/site-packages/transformers/models/bart/modeling_bart.py:254: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, tgt_len, src_len):
./venv/lib/python3.12/site-packages/transformers/models/bart/modeling_bart.py:286: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):

***** Exporting submodel 2/2: BartForConditionalGeneration *****
Using framework PyTorch: 2.3.1
Overriding 1 configuration item(s)
    - use_cache -> False
./venv/lib/python3.12/site-packages/transformers/modeling_attn_mask_utils.py:86: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1 or self.sliding_window is not None:
./venv/lib/python3.12/site-packages/transformers/modeling_attn_mask_utils.py:162: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if past_key_values_length > 0:
Post-processing the exported models...
Weight deduplication check in the ONNX export requires accelerate. Please install accelerate to run it.

Validating ONNX model models/bart-large-cnn/encoder_model.onnx...
    -[✓] ONNX model output names match reference model (last_hidden_state)
    - Validating ONNX Model output "last_hidden_state":
        -[✓] (2, 16, 1024) matches (2, 16, 1024)
        -[✓] all values close (atol: 1e-05)

Validating ONNX model models/bart-large-cnn/decoder_model.onnx...
    -[✓] ONNX model output names match reference model (logits)
    - Validating ONNX Model output "logits":
        -[✓] (2, 16, 50264) matches (2, 16, 50264)
        -[x] values not close enough, max diff: 7.82012939453125e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
- logits: max diff = 7.82012939453125e-05.
 The exported model was saved at: models/bart-large-cnn
Quantizing:   0%|                                                                        | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "./convert.py", line 545, in <module>
    main()
  File "./convert.py", line 521, in main
    quantize([
  File "./convert.py", line 294, in quantize
    quantize_dynamic(
TypeError: quantize_dynamic() got an unexpected keyword argument 'optimize_model'
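
One stopgap, sketched on the assumption that nothing downstream relies on the removed behavior: delete the optimize_model keyword argument from the quantize_dynamic(...) call at the line the traceback points to (scripts/convert.py line 294 in this copy); newer onnxruntime releases simply no longer accept it.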

It seems to have produced only these files:

Screenshot 2024-07-16 at 9 53 12 PM

So when I then run my Node.js script (full script code is at the bottom of the question in the SO link above), I get:

Error: local_files_only=true or env.allowRemoteModels=false and file was not found locally at "./import/language/tibetan/models/bart-large-cnn/onnx/encoder_model_quantized.onnx".

How do I get this working?
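
To connect the two errors: the quantization step aborted before writing any *_quantized.onnx files, and the Node.js side tries to load the quantized variant by default, so it looks for a file that was never produced. Judging from the path in the error message, the loader expects a layout like:

models/bart-large-cnn/
  config.json
  ...
  onnx/
    encoder_model_quantized.onnx
    decoder_model_quantized.onnx

So the options are to make the quantization step succeed, or to load the unquantized *.onnx files instead (in transformers.js v2, that should be the pipeline's quantized: false option, if I recall the API correctly).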

Reproduction

As described above.

mayank1khurana commented 2 months ago

I actually resolved the issue by updating Optimum to the latest version and keeping all other packages in requirements.txt the same.

  1. pip install -r requirements.txt
  2. pip install --upgrade optimum

kaczmarj commented 2 months ago

I also get the error

TypeError: quantize_dynamic() got an unexpected keyword argument 'optimize_model'

The optimize_model argument was removed in https://github.com/microsoft/onnxruntime/pull/16422 (merged June 21, 2023).

(I am using onnxruntime version 1.18.1, the latest version at the time of writing.)
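
If pinning onnxruntime back is not an option, a small compatibility shim can paper over the removal. This is a sketch, not the repository's actual fix; it assumes the quantize() function in convert.py can route its call through a wrapper:

import inspect

from onnxruntime.quantization import quantize_dynamic

def quantize_dynamic_compat(model_input, model_output, **kwargs):
    # optimize_model was removed from quantize_dynamic in
    # microsoft/onnxruntime#16422; drop any keyword argument the
    # installed onnxruntime version no longer accepts.
    accepted = inspect.signature(quantize_dynamic).parameters
    kwargs = {k: v for k, v in kwargs.items() if k in accepted}
    return quantize_dynamic(model_input, model_output, **kwargs)

This keeps a single requirements set working with onnxruntime releases on both sides of the removal.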

gyagp commented 2 months ago

I just tried the v3 branch and upgraded onnxruntime to 1.18.1. I seem to have no problems with `python -m scripts.convert --quantize --model_id bert-base-uncased` on Windows.