xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

[Feature request] Add support for external data file (.onnx_data) #105

Open felladrin opened 1 year ago

felladrin commented 1 year ago

Not really a bug with Transformers.js, but with the conversion script.

I got an error when trying to convert lmsys/fastchat-t5-3b-v1.0 with the text2text-generation-with-past task. Using the text2text-generation task works fine, though.

Am I missing something?

And is there a way to run the model without it being created with -with-past? Currently, when I run const pipe = await pipeline("text2text-generation", "lmsys/fastchat-t5-3b-v1.0"); it triggers:

Error: File not found. Could not locate "models/lmsys/fastchat-t5-3b-v1.0/seq2seq-lm-with-past/encoder_model.onnx".

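For completeness, this is roughly how I'm pointing the library at the local files (a simplified sketch; the env option names follow the current Transformers.js docs and may not match the exact version I'm running):

import { pipeline, env } from '@xenova/transformers';

// Sketch only: resolve models from the local "models/" folder produced by the
// conversion script, and don't fall back to fetching from the Hugging Face Hub.
env.localModelPath = 'models/';
env.allowRemoteModels = false;

const pipe = await pipeline('text2text-generation', 'lmsys/fastchat-t5-3b-v1.0');
const output = await pipe('Hello, how are you?');
console.log(output);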
Files inside models/lmsys/fastchat-t5-3b-v1.0/seq2seq-lm-with-past/ are the following:

added_tokens.json
config.json
generation_config.json
model.onnx
model.onnx_data
special_tokens_map.json
spiece.model
tokenizer.json
tokenizer_config.json

How to reproduce

Run:

python -m scripts.convert --model_id lmsys/fastchat-t5-3b-v1.0 --from_hub --quantize --task text2text-generation-with-past

Expect output like this:

Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
        - use_cache -> True
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `input_ids`.
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `decoder_input_ids`.
/opt/homebrew/lib/python3.10/site-packages/transformers/modeling_utils.py:828: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_mask.shape[1] < attention_mask.shape[1]:
/opt/homebrew/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py:507: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  elif past_key_value.shape[2] != key_value_states.shape[1]:
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
================ Diagnostic Run torch.onnx.export version 2.0.0 ================
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Saving external data to one file...
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `input_ids`.
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `decoder_input_ids`.
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/user/Repositories/transformers.js/scripts/convert.py", line 310, in <module>
    main()
  File "/Users/user/Repositories/transformers.js/scripts/convert.py", line 282, in main
    _, onnx_outputs = export_models(
  File "/opt/homebrew/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 609, in export_models
    export(
  File "/opt/homebrew/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 722, in export
    config.fix_dynamic_axes(output, device=device, input_shapes=input_shapes, dtype=dtype)
  File "/opt/homebrew/lib/python3.10/site-packages/optimum/exporters/onnx/base.py", line 285, in fix_dynamic_axes
    outputs = session.run(None, onnx_inputs)
  File "/opt/homebrew/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid Feed Input Name:past_key_values.9.encoder.value

Expected behavior

I was expecting it to work, the same way python -m scripts.convert --model_id lmsys/fastchat-t5-3b-v1.0 --from_hub --quantize --task text2text-generation did.

Environment

xenova commented 1 year ago

I think this is due to a recent update of optimum, which will be fixed in the next release. One problem I am aware of, however, is that converting large models (e.g., 3B-parameter models) produces an external data file, which is currently not supported by Transformers.js.

Are you able to export with optimum directly (which is what our conversion script uses behind the scenes)? Your command should look something like:

optimum-cli export onnx -m lmsys/fastchat-t5-3b-v1.0 output

felladrin commented 1 year ago

Ah, nice! Using that command generated some more files, but it exited with the following exception:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/lib/python3.10/site-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/opt/homebrew/lib/python3.10/site-packages/optimum/commands/export/onnx.py", line 219, in run
    main_export(
  File "/opt/homebrew/lib/python3.10/site-packages/optimum/exporters/onnx/__main__.py", line 366, in main_export
    raise Exception(
Exception: An error occured during validation, but the model was saved nonetheless at output. Detailed error: [ONNXRuntimeError] : 1 : FAIL : Load model from output/decoder_model_merged.onnx failed:/Users/runner/work/1/s/onnxruntime/core/graph/model.cc:146 onnxruntime::Model::Model(onnx::ModelProto &&, const onnxruntime::PathString &, const onnxruntime::IOnnxRuntimeOpSchemaRegistryList *, const logging::Logger &, const onnxruntime::ModelOptions &) Unsupported model IR version: 9, max supported IR version: 8

List of files generated (including the encoder_model.onnx, which was not being generated by the conversion script):

added_tokens.json
config.json
decoder_model.onnx
decoder_model.onnx_data
decoder_model_merged.onnx
decoder_model_merged.onnx_data
decoder_with_past_model.onnx
decoder_with_past_model.onnx_data
encoder_model.onnx
encoder_model.onnx_data
generation_config.json
special_tokens_map.json
spiece.model
tokenizer.json
tokenizer_config.json

xenova commented 1 year ago

Thanks 👍 The first error message you received looks like a bug with optimum. If you'd like, you can raise an issue on their repo.

For now, even if you did get the conversion working, the model is just slightly too large to run with the current version of Transformers.js, which doesn't support the external data format (.onnx_data). I will hopefully get around to adding support for it in the coming week or so, but I'm prioritizing other things first.

felladrin commented 1 year ago

Thanks for your work on this lib!

The pipeline didn't work for lmsys/fastchat-t5-3b-v1.0, even though all the files were there. When running Transformers.js in Node, it exits with the following message:

Error: no available backend found. ERR: 
Error: Failed to load model with error: /Users/runner/work/1/s/onnxruntime/core/graph/model.cc:146 onnxruntime::Model::Model(onnx::ModelProto &&, const onnxruntime::PathString &, const onnxruntime::IOnnxRuntimeOpSchemaRegistryList *, const logging::Logger &, const onnxruntime::ModelOptions &) Unsupported model IR version: 9, max supported IR version: 8

But I'm happy it's close to working! The other T5 model (LaMini-Flan-T5-783M) is working great, and it's already pretty good for its purpose.
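For anyone interested, this is roughly how I'm using that smaller model (a sketch; I'm assuming the Xenova/LaMini-Flan-T5-783M conversion on the Hub and typical generation options):

import { pipeline } from '@xenova/transformers';

// Sketch: this smaller model's weights fit inside a single .onnx file, so no
// external .onnx_data sidecar is involved.
const pipe = await pipeline('text2text-generation', 'Xenova/LaMini-Flan-T5-783M');
const out = await pipe('Write a short greeting.', { max_new_tokens: 64 });
console.log(out);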

xenova commented 1 year ago

Right, it's failing because the .onnx file does not contain all the model parameters. Those parameters are stored in .onnx_data.

This is due to a limitation with protobuf, which has a 2GB limit. See here for more information.

This also means optimum's validation code doesn't load the external data format (which also appears to be a bug).

However, I do intend to add support for the external data format :) Just got a lot on my plate right now haha.

xenova commented 1 year ago

Updated title to be a feature request for the external data file format (which is used for models larger than 2GB).

NawarA commented 7 months ago

@xenova Does transformers.js support models larger than 2 GB yet?

nemphys commented 5 months ago

Any news on this one? I am trying to load bge-m3 (full, not quantized, dtype='fp32') using the v3 branch (onnxruntime 1.17.x) in Node.js, and I am getting the following error:

Error: Deserialize tensor embeddings.word_embeddings.weight failed.GetFileLength for ./model.onnx_data failed:Invalid fd was supplied: -1

(which is weird, since the model.onnx_data file is there).
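For reference, this is roughly how I'm loading it (a minimal sketch; the Xenova/bge-m3 model id and the pooling option are assumptions on my part, while the dtype option is the one from the v3 branch):

import { pipeline } from '@xenova/transformers';

// Sketch: load bge-m3 as a feature-extraction pipeline with full-precision
// (non-quantized) weights, which is what needs the external model.onnx_data file.
const extractor = await pipeline('feature-extraction', 'Xenova/bge-m3', { dtype: 'fp32' });
const embeddings = await extractor('Hello world', { pooling: 'cls', normalize: true });
console.log(embeddings.dims);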

JohnReginaldShutler commented 4 months ago

Pinging for any updates on this too! I am trying to load Xenova/TinyLlama-1.1B-Chat-v1.0 using the v3 branch (onnxruntime 1.17.x). There is a decoder_model_merged.onnx_data file (I tried renaming it to model.onnx_data, to no avail), but I still get this:

Error: ERROR_CODE: 1, ERROR_MESSAGE: Deserialize tensor onnx::MatMul_7210 failed.Failed to load external data file "./model.onnx_data", error: Module.MountedFiles is not available.
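For context, my understanding is that onnxruntime 1.17 exposes an externalData session option, so presumably something along these lines is what Transformers.js would need to wire up internally (a hypothetical sketch based on my reading of the onnxruntime-web docs, not how the library currently works):

import { readFile } from 'node:fs/promises';
import * as ort from 'onnxruntime-web';

// Hypothetical sketch: pass the external weights to the session explicitly, so the
// WASM backend doesn't try to read "./model.onnx_data" from its virtual filesystem
// (which is what produces the "Module.MountedFiles is not available" error).
const model = await readFile('model.onnx');
const weights = await readFile('model.onnx_data');
const session = await ort.InferenceSession.create(model, {
  externalData: [{ path: 'model.onnx_data', data: weights }],
});
console.log(session.inputNames);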