Closed xenova closed 5 months ago
Hi, I'm also interested in converting the Musicgen model to ONNX format so that I can try deploying it on a device. May I know whether this is supported in Optimum now?
It would be great if this feature were implemented. By the way, how can I get transformers.js?
Hi @xenova,
May I know whether there is a plan or schedule for Optimum to support converting it to an ONNX model?
any update?
Hi @kanger45 @MaiZhiHao @zeke-john https://github.com/huggingface/optimum/pull/1779 is merged, which exports Musicgen in several parts to generate audio samples conditioned on a text prompt (Reference: https://huggingface.co/docs/transformers/model_doc/musicgen#text-conditional-generation). This uses the decoder KV cache. The following subcomponents are exported:
text_encoder.onnx: corresponds to the text encoder part in https://github.com/huggingface/transformers/blob/v4.39.1/src/transformers/models/musicgen/modeling_musicgen.py#L1457
encodec_decode.onnx: corresponds to the Encodec audio encoder part in https://github.com/huggingface/transformers/blob/v4.39.1/src/transformers/models/musicgen/modeling_musicgen.py#L2472-L2480
decoder_model.onnx: the Musicgen decoder, without past key values input, and computing cross attention. Not required at inference (use decoder_model_merged.onnx instead).
decoder_with_past_model.onnx: the Musicgen decoder, with past_key_values input (KV cache filled), not computing cross attention. Not required at inference (use decoder_model_merged.onnx instead).
decoder_model_merged.onnx: the two previous models fused into one, to avoid duplicating weights. A boolean input use_cache_branch selects the branch to use. In the first forward pass, where the KV cache is empty, dummy past key values inputs need to be passed; they are ignored with use_cache_branch=False.
build_delay_pattern_mask.onnx: a model taking input_ids, pad_token_id, and max_length as inputs and building a delayed pattern mask for the input_ids. Implements https://github.com/huggingface/transformers/blob/v4.39.3/src/transformers/models/musicgen/modeling_musicgen.py#L1054

This is usable e.g. in transformers.js; there is no runtime implementation in Optimum for now.
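To make the use_cache_branch behaviour of decoder_model_merged.onnx more concrete, here is a minimal sketch of how the decoder inputs could be assembled for the first pass and for later passes. The helper names (dummy_past_key_values, build_decoder_feed), the past key/value input names, and the dummy sequence length of 1 are assumptions for illustration only; the real input names and shapes should be read from the exported model.

```python
import numpy as np


def dummy_past_key_values(past_input_names, batch_size, num_heads, head_dim, seq_len=1):
    """Build zero-filled placeholders for every past key/value input.

    Hypothetical helper: the names in `past_input_names` and the dummy
    sequence length are assumptions, not values confirmed by the export.
    """
    shape = (batch_size, num_heads, seq_len, head_dim)
    return {name: np.zeros(shape, dtype=np.float32) for name in past_input_names}


def build_decoder_feed(step_inputs, past_key_values, first_pass):
    """Assemble the input dict for one forward pass of the merged decoder.

    On the first pass the KV cache is empty, so use_cache_branch is False and
    the dummy past key values are ignored. On later passes it is True and the
    cache returned by the previous pass is consumed.
    """
    feed = dict(step_inputs)  # copy so the caller's dict is not modified
    feed.update(past_key_values)
    feed["use_cache_branch"] = np.array([not first_pass], dtype=bool)
    return feed


# Usage with assumed (hypothetical) input names for a single decoder layer.
names = ["past_key_values.0.decoder.key", "past_key_values.0.decoder.value"]
past = dummy_past_key_values(names, batch_size=1, num_heads=16, head_dim=64)
first_feed = build_decoder_feed(
    {"input_ids": np.zeros((1, 1), dtype=np.int64)}, past, first_pass=True
)
```

With ONNX Runtime, a dict like first_feed would then be passed to InferenceSession.run, and the key/value outputs of each pass would replace the dummy entries for the next one.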
@fxmarty Would this work for models fine-tuned from Musicgen? I used this repo to fine-tune the medium model, and the output is a .pt model.
@zeke-john yes, it should work as long as the checkpoint (& model repo) follows Transformers style (e.g. https://huggingface.co/facebook/musicgen-small/tree/main). .bin & .safetensors are supported, not sure about .pt
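As a rough way to check this locally before attempting an export, the sketch below looks for a config.json plus at least one .bin or .safetensors weight file in a checkpoint folder. The function name looks_like_transformers_checkpoint is hypothetical and the check is only a heuristic; it does not validate the contents of the files.

```python
from pathlib import Path

# Weight formats mentioned above as supported; .pt is deliberately excluded.
SUPPORTED_WEIGHT_SUFFIXES = {".bin", ".safetensors"}


def looks_like_transformers_checkpoint(checkpoint_dir):
    """Heuristic: config.json plus at least one .bin or .safetensors file."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return False
    has_config = (root / "config.json").is_file()
    has_weights = any(
        p.is_file() and p.suffix in SUPPORTED_WEIGHT_SUFFIXES for p in root.iterdir()
    )
    return has_config and has_weights
```

A folder containing only a .pt file would return False here, which matches the uncertainty about .pt support.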
Are there any supported ways to fine-tune Musicgen besides the way I did it, so that it stays a Transformers model? Or can a .pt model be converted into the Transformers model format?
@zeke-john You should try to use https://github.com/huggingface/transformers/blob/main/src/transformers/models/musicgen/convert_musicgen_transformers.py which should allow you to do the conversion (audiocraft format to transformers format).
@fxmarty After exporting these ONNX models, how can we run them locally?
Feature request
Musicgen was recently added to 🤗 Transformers (model doc) and it would be great to be able to export those models to ONNX with Optimum.
Motivation
This will allow me to support music generation models in Transformers.js
Your contribution
I will integrate into transformers.js once available in optimum.