Open · cillinzhang opened this issue 2 months ago
I would like to work on this issue.
There are two key points to implement while decoding:
Okay, @cillinzhang, understood.
Okay, I will keep this in mind.
The model to consider.
https://huggingface.co/espnet/fastspeech2_conformer
A complete model is not required; only a new attention mechanism, FastSpeech2ConformerAttention, is needed (code here): https://github.com/huggingface/transformers/blob/47735f5f0f2752500d115d2f6bd57816032599b6/src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py#L463
This attention mechanism is also known as RelPositionMultiHeadedAttention (code here): https://github.com/wenet-e2e/wenet/blob/f2372ae6d97f926688fee821e609e42aaf41571d/wenet/transformer/attention.py#L294
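For context, the core idea of this attention variant (Transformer-XL style relative-position attention, as used in Conformer) can be sketched as follows. This is a minimal single-head numpy illustration of the scoring scheme, not the actual FastSpeech2ConformerAttention implementation: it omits multi-head reshaping, masking, and the `rel_shift` trick that aligns the position-score matrix, and the function and argument names here are mine, not from either linked codebase.

```python
import numpy as np

def rel_position_attention(q, k, v, pos_emb, pos_bias_u, pos_bias_v):
    """Simplified Transformer-XL style relative-position attention (one head).

    q, k, v:          (T, d) query / key / value matrices
    pos_emb:          (T, d) relative positional embeddings (simplified:
                      one embedding per key position, no rel_shift)
    pos_bias_u/v:     (d,) learned biases added to the query
                      (u for the content term, v for the position term)
    """
    d = q.shape[-1]
    # Content-based scores: (q + u) @ k^T
    matrix_ac = (q + pos_bias_u) @ k.T
    # Position-based scores: (q + v) @ pos_emb^T
    matrix_bd = (q + pos_bias_v) @ pos_emb.T
    # Combined, scaled attention logits
    scores = (matrix_ac + matrix_bd) / np.sqrt(d)
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The key difference from vanilla scaled dot-product attention is the extra position term `matrix_bd` and the two learned biases, which is why the model cannot simply reuse an existing attention backend and needs dedicated support.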
The closest model vllm already supports.
llama
What's your difficulty of supporting the model you want?
a new attention mechanism