vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[New Model]: fastspeech2_conformer (just need a new attention mechanism: RelPositionMultiHeadedAttention) #4736

Open cillinzhang opened 2 months ago

cillinzhang commented 2 months ago

The model to consider.

https://huggingface.co/espnet/fastspeech2_conformer

A complete model is not required; only a new attention mechanism, FastSpeech2ConformerAttention, is needed (code here): https://github.com/huggingface/transformers/blob/47735f5f0f2752500d115d2f6bd57816032599b6/src/transformers/models/fastspeech2_conformer/modeling_fastspeech2_conformer.py#L463

This attention mechanism is also known as RelPositionMultiHeadedAttention (code here): https://github.com/wenet-e2e/wenet/blob/f2372ae6d97f926688fee821e609e42aaf41571d/wenet/transformer/attention.py#L294
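
For context, a minimal sketch of what this attention computes may help: it is standard multi-head attention plus a second score term built from relative-position embeddings and two learned bias vectors (pos_bias_u and pos_bias_v, the u and v of Transformer-XL). This is a simplification of the linked wenet/ESPnet code, not a drop-in copy: dropout, masking, and the rel_shift realignment are omitted, and pos_emb is assumed to already hold one embedding per key position.

```python
import math

import torch
import torch.nn as nn


class RelPositionMultiHeadedAttention(nn.Module):
    """Multi-head attention with Transformer-XL-style relative positions.

    Simplified sketch: dropout, masking, and rel_shift are omitted, and
    pos_emb is assumed to be pre-aligned (one embedding per key position).
    """

    def __init__(self, n_head: int, n_feat: int):
        super().__init__()
        assert n_feat % n_head == 0
        self.d_k = n_feat // n_head
        self.h = n_head
        self.linear_q = nn.Linear(n_feat, n_feat)
        self.linear_k = nn.Linear(n_feat, n_feat)
        self.linear_v = nn.Linear(n_feat, n_feat)
        self.linear_out = nn.Linear(n_feat, n_feat)
        # Projects relative-position embeddings into per-head space.
        self.linear_pos = nn.Linear(n_feat, n_feat, bias=False)
        # Learned biases for the content and position score terms
        # (u and v in the Transformer-XL paper).
        self.pos_bias_u = nn.Parameter(torch.zeros(n_head, self.d_k))
        self.pos_bias_v = nn.Parameter(torch.zeros(n_head, self.d_k))

    def forward(self, query, key, value, pos_emb):
        b = query.size(0)
        q = self.linear_q(query).view(b, -1, self.h, self.d_k)
        k = self.linear_k(key).view(b, -1, self.h, self.d_k).transpose(1, 2)
        v = self.linear_v(value).view(b, -1, self.h, self.d_k).transpose(1, 2)
        p = self.linear_pos(pos_emb).view(b, -1, self.h, self.d_k).transpose(1, 2)

        # Content term (query vs. key) and position term (query vs. position).
        q_u = (q + self.pos_bias_u).transpose(1, 2)  # (b, h, t, d_k)
        q_v = (q + self.pos_bias_v).transpose(1, 2)
        matrix_ac = torch.matmul(q_u, k.transpose(-2, -1))
        matrix_bd = torch.matmul(q_v, p.transpose(-2, -1))

        scores = (matrix_ac + matrix_bd) / math.sqrt(self.d_k)
        attn = torch.softmax(scores, dim=-1)
        out = torch.matmul(attn, v).transpose(1, 2).reshape(b, -1, self.h * self.d_k)
        return self.linear_out(out)
```

The matrix_bd term is what vLLM's existing llama-style attention lacks: scores depend on relative offsets through linear_pos and the two biases, so a kernel would need the position embeddings alongside the cached keys.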

The closest model vllm already supports.

llama

What's your difficulty of supporting the model you want?

a new attention mechanism

rajveer43 commented 2 months ago

I would like to work on this issue

cillinzhang commented 2 months ago

> I would like to work on this issue

There are two key points to implement while decoding:
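
One likely candidate is the rel_shift realignment from the linked implementations: the position term matrix_bd is first computed against all 2*t - 1 possible relative offsets, then shifted so that entry (i, j) lines up with offset i - j. A minimal sketch following the ESPnet/wenet shape conventions (simplified; the real code also handles masking and streaming caches):

```python
import torch


def rel_shift(x: torch.Tensor) -> torch.Tensor:
    """Realign relative-position scores (the Transformer-XL shift trick).

    x: (batch, head, t, 2*t - 1), last axis indexing relative offsets
    t-1, ..., 0, ..., -(t-1). Returns (batch, head, t, t), where entry
    (i, j) holds the score for offset i - j.
    """
    b, h, t, _ = x.size()
    # Prepend a zero column, then fold so each row shifts one step.
    zero_pad = torch.zeros((b, h, t, 1), dtype=x.dtype, device=x.device)
    x_padded = torch.cat([zero_pad, x], dim=-1)       # (b, h, t, 2t)
    x_padded = x_padded.view(b, h, 2 * t, t)
    x = x_padded[:, :, 1:].reshape(b, h, t, 2 * t - 1)
    return x[:, :, :, :t]                             # keep valid offsets
```

At decode time t grows with the KV cache, so the 2*t - 1 window of position embeddings has to grow with it; that interaction with vLLM's paged KV cache is presumably where most of the work is.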

rajveer43 commented 2 months ago

Okay, @cillinzhang, understood.

rajveer43 commented 2 months ago

Okay, I will keep this in mind.