Open Ryanuppp opened 1 month ago
Hi @Ryanuppp,
We are exploring follow-up work to Quest, in which we plan to add these features.
@happierpig Is GQA currently supported? Mainstream architectures, such as Llama 2/3 70B, mainly use GQA, so supporting this feature is very important.
Great job! We found that Quest is implemented on an earlier version of flashinfer, and some common features are not currently supported.