Closed (ramzeez88 closed this 4 months ago)
Have the Granite models been trained with grouped-query attention (GQA)?
Hi, the 8B model uses GQA. More info here:
Closing this.
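For anyone who wants to check this themselves: a minimal sketch, assuming the model's configuration exposes `num_attention_heads` and `num_key_value_heads` (the field names used by many Llama-style Hugging Face configs). The `attention_kind` helper and the head counts below are hypothetical, for illustration only, and are not read from any Granite checkpoint. The idea is that a model uses GQA when it has fewer key/value heads than query heads, but more than one.

```python
def attention_kind(num_attention_heads: int, num_key_value_heads: int) -> str:
    """Classify the attention layout from the two head counts."""
    if num_attention_heads <= 0 or num_key_value_heads <= 0:
        raise ValueError("head counts must be positive")
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError("query heads must be divisible by key/value heads")
    if num_key_value_heads == num_attention_heads:
        return "MHA"  # every query head has its own key/value head
    if num_key_value_heads == 1:
        return "MQA"  # all query heads share a single key/value head
    return "GQA"      # query heads share key/value heads in groups


# Hypothetical head counts for illustration, not taken from a real config.
example = {"num_attention_heads": 32, "num_key_value_heads": 8}
print(attention_kind(**example))  # prints "GQA"
```

With a real model, the two numbers would come from its published config file rather than from a hard-coded dictionary.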