As revealed in #9, the two Falcon models use different attention layers: 40b has no `multi_query` config option and appears to be multi-query by default. Investigate this further to see whether the two models can share a single attention layer.
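One way the two variants could share a single attention layer is to parameterize the number of key/value heads: one shared KV head gives multi-query attention, while matching the query-head count gives classic multi-head attention. The sketch below is a minimal NumPy illustration of that idea; the function name, shapes, and parameters are hypothetical and not the repo's actual code.

```python
import numpy as np

def gqa_attention(q, k, v, n_heads, n_kv_heads):
    """Grouped-query attention sketch (hypothetical, illustrative shapes):
    n_kv_heads == 1       -> multi-query attention
    n_kv_heads == n_heads -> standard multi-head attention
    q: (seq, n_heads, head_dim); k, v: (seq, n_kv_heads, head_dim)."""
    seq, _, head_dim = q.shape
    group = n_heads // n_kv_heads          # query heads sharing one KV head
    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=1)        # (seq, n_heads, head_dim)
    v = np.repeat(v, group, axis=1)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("hqk,khd->qhd", weights, v)
```

With this shape, both models would flow through the same code path, differing only in the `n_kv_heads` value read from their configs.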