kimborgen / falcon-llm

Apache License 2.0

Missing final MLP #6

Open kimborgen opened 1 year ago

kimborgen commented 1 year ago

Because of the parallel attention/MLP layout, the final block's attention output reaches the model output "unprocessed": it never passes through an MLP. Can we improve performance by adding a final MLP layer?
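To make the question concrete, here is a rough NumPy sketch of what I mean. It is not the repo's code: the width `D`, the single-head attention, the ReLU stand-in for GELU, the parameter-free layer norm, and the helper names (`make_attention`, `make_mlp`, `parallel_block`, `forward`) are all assumptions made up for illustration. It only shows where an extra MLP would sit after the last parallel block; whether it helps performance would have to be tested.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hypothetical model width, chosen only for this sketch


def layer_norm(x, eps=1e-5):
    # Layer norm without learned scale/shift, to keep the sketch short.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def make_mlp(d):
    w1 = rng.normal(0, 0.02, (d, 4 * d))
    w2 = rng.normal(0, 0.02, (4 * d, d))

    def mlp(h):
        return np.maximum(h @ w1, 0.0) @ w2  # ReLU used as a stand-in for GELU

    return mlp


def make_attention(d):
    wq, wk, wv, wo = (rng.normal(0, 0.02, (d, d)) for _ in range(4))

    def attn(h):
        q, k, v = h @ wq, h @ wk, h @ wv
        scores = q @ k.T / np.sqrt(d)
        # Causal mask: position i may only attend to positions <= i.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return (weights @ v) @ wo

    return attn


def parallel_block(x, attn, mlp):
    # Attention and MLP both read the same normalized input; their outputs
    # are summed into the residual stream, so the MLP in this block never
    # sees this block's attention output.
    h = layer_norm(x)
    return x + attn(h) + mlp(h)


def forward(x, blocks, final_mlp=None):
    for attn, mlp in blocks:
        x = parallel_block(x, attn, mlp)
    if final_mlp is not None:
        # Proposed extra step: one more MLP that does see the last
        # block's attention output.
        x = x + final_mlp(layer_norm(x))
    return x


blocks = [(make_attention(D), make_mlp(D)) for _ in range(2)]
final_mlp = make_mlp(D)
x = rng.normal(size=(5, D))  # 5 tokens, D features each

baseline = forward(x, blocks)
with_final = forward(x, blocks, final_mlp)
print(baseline.shape, with_final.shape)
```

In every block except the last, the next block's MLP still gets to process the previous attention output through the residual stream. Only the last block's attention output has no MLP after it, which is the gap the `final_mlp` argument is meant to fill.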