kimborgen / falcon-llm

Apache License 2.0

parallel attention/mlp speedup #5

Open kimborgen opened 1 year ago

kimborgen commented 1 year ago

The model can compute the attention and the MLP in parallel. They mention that they have a custom training pipeline, so do we actually see this speedup with the HF framework? PyTorch does not overlap these computations automatically.
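
For reference, here is a minimal sketch of what the parallel formulation looks like (layer sizes and module names are illustrative, not Falcon's exact configuration):

```python
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    """Transformer block with attention and MLP applied in parallel
    (GPT-J / Falcon style): out = x + attn(ln(x)) + mlp(ln(x)).
    """
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln = nn.LayerNorm(d_model)  # single shared LayerNorm feeding both branches
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln(x)
        # The two branches below are mathematically independent, but eager
        # PyTorch still dispatches their kernels one after the other.
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        mlp_out = self.mlp(h)
        return x + attn_out + mlp_out
```

Even though the branches are independent, the speedup in custom pipelines presumably comes from fusing the attention and MLP input projections into a single matmul over the shared LayerNorm output, or from overlapping the kernels explicitly (e.g. with CUDA streams), neither of which happens in a plain eager-mode HF forward pass.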