Open kimborgen opened 1 year ago
The model can compute the attention and MLP in parallel. The authors mention that they have a custom training pipeline, so do we actually see this speedup when running the model through the HF framework? PyTorch does not do this automatically.
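To make the question concrete, here is a minimal sketch of what "in parallel" usually means at the block level, assuming the common parallel formulation `y = x + Attn(LN(x)) + MLP(LN(x))`. The class name `ParallelBlock` and all sizes are hypothetical and are not taken from the model's actual code. The point it illustrates: the two branches read the same input and have no data dependency on each other, but in eager-mode PyTorch they are still issued one after the other in program order.

```python
import torch
import torch.nn as nn


class ParallelBlock(nn.Module):
    """Hypothetical parallel block: y = x + Attn(LN(x)) + MLP(LN(x))."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.ln = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # A single shared LayerNorm feeds both branches (assumption of this sketch).
        h = self.ln(x)
        # Neither branch consumes the other's output, so they are independent.
        # Eager PyTorch still runs these two lines sequentially, in program order.
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        mlp_out = self.mlp(h)
        return x + attn_out + mlp_out


torch.manual_seed(0)
block = ParallelBlock().eval()
x = torch.randn(2, 8, 64)  # (batch, sequence, d_model)
with torch.no_grad():
    y = block(x)
print(y.shape)
```

Writing the block this way only makes overlap possible. Whether any wall-clock speedup shows up depends on how the two branches are scheduled (for example fused kernels or separate CUDA streams), which is exactly what the question above is asking about the HF implementation.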