Open xrsrke opened 9 months ago
Implement distributed attention following the approach in LightSeq, Colossal-AI, or DeepSpeed's sequence parallelism (SP); we have not decided which one yet. The target API:
```python
import torch
from pipegoose.nn.sequence_parallel.attention import DistributedAttention

# embed_dim, num_heads, q, k, v, and parallel_context are defined elsewhere
local_attention = torch.nn.MultiheadAttention(embed_dim, num_heads)
attention = DistributedAttention(local_attention, parallel_context)

outputs = attention(q, k, v)
# nn.MultiheadAttention returns (attn_output, attn_weights); compare the outputs
assert torch.allclose(outputs, local_attention(q, k, v)[0])
```
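For reference, here is a minimal sketch of what a DeepSpeed-Ulysses-style wrapper could look like under the hood. The class name, the `(seq, batch, heads, head_dim)` layout, the `process_group` argument, and the `_all_to_all` helper are assumptions for illustration, not pipegoose's or DeepSpeed's actual API:

```python
import torch
import torch.distributed as dist
from torch import nn

class UlyssesStyleAttention(nn.Module):
    """Sketch: each rank holds a shard of the sequence; an all-to-all trades
    sequence shards for head shards, so every rank runs the unmodified local
    attention over the full sequence with a subset of heads. Assumes the head
    count and sequence length are divisible by the world size."""

    def __init__(self, local_attention, process_group=None):
        super().__init__()
        # local_attention is assumed to take (q, k, v) tensors shaped
        # (seq, batch, heads, head_dim) and return a tensor of the same shape.
        self.local_attention = local_attention
        self.process_group = process_group

    def _all_to_all(self, x, scatter_dim, gather_dim):
        world_size = dist.get_world_size(self.process_group)
        if world_size == 1:
            return x
        # Split the dim we want to scatter, exchange chunks across ranks,
        # then concatenate what we received along the dim we want to gather.
        inputs = [t.contiguous() for t in x.chunk(world_size, dim=scatter_dim)]
        outputs = [torch.empty_like(t) for t in inputs]
        dist.all_to_all(outputs, inputs, group=self.process_group)
        return torch.cat(outputs, dim=gather_dim)

    def forward(self, q, k, v):
        # (seq/p, batch, heads, head_dim) -> (seq, batch, heads/p, head_dim)
        q, k, v = (self._all_to_all(t, scatter_dim=2, gather_dim=0) for t in (q, k, v))
        out = self.local_attention(q, k, v)
        # (seq, batch, heads/p, head_dim) -> (seq/p, batch, heads, head_dim)
        return self._all_to_all(out, scatter_dim=0, gather_dim=2)
```

A production version would wrap the exchange in a `torch.autograd.Function` so the backward pass runs the inverse all-to-all and gradients flow through it, which is how DeepSpeed implements its sequence-parallel attention.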
TODOs
Reading
on it