pytorch / torchtitan

A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License

[fused_rmsnorm] Avoid querying device inside forward #301

Closed wconstab closed 3 months ago

wconstab commented 5 months ago

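Get sm_count another way, without querying the device inside forward, to work around issues with meta-device tracing.

Note: this PR isn't strictly safe, as it burns in device 0's SM count; the value would be wrong on any device with a different number of SMs.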
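
For illustration only, here is a minimal sketch of the general idea: look up the SM count once, outside of forward, and reuse the cached value. It is not the code from this PR. The names `cached_sm_count`, `rows_per_program`, and `FALLBACK_SM_COUNT` are hypothetical, and the fallback value exists only so the sketch runs on machines without CUDA or without PyTorch installed.

```python
import functools
import math

try:
    import torch
except ImportError:  # let the sketch run even where PyTorch is not installed
    torch = None

# Hypothetical placeholder, used only when no CUDA device is visible.
FALLBACK_SM_COUNT = 1


@functools.lru_cache(maxsize=None)
def cached_sm_count(device_index: int = 0) -> int:
    """Query the SM count once per device index and memoize the result.

    The lookup uses an explicit device index rather than a tensor's device,
    so it does not depend on the input being a real CUDA tensor.
    """
    if torch is not None and torch.cuda.is_available():
        props = torch.cuda.get_device_properties(device_index)
        return props.multi_processor_count
    return FALLBACK_SM_COUNT


def rows_per_program(n_rows: int, device_index: int = 0) -> int:
    """Hypothetical helper: split n_rows evenly across the cached SM count."""
    return math.ceil(n_rows / cached_sm_count(device_index))
```

Because the lookup is keyed by an explicit index, calling `cached_sm_count(0)` everywhere has the same caveat as the note above: the result reflects device 0 only. Passing the actual device index, where it is known, avoids that at the cost of needing the index outside of tracing.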

wconstab commented 3 months ago

abandoned